How to use parallel multicore functions in R


library(foreach);library(doParallel)

# Set the number of CPU cores to use

cl <- makeCluster(2)

# Register the cluster

registerDoParallel(cl)

res=foreach(i=1:1000, .combine=rbind) %dopar% {
        # Filling the data
        # Row number in the vcf table (tmp) whose position matches the i-th mutation
        w2=grep(tmp$pos,pattern = mu2$pos[i])
        # Tumor status Genotype:read_depth
        va1=paste(unlist(strsplit(tmp[w2,10],":"))[1:2],collapse = ":")

        # Normal status Genotype:read_depth
        va2=paste(unlist(strsplit(tmp[w2,11],":"))[1:2],collapse = ":")

        # Value returned by this iteration
        c(va1,va2)
        }

stopCluster(cl)

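.combine : decides how the result of each iteration is combined into a table, either row-bound (rbind) or column-bound (cbind).

%dopar% : runs the loop body in parallel on the registered workers (%do% runs the same loop sequentially).

If each iteration ends with c(va1,va2), the combined result is a matrix. With .combine=rbind it has 2 columns and one row per iteration.

stopCluster(cl) : shuts down the worker processes of the cluster once the job is done.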
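To make the effect of .combine concrete, here is a minimal sketch with toy values instead of the VCF data. The names by_row and by_col and the "a1"/"b1" strings are made up for illustration; only the shape of the result matters.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)        # two worker processes
registerDoParallel(cl)

# Each iteration returns a character vector of length 2 (toy values).
by_row <- foreach(i = 1:3, .combine = rbind) %dopar% {
  c(paste0("a", i), paste0("b", i))
}

by_col <- foreach(i = 1:3, .combine = cbind) %dopar% {
  c(paste0("a", i), paste0("b", i))
}

stopCluster(cl)

dim(by_row)   # 3 rows, 2 columns: one row per iteration
dim(by_col)   # 2 rows, 3 columns: one column per iteration
```

So when every iteration yields a (tumor, normal) pair, rbind is the natural choice: the row index of the result lines up with the iteration index.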

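The drawback is that you cannot update anything in real time. It is fast, but the results cannot be written into an existing table from inside the loop, because each worker runs in a separate R process and only changes its own copy of the data.

So it seems better to collect the results first and then put them into the table with a for loop afterwards.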
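A rough sketch of that "collect first, fill the table later" pattern is below. The table tab, its column names, and the genotype strings are all hypothetical placeholders, not the real data from this post.

```r
library(foreach)
library(doParallel)

# Hypothetical existing table that should be filled in.
tab <- data.frame(id = 1:4,
                  tumor = NA_character_,
                  normal = NA_character_,
                  stringsAsFactors = FALSE)

cl <- makeCluster(2)
registerDoParallel(cl)

# Assigning to `tab` in here would only change a worker's copy,
# so each iteration just returns its two values.
res <- foreach(i = seq_len(nrow(tab)), .combine = rbind) %dopar% {
  c(paste0("0/1:", i * 10), paste0("0/0:", i * 5))
}

stopCluster(cl)

# Back in the main session: copy the collected rows into the table.
for (i in seq_len(nrow(res))) {
  tab$tumor[i]  <- res[i, 1]
  tab$normal[i] <- res[i, 2]
}

print(tab)
```

Since res is an ordinary matrix, the final loop could also be replaced by a direct column assignment such as tab$tumor <- res[, 1]; the for loop is kept here to mirror the approach described above.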

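For a small amount of computation, a plain for loop is much better, since starting the workers and sending data to them has its own cost. As the amount of computation grows, the advantage of multicore grows.

It is hard to name an exact cutoff, but for loops with roughly 1000 iterations or more, multicore seems to come out ahead.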
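The break-even point depends on how heavy each iteration is, so it is probably easiest to just time both versions on your own task. The sketch below assumes a made-up workload (a 0.01 second sleep plus a square) standing in for the real per-iteration work; n and the variable names are placeholders.

```r
library(foreach)
library(doParallel)

n <- 200   # number of iterations to compare (placeholder value)

# Plain for loop
t_for <- system.time({
  out_for <- numeric(n)
  for (i in seq_len(n)) {
    Sys.sleep(0.01)          # stand-in for real work
    out_for[i] <- i^2
  }
})

# Same work with foreach on 2 workers
cl <- makeCluster(2)
registerDoParallel(cl)
t_par <- system.time({
  out_par <- foreach(i = seq_len(n), .combine = c) %dopar% {
    Sys.sleep(0.01)          # stand-in for real work
    i^2
  }
})
stopCluster(cl)

print(t_for["elapsed"])
print(t_par["elapsed"])
```

Trying a few values of n, and a lighter or heavier loop body, gives a feel for where the parallel version starts to pay off on a given machine.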


통통세알
