Critical limitations of consensus clustering in class discovery, Scientific reports

https://www.nature.com/articles/srep06207

R code for computing PAC (proportion of ambiguous clustering)

https://www.biostars.org/p/198789/
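
As a short reminder of what the code computes (my reading of the paper linked above, stated in the notation of the snippet): for each candidate K, take the empirical CDF of the pairwise consensus values, i.e. the entries of the consensus matrix below the diagonal. PAC is the fraction of sample pairs whose consensus value falls in the intermediate interval (x1, x2], and the K with the smallest PAC is selected. The symbols CDF_K and K-hat are introduced here only for illustration.

```latex
\mathrm{PAC}_K = \mathrm{CDF}_K(x_2) - \mathrm{CDF}_K(x_1),
\qquad
\hat{K} = \operatorname*{arg\,min}_{K \in \{2,\dots,K_{\max}\}} \mathrm{PAC}_K
```

With x1 = 0.1 and x2 = 0.9, a pair counts as "ambiguous" when it is co-clustered in more than 10% but at most 90% of the resampling runs in which both samples appear.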

######################################################## 
library(ConsensusClusterPlus) # Bioconductor package providing ConsensusClusterPlus()
seed=11111
set.seed(seed) # added so the simulated data are reproducible as well
d = matrix(rnorm(200000,0,1),ncol=200) # 200 samples in columns, 1000 genes in rows
colnames(d) = paste("Samp",1:200,sep="")
rownames(d) = paste("Gene",1:1000,sep="")
d = sweep(d,1, apply(d,1,median,na.rm=TRUE)) # median-center each gene (row)
maxK = 6 # maximum number of clusters to try
results = ConsensusClusterPlus(d,maxK=maxK,reps=50,pItem=0.8,pFeature=1,title="test_run",
innerLinkage="complete",seed=seed,plot="pdf")

# Note that we implement consensus clustering with innerLinkage="complete". 
# We advise against using innerLinkage="average" which is the default value in this package as average linkage is not robust to outliers.

############## PAC implementation ##############
Kvec = 2:maxK
x1 = 0.1; x2 = 0.9 # thresholds defining the intermediate (ambiguous) sub-interval
PAC = rep(NA,length(Kvec))
names(PAC) = paste("K=",Kvec,sep="") # from 2 to maxK
for(i in Kvec){
  M = results[[i]]$consensusMatrix
  Fn = ecdf(M[lower.tri(M)]) # empirical CDF of the pairwise consensus values
  PAC[i-1] = Fn(x2) - Fn(x1) # proportion of pairs with consensus in (x1, x2]
}#end for i
# The optimal K is the one with the lowest PAC
optK = Kvec[which.min(PAC)]
########################################################
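
To see what the PAC loop measures without running the full clustering, here is a minimal sketch on two hand-made 4x4 consensus matrices. The helper name `pac_score` and both toy matrices are made up for illustration; they are not part of ConsensusClusterPlus or of the original post.

```r
# Hypothetical helper: PAC of a single consensus matrix M.
pac_score <- function(M, x1 = 0.1, x2 = 0.9) {
  pairs <- M[lower.tri(M)]  # each sample pair once, diagonal excluded
  Fn <- ecdf(pairs)         # empirical CDF of the pairwise consensus values
  Fn(x2) - Fn(x1)           # fraction of pairs with x1 < consensus <= x2
}

# Toy matrix with two perfectly stable clusters: {1,2} and {3,4}.
clean <- matrix(c(1, 1, 0, 0,
                  1, 1, 0, 0,
                  0, 0, 1, 1,
                  0, 0, 1, 1), nrow = 4, byrow = TRUE)

# Same matrix, but samples 2 and 3 are co-clustered in half of the runs.
ambiguous <- clean
ambiguous[2, 3] <- ambiguous[3, 2] <- 0.5

pac_clean <- pac_score(clean)
pac_ambiguous <- pac_score(ambiguous)
print(c(clean = pac_clean, ambiguous = pac_ambiguous))
```

The clean matrix has only 0/1 entries, so no pair falls in the intermediate interval and its PAC is 0. In the second matrix one of the six pairs has a consensus value of 0.5, giving a PAC of 1/6. Lower PAC therefore means a more clear-cut partition, which is why the snippet above picks the K that minimizes it.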
