TMM normalization


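```r
library(edgeR)

# count_matrix: your raw read counts, genes in rows and samples in columns
y <- DGEList(counts = count_matrix)
y <- calcNormFactors(y)                           # TMM is the default method
tmm_mat <- cpm(y, normalized.lib.sizes = TRUE)    # CPM computed with the TMM-adjusted library sizes
```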

https://support.bioconductor.org/p/77193/

I don't think it's clear what you are asking for. Let's assume that y is your DGEList with your count data, which you already called calcNormFactors on.

Are you after the TMM normalization factors? These are stored in your y$samples$norm.factors column.

Do you just want a gene expression matrix from your data, normalized by a "simple" per-million factor? Call cpm(y, normalized.lib.sizes=FALSE).

But you probably don't want that.

If you're after gene expression normalized by sequencing depth (adjusted by TMM factors), just call cpm(y) as Aaron has already suggested.
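To make the options above concrete, the following base-R sketch reproduces by hand what the two cpm() calls compute on a toy matrix. It relies on edgeR's documented behavior: cpm() divides by lib.size when normalized.lib.sizes = FALSE and by lib.size * norm.factors otherwise. The counts and the norm_factors values are made up for illustration and are not real TMM output.

```r
# Toy data (illustrative, not from the source): 3 genes x 2 samples
counts <- matrix(c(10, 20, 70,
                   30, 30, 40),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

lib_size <- colSums(counts)              # plays the role of y$samples$lib.size
norm_factors <- c(s1 = 0.8, s2 = 1.25)   # hypothetical values standing in for y$samples$norm.factors

# Same arithmetic as cpm(y, normalized.lib.sizes = FALSE): raw library size only
cpm_raw <- t(t(counts) / lib_size) * 1e6

# Same arithmetic as cpm(y): library size multiplied by the TMM factor
cpm_tmm <- t(t(counts) / (lib_size * norm_factors)) * 1e6

cpm_raw
cpm_tmm
```

With a real DGEList the same quantities come straight from edgeR as y$samples$norm.factors, cpm(y, normalized.lib.sizes=FALSE) and cpm(y).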

TMM is implemented in the edgeR Bioconductor package (version 2.4.0). It is based on the hypothesis that most genes are not differentially expressed (DE). The TMM factor is computed for each lane, with one lane considered the reference sample and the others test samples. For each test sample, TMM is computed as the weighted mean of the log ratios between that test sample and the reference, after excluding genes with extreme average expression and genes with the largest log ratios. Under the hypothesis that few genes are DE, the resulting scaling factor should be close to 1. If it is not, its value estimates the correction factor that must be applied to the library sizes (not to the raw counts) for the hypothesis to hold. The calcNormFactors() function in the edgeR Bioconductor package provides these scaling factors. To obtain normalized read counts, these normalization factors are rescaled by the mean of the normalized library sizes, and the raw read counts are then divided by the rescaled factors.
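The TMM computation described above can be sketched in base R. This is a simplified re-implementation for illustration, not edgeR's own code, and tmm_factor is a hypothetical helper name. It assumes the caller chooses the reference sample (edgeR picks one automatically), drops genes with a zero count in either sample, trims 30% of the log ratios (M) and 5% of the average expression values (A) from each end (edgeR's defaults logratioTrim = 0.3 and sumTrim = 0.05), and weights the remaining log ratios by their approximate inverse variance.

```r
# Simplified TMM factor of one test sample against a reference sample.
# Illustrative sketch only; tmm_factor is a hypothetical helper, not an edgeR function.
tmm_factor <- function(obs, ref, logratio_trim = 0.3, sum_trim = 0.05) {
  n_obs <- sum(obs)
  n_ref <- sum(ref)
  keep <- obs > 0 & ref > 0                 # zero counts would give infinite log ratios
  obs <- obs[keep]
  ref <- ref[keep]

  m <- log2((obs / n_obs) / (ref / n_ref))          # M: log ratio, test vs reference
  a <- 0.5 * log2((obs / n_obs) * (ref / n_ref))    # A: average log expression
  v <- (n_obs - obs) / (n_obs * obs) + (n_ref - ref) / (n_ref * ref)  # approximate variance of M

  n <- length(m)
  lo_m <- floor(n * logratio_trim) + 1; hi_m <- n + 1 - lo_m
  lo_a <- floor(n * sum_trim) + 1;      hi_a <- n + 1 - lo_a
  # keep genes whose M rank and A rank both fall in the untrimmed middle
  inside <- rank(m) >= lo_m & rank(m) <= hi_m &
            rank(a) >= lo_a & rank(a) <= hi_a

  # precision-weighted mean of M, returned on the linear scale
  2^(sum(m[inside] / v[inside]) / sum(1 / v[inside]))
}

# Toy data (illustrative): 20 genes, two of them strongly up in "obs"
counts <- cbind(ref = rep(100, 20), obs = c(rep(100, 18), 1000, 1000))
f <- c(ref = 1, obs = tmm_factor(counts[, "obs"], counts[, "ref"]))
f <- f / exp(mean(log(f)))   # rescale so the factors multiply to 1, as edgeR does
colSums(counts) * f          # effective library sizes
```

In this toy case the two effective library sizes come out equal (about 2757), so the 18 unchanged genes get identical per-million values in both samples, whereas the raw library sizes (2000 vs 3800) would make them look lower in "obs".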

https://academic.oup.com/bib/article/14/6/671/189645/A-comprehensive-evaluation-of-normalization

The calcNormFactors function normalizes for RNA composition by finding a set of scaling factors for the library sizes that minimize the log-fold changes between the samples for most genes. The default method for computing these scale factors uses a trimmed mean of M-values (TMM) between each pair of samples [26]. We call the product of the original library size and the scaling factor the effective library size. The effective library size replaces the original library size in all downstream analyses.
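Combining the effective library size defined here with the rescaling step from the normalization-evaluation excerpt above gives normalized counts. The sketch below is one reading of that description: each sample's effective library size is divided by the mean effective library size, and raw counts are divided by that ratio. The toy counts and the norm_factors values are illustrative assumptions (they multiply to 1 and exactly offset the two up-regulated genes); for contrast, the sketch also scales by the original library sizes alone.

```r
# Toy data (illustrative): 20 genes, two of them strongly up in "obs"
counts <- cbind(ref = rep(100, 20), obs = c(rep(100, 18), 1000, 1000))
lib_size <- colSums(counts)                              # 2000 and 3800
norm_factors <- c(ref = sqrt(1.9), obs = 1 / sqrt(1.9))  # hypothetical TMM factors

eff_lib <- lib_size * norm_factors   # effective library size = original size * factor

# Normalized counts: divide raw counts by each effective library size
# rescaled to the mean effective library size (keeps counts on the original scale)
scale_tmm <- eff_lib / mean(eff_lib)
norm_tmm <- sweep(counts, 2, scale_tmm, "/")

# For contrast: scaling by the original library sizes alone
scale_raw <- lib_size / mean(lib_size)
norm_raw <- sweep(counts, 2, scale_raw, "/")

norm_tmm[1, ]   # unchanged gene: equal in both samples
norm_raw[1, ]   # unchanged gene: appears lower in "obs"
```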


통통세알
