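library(edgeR)
y <- DGEList(counts = matrix)   # matrix: raw count matrix, genes in rows, samples in columns
y <- calcNormFactors(y)         # compute TMM scaling factors (the default method)
tmm_mat <- cpm(y, normalized.lib.sizes = TRUE)   # CPM based on TMM-adjusted library sizes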
https://support.bioconductor.org/p/77193/
I don't think it's clear what you are asking for. Let's assume that y is your DGEList with your count data, on which you have already called calcNormFactors.
Are you after the TMM normalization factors? These are stored in the y$samples$norm.factors column.
Do you just want a gene expression matrix from your data, normalized by a "simple" per-million factor? Call cpm(y, normalized.lib.sizes=FALSE). But you probably don't want that.
If you're after gene expression normalized by sequencing depth (adjusted by TMM factors), just call cpm(y), as Aaron has already suggested.
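To make the difference between the two cpm calls above concrete, here is a minimal base-R sketch of the arithmetic. The count matrix and the normalization factors are made-up values for illustration (the factors are assumed, not computed by calcNormFactors), and the variable names are hypothetical.

```r
# Hypothetical toy data: 4 genes x 3 samples (values made up for illustration)
counts <- matrix(c( 10,   20,   30,
                   100,  210,  290,
                    50,   90,  160,
                   840, 1680, 2520),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

lib_size     <- colSums(counts)        # raw library sizes: 1000, 2000, 3000
norm_factors <- c(0.9, 1.0, 1 / 0.9)   # ASSUMED factors, standing in for TMM output

# What cpm(y, normalized.lib.sizes = FALSE) amounts to:
# plain per-million scaling by the raw library size
cpm_raw <- t(t(counts) / lib_size) * 1e6

# What cpm(y) amounts to: per-million scaling by the
# effective library size (raw library size x normalization factor)
eff_lib_size <- lib_size * norm_factors
cpm_tmm <- t(t(counts) / eff_lib_size) * 1e6
```

With a real DGEList, the corresponding inputs would be y$samples$lib.size and y$samples$norm.factors.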
TMM is implemented in the edgeR Bioconductor package (version 2.4.0). It is based on the hypothesis that most genes are not DE. The TMM factor is computed for each lane, with one lane being considered as a reference sample and the others as test samples. For each test sample, TMM is computed as the weighted mean of log ratios between this test and the reference, after exclusion of the most expressed genes and the genes with the largest log ratios. According to the hypothesis of low DE, this TMM should be close to 1. If it is not, its value provides an estimate of the correction factor that must be applied to the library sizes (and not the raw counts) in order to fulfill the hypothesis. The calcNormFactors() function in the edgeR Bioconductor package provides these scaling factors. To obtain normalized read counts, these normalization factors are re-scaled by the mean of the normalized library sizes. Normalized read counts are obtained by dividing raw read counts by these re-scaled normalization factors.
https://academic.oup.com/bib/article/14/6/671/189645/A-comprehensive-evaluation-of-normalization
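The procedure described above can be sketched in base R as follows. This is a simplified illustration, not edgeR's implementation: the function tmm_factor, its parameter names, the quantile-based trimming, and the toy counts are all assumptions made for this example. The re-scaling step follows one reading of the passage, in which each normalized library size is divided by the mean of the normalized library sizes.

```r
# Simplified TMM factor of a test sample against a reference sample.
# ASSUMPTION: trimming uses quantile cut-offs; edgeR's own code differs in detail.
tmm_factor <- function(obs, ref, m_trim = 0.3, a_trim = 0.05) {
  n_obs <- sum(obs)
  n_ref <- sum(ref)
  keep <- obs > 0 & ref > 0                 # log ratios need positive counts
  obs <- obs[keep]
  ref <- ref[keep]
  m <- log2((obs / n_obs) / (ref / n_ref))             # M-values (log ratios)
  a <- (log2(obs / n_obs) + log2(ref / n_ref)) / 2     # A-values (mean log expression)
  # Inverse of the approximate variance of each log ratio, used as weight
  w <- 1 / ((n_obs - obs) / (n_obs * obs) + (n_ref - ref) / (n_ref * ref))
  # Drop genes with extreme log ratios and extreme expression levels
  in_m <- m >= quantile(m, m_trim) & m <= quantile(m, 1 - m_trim)
  in_a <- a >= quantile(a, a_trim) & a <= quantile(a, 1 - a_trim)
  sel <- in_m & in_a
  2^(sum(w[sel] * m[sel]) / sum(w[sel]))    # weighted mean of M, back on the ratio scale
}

# Hypothetical toy data: s2 is sequenced twice as deep as s1,
# and the last gene is strongly up-regulated in s2
base <- c(5, 8, 12, 15, 20, 25, 30, 40, 50, 60,
          80, 100, 120, 150, 200, 250, 300, 400, 500, 1000)
s2 <- 2 * base
s2[20] <- 20000
counts <- cbind(s1 = base, s2 = s2)

lib_size <- colSums(counts)
# Sample 1 is taken as the reference
f <- sapply(seq_len(ncol(counts)),
            function(j) tmm_factor(counts[, j], counts[, 1]))
f <- f / exp(mean(log(f)))                # scale the factors so they multiply to 1

norm_lib    <- lib_size * f               # normalized (effective) library sizes
rescaled    <- norm_lib / mean(norm_lib)  # re-scale by their mean
norm_counts <- sweep(counts, 2, rescaled, "/")
```

In this toy example, the 19 genes that differ only by sequencing depth end up with matching normalized counts in both samples, while the up-regulated gene still stands out.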
The calcNormFactors function normalizes for RNA composition by finding a set of scaling factors for the library sizes that minimize the log-fold changes between the samples for most genes. The default method for computing these scale factors uses a trimmed mean of M-values (TMM) between each pair of samples [26]. We call the product of the original library size and the scaling factor the effective library size. The effective library size replaces the original library size in all downstream analyses.