Scanning Kolmogorov-Smirnov statistic
A modification of the Kolmogorov-Smirnov statistic, which we term a scanning ranked KS test, was used to determine which single mutations or co-occurring combinations of mutations best predict selective sensitivity to each unknown compound. In addition to single mutations, we also annotated a ‘RAS_Class’ metaclass, in which a cell line was assigned a value of ‘1’ if it contained a mutation in KRAS, NRAS, HRAS, PIK3C1, or BRAF. Mutations and pairwise combinations of co-occurring mutations were first binarized (1 = mutated; 0 = wild-type), resulting in 446,435 combinations for which at least 5 cell lines contained the mutation combination. For each chemical, we reasoned that if a mutation combination confers selective sensitivity, then the ED50 or AUC values for mutated cell lines will be lower than those for wild-type cell lines. To determine the degree to which the ED50/AUC values for mutated cell lines are located towards the bottom of the ranked list of sensitivity values, and are thus lower than the background distribution, the following equation was used:
where v(j) is the position of the j-th mutated cell line in the ranked list of cell lines, t is the total number of cell lines with the mutation combination, and n is the total number of cell lines assayed (n = 100).
To determine a p-value, 5000 permutations were performed, each drawing a random set of t ED50/AUC values, and u_random was calculated for each. The resulting p-value was determined to be:
A value of p < .002 indicates that, out of 5000 permutations, no random value was less than the calculated distance, u. This process was repeated for each mutation combination and each chemical, using both AUC and ED50 values as the sensitivity metric. Our procedure is superior to a standard KS test in two ways. First, when a large distribution is compared with a small one in a regular KS test, the test is biased towards rejecting the null hypothesis. Second, a ranked KS test allows for the preferential ranking of sets that are separated from the background at the tails of the distribution.
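The ranking and permutation steps described above can be sketched in Python. This is only an illustration under stated assumptions. The statistic is taken to be the one-sided form u = max_j [j/t - v(j)/n], a common ranked-KS expression that is not confirmed as the exact equation used here. Cell lines are ranked by ascending ED50/AUC, and a larger u is treated as more extreme, so the tail comparison may need to be flipped to match the original sign convention. The function names `scanning_ks` and `permutation_p` are hypothetical.

```python
import numpy as np

def scanning_ks(values, mutated):
    """Assumed one-sided ranked KS statistic: max_j [ j/t - v(j)/n ]."""
    values = np.asarray(values, dtype=float)
    mutated = np.asarray(mutated, dtype=bool)
    n = values.size                # total number of cell lines assayed
    t = int(mutated.sum())         # cell lines carrying the mutation combination
    # Rank cell lines by ascending sensitivity value (rank 1 = lowest ED50/AUC).
    order = np.argsort(values, kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(1, n + 1)
    v = np.sort(ranks[mutated])    # v(j): ranked positions of the mutated lines
    j = np.arange(1, t + 1)
    return float(np.max(j / t - v / n))

def permutation_p(values, mutated, n_perm=5000, seed=0):
    """Empirical p-value from random relabelling of which t lines are mutated."""
    rng = np.random.default_rng(seed)
    mutated = np.asarray(mutated, dtype=bool)
    u = scanning_ks(values, mutated)
    hits = 0
    for _ in range(n_perm):
        u_random = scanning_ks(values, rng.permutation(mutated))
        # Assumption: larger u is more extreme under this sign convention.
        if u_random >= u:
            hits += 1
    return u, hits / n_perm

# Toy data: 100 cell lines, the 5 most sensitive ones carry the mutation.
values = np.arange(100, dtype=float)
mutated = np.zeros(100, dtype=bool)
mutated[:5] = True
u, p = permutation_p(values, mutated)
```

When no permutation reaches the observed statistic, the empirical fraction is zero, which is normally reported as a p-value below the resolution of the permutation count rather than as exactly zero.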