R/Evaluate.R
evaluateDE.Rd
This function takes the simulation output from simulateDE
and computes quantities of the confusion matrix for statistical power evaluation.
evaluateDE(simRes, alpha.type=c("adjusted","raw"), MTC=c('BY', 'BH', 'holm', 'hochberg', 'hommel', 'bonferroni', 'Storey', 'IHW'), alpha.nominal=0.1, stratify.by=c("mean", "dispersion", "dropout", "lfc"), strata.probs = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), filter.by=c("none", "mean", "dispersion", "dropout"), strata.filtered=1, target.by=c("lfc", "effectsize"), delta=0, Table = TRUE)
simRes | The result from |
---|---|
alpha.type | A string to represent the way to call DE genes.
Available options are |
MTC | Multiple testing correction method to use. Available options are
1) see p.adjust.methods,
2) Storey's qvalue see qvalue and
3) Independent Hypothesis Weighting considering mean expression as covariate (see ihw).
Default is |
alpha.nominal | The nomial level of significance. Default is 0.1. |
stratify.by | A string to represent the way to stratify genes.
Available options are |
strata.probs | A vector specifying the probability values for sample quantiles of the strata. See qvalue. |
filter.by | A string to represent the way to filter genes.
This is used in conjunction with strata.filtered for gene filtering.
Available options are |
strata.filtered | The strata to be filtered out in computing error matrix-related quantities. Genes falling into these strata will be excluded. See "Details" for more description of gene filtering. |
target.by | A string to specify the method to define "biologically important" DE genes.
Available options are (1) |
delta | A threshold used for defining "biologically important" genes.
Genes with absolute log2 fold changes (when target.by is "lfc")
or effect sizes (when target.by is "effectsize") greater than this value
are deemed DE in error rates calculations. If |
Table | A logical vector. If the default |
A list with the following entries:
3D array representing the number of true negatives, true positives, false positives, false negatives and their proportions/rates as well as false discovery rate for all simulation settings. The dimension of the arrays are nstrata * N * nsims. Here nstrata is number of specified strata. N is number of different sample sizes settings, and nsims is number of simulations.
Matrix representing the number of true negatives, true positives, false positives, false negatives for all simulation settings. The dimension of the matrices are N * nsims. Here N is number of different sample sizes settings, and nsims is number of simulations.
Matrix representing the marginal rates for all simulation settings. The dimension of the matrices are N * nsims.
Number of genes per stratum and number of DE genes per stratum.
The input "stratify.by".
The input strata.
Sample sizes per group. This is taken from the simulation options.
The input method to define "biologically important" DE genes, either by log fold change or effect size.
The input delta for biologically important genes. If delta=0, all target.by will be considered.
This is the main function to compute various power-related quantities, using stratification and filtering.
We recommend to compute and visualize error rates (especially TPR) conditional on expression characteristics like mean, dispersion and/or dropout rate. It is likely that the power to detect DE genes is strongly dependent on mean expression levels even though the magnitude of effect sizes is the same. The stratified results will provide a more comprehensive power assessment and better guide the investigators in experimental designs and analysis strategies.
Sometimes it is advisible to filter out some genes (such as the ones with very low mean expression) before DE detection. The filtering option here provides an opportunity to compare the rates before and after filtering.
We provide two options to define biologically interesting genes: by absolute values of log fold changes or effect sizes (absolute values of log fold changes divided by the square root of 1/(mean+dispersions)). Genes with these quantities over a threshold are deemed interesting, and the rate calculations are based on these genes.
estimateParam
for negative binomial parameters,
Setup
for setting up simulation parameters and
simulateDE
for simulating differential expression and
plotEvalDE
for visualisation.
if (FALSE) { # estimate gene parameters data("Bulk_Read_Counts") data("GeneLengths_hg19") estparam_gene <- estimateParam(countData = Bulk_Read_Counts, readData = NULL, batchData = NULL, spikeData = NULL, spikeInfo = NULL, Lengths = GeneLengths_hg19, MeanFragLengths = NULL, RNAseq = 'bulk', Protocol = 'Read', Distribution = 'NB', Normalisation = "MR", GeneFilter = 0.25, SampleFilter = 3, sigma = 1.96, NCores = NULL, verbose = TRUE) # define log fold change p.lfc <- function(x) sample(c(-1,1), size=x,replace=T)*rgamma(x, shape = 2, rate = 2) # set up simulations setupres <- Setup(ngenes = 10000, nsims = 10, p.DE = 0.1, pLFC = p.lfc, n1 = c(3,6,12), n2 = c(3,6,12), Thinning = c(1,0.9,0.8), LibSize = 'given', estParamRes = estparam_gene, estSpikeRes = NULL, DropGenes = FALSE, sim.seed = 4379, verbose = TRUE) # run simulation simres <- simulateDE(SetupRes = setupres, Prefilter = NULL, Imputation = NULL, Normalisation = 'MR', Label = 'none', DEmethod = "limma-trend", DEFilter = FALSE, NCores = NULL, verbose = TRUE) # DE evaluation evalderes <- evaluateDE(simRes = simres, alpha.type="adjusted", MTC='BH', alpha.nominal=0.05, stratify.by = "lfc", filter.by = "mean", strata.filtered = 1, target.by = "lfc", delta = 0.25) plotEvalDE(evalRes = evalderes, rate = "marginal") plotEvalDE(evalRes = evalderes, rate = "conditional") }