This function estimates and returns parameters needed for power simulations.
At a minimum, the user needs to supply a gene expression count matrix and choose the following options: the type of RNA-seq experiment (bulk or single cell); the distribution, where negative binomial (NB) is recommended except for single-cell full-length Smart-seq2 read data, for which we recommend zero-inflated NB (ZINB); and the normalisation method, where we recommend scran for single-cell data and TMM or MR for bulk data.
The other parameters are optional (additional data) or have preset values (gene and sample filtering). Please consult the detailed argument descriptions.
estimateParam(countData, readData = NULL, batchData = NULL, spikeData = NULL, spikeInfo = NULL, Lengths = NULL, MeanFragLengths = NULL, RNAseq = c('bulk', 'singlecell'), Protocol = c('UMI', 'Read'), Distribution = c('NB', 'ZINB'), Normalisation = c("TMM", "MR", "PosCounts", "UQ", "scran", "Linnorm", "sctransform", "SCnorm", "Census", "depth", "none"), GeneFilter = 0.05, SampleFilter = 5, sigma = 1.96, NCores = NULL, verbose=TRUE)
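Argument | Description |
---|---|
countData | is a (UMI) count matrix of gene expression, with genes in rows and samples in columns. |
readData | is the matching read count matrix. |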
batchData | is a |
spikeData | is a count |
spikeInfo | is a molecule count |
Lengths | is a numeric vector of transcript lengths with the same length and order as the rows in countData. This variable is only needed for internal gene length corrections (TPM), see Details. |
MeanFragLengths | is a numeric vector of mean fragment lengths with the same length as columns in countData. This variable is only needed for internal gene length corrections (TPM), see Details. |
RNAseq | is a character value: "bulk" or "singlecell". |
Protocol | is a character value defining the type of counts given in countData: "UMI" or "Read". |
Distribution | is a character value: "NB" for negative binomial or "ZINB" for zero-inflated negative binomial distribution fitting. |
Normalisation | is a character value: 'TMM', 'MR', 'PosCounts', 'UQ', 'scran', 'Linnorm', 'sctransform', 'SCnorm', 'Census', 'depth', 'none'. For more information, please consult the Details section. |
GeneFilter | is a numeric value indicating the minimal proportion of nonzero expression values for a gene across all samples to be considered expressed and used for normalisation and parameter estimation. The default is 0.05. |
SampleFilter | is a numeric value indicating the minimal number of MADs (median absolute deviations) away from the median number of features detected as well as sequencing depth across all samples, so that outlying samples are removed prior to normalisation and parameter estimation. The default is 5. |
sigma | The variability band width for mean-dispersion loess fit defining the prediction interval for read count simulation. Default is 1.96, i.e. 95% interval. For more information see |
NCores | The number of cores for the normalisation methods SCnorm and Census. The default is NULL. |
verbose | Logical value to indicate whether to print function information. Default is TRUE. |
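
The filtering arguments and sigma can be illustrated with a small base R sketch. It only approximates the descriptions in the table and is not the package's internal code: the toy matrix counts, the helper flag_outlier, the ">=" comparison, the two-sided rule on untransformed values, and dropping a sample when either metric is outlying are all assumptions made for illustration.

```r
# Toy count matrix (made-up values): 6 genes in rows, 8 samples in columns.
counts <- matrix(
  c(10, 12, 11,  9, 10, 13, 11, 200,
     0,  0,  0,  0,  0,  0,  0, 150,
     5,  6,  4,  5,  7,  5,  6, 180,
     8,  7,  9,  8,  6,  8,  7, 170,
     0,  1,  0,  2,  0,  1,  0, 160,
     3,  2,  4,  3,  3,  2,  4, 190),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("S", 1:8))
)

# GeneFilter: keep genes whose proportion of nonzero values reaches the cutoff.
# Assumption: the comparison is ">="; the package may use a strict inequality.
gene_filter <- 0.25
nonzero_prop <- rowMeans(counts > 0)
keep_genes <- nonzero_prop >= gene_filter

# SampleFilter: flag samples lying more than k MADs away from the median.
# Assumption: two-sided rule applied to untransformed values.
flag_outlier <- function(x, k) {
  abs(x - median(x)) > k * mad(x)
}
sample_filter <- 3
detected <- colSums(counts > 0)  # number of features detected per sample
depth <- colSums(counts)         # sequencing depth per sample
# Assumption: a sample is dropped if either metric is outlying.
drop_samples <- flag_outlier(detected, sample_filter) |
  flag_outlier(depth, sample_filter)

# sigma: under a normal approximation, +/- 1.96 covers roughly 95%.
sigma <- 1.96
coverage <- pnorm(sigma) - pnorm(-sigma)

print(names(keep_genes)[!keep_genes])      # genes removed by the gene filter
print(names(drop_samples)[drop_samples])   # samples flagged as outliers
print(round(coverage, 3))
```

With these toy values, gene2 (nonzero in only one of eight samples) is removed and sample S8 is flagged because of its much larger sequencing depth.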
List object with the following entries:
A list object containing the estimated moments for the full, dropped out genes, dropped out samples and filtered normalized count matrix. For more information please consult the Details section and the plot made with plotParam.
A list object containing the fitting results of the mean-dispersion and mean-dropout relation as well as the estimated parameter data used for the fits. For more information please consult the Details section and the plot made with plotParam.
Number of samples and genes provided with at least one read count.
List object containing logical vectors for gene and sample dropouts after applying gene frequency and sample outlier filtering.
The estimated library size factor per sample.
The chosen parameter settings.
Normalisation Methods
TMM, UQ: employ the edgeR-style normalisation of weighted trimmed mean of M-values and upper quartile, respectively, as implemented in calcNormFactors.
MR, PosCounts: employ the DESeq2-style normalisation of the median ratio method and a modified geometric mean method, respectively, as implemented in estimateSizeFactors.
scran, SCnorm: apply the deconvolution and quantile regression normalisation methods developed for sparse RNA-seq data, as implemented in calculateSumFactors and SCnorm, respectively. Spike-ins can also be supplied for both methods via spikeData. Note, however, that for scran this means that the normalisation as implemented in computeSpikeFactors is also applied to genes (general.use=TRUE).
Linnorm: applies the normalisation method for sparse RNA-seq data as implemented in Linnorm.Norm. For Linnorm, the user can also supply spikeData.
sctransform: applies the normalisation method developed for single-cell UMI RNA-seq data as implemented in vst.
Census: converts relative measures of TPM/FPKM values into mRNAs per cell (RPC) without the need for spike-in standards. Census needs at least Lengths for single-end data and preferably MeanFragLengths for paired-end data. The authors state that Census should not be used for UMI data.
depth: Sequencing depth normalisation.
none: No normalisation is applied. This approach can be used for prenormalized expression estimates, e.g. from Cufflinks, RSEM or Salmon.
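
To make the notion of a per-sample size factor more concrete, the following base R sketch computes two simple variants on a toy matrix: a sequencing depth factor and a median-of-ratios factor. It is a minimal illustration under stated assumptions, not the code used by this function or by DESeq2: the toy matrix counts, the scaling of depth factors by the mean library size, and the restriction to genes without zero counts are choices made here for illustration.

```r
# Toy count matrix (made-up values): 4 genes in rows, 3 samples in columns.
counts <- matrix(
  c(10, 20, 40,
    20, 40, 80,
     5, 10, 20,
     0,  3,  6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("S", 1:3))
)

# Depth: library size relative to the mean library size (assumed scaling).
lib_size <- colSums(counts)
sf_depth <- lib_size / mean(lib_size)

# Median ratio idea: for each sample, take the median ratio of its counts
# to the per-gene geometric mean across samples.
log_geo_mean <- rowMeans(log(counts))  # -Inf for genes containing a zero
usable <- is.finite(log_geo_mean)      # assumption: skip genes with any zero
sf_mr <- apply(counts, 2, function(cnt) {
  exp(median(log(cnt[usable]) - log_geo_mean[usable]))
})

# Normalised counts: divide each sample (column) by its size factor.
norm_counts <- sweep(counts, 2, sf_mr, "/")

print(round(sf_depth, 3))
print(sf_mr)
print(norm_counts)
```

In this toy example the samples differ only by a constant scaling for the first three genes, so the median ratio factors recover that scaling and the normalised counts of those genes agree across samples.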
```r
if (FALSE) {
# Single Cells
data("SmartSeq2_Gene_Read_Counts")
Batches <- data.frame(Batch = sapply(strsplit(colnames(SmartSeq2_Gene_Read_Counts), "_"), "[[", 1),
                      stringsAsFactors = FALSE,
                      row.names = colnames(SmartSeq2_Gene_Read_Counts))
data("GeneLengths_mm10")
estparam <- estimateParam(countData = SmartSeq2_Gene_Read_Counts,
                          readData = NULL,
                          batchData = Batches,
                          spikeData = SmartSeq2_SpikeIns_Read_Counts,
                          spikeInfo = SmartSeq2_SpikeInfo,
                          Lengths = GeneLengths,
                          MeanFragLengths = NULL,
                          RNAseq = 'singlecell',
                          Protocol = 'Read',
                          Distribution = 'ZINB',
                          Normalisation = "scran",
                          GeneFilter = 0.1,
                          SampleFilter = 3,
                          sigma = 1.96,
                          NCores = NULL,
                          verbose = TRUE)

# Bulk
data("Bulk_Read_Counts")
data("GeneLengths_hg19")
estparam <- estimateParam(countData = Bulk_Read_Counts,
                          readData = NULL,
                          batchData = NULL,
                          spikeData = NULL,
                          spikeInfo = NULL,
                          Lengths = GeneLengths_hg19,
                          MeanFragLengths = NULL,
                          RNAseq = 'bulk',
                          Protocol = 'Read',
                          Distribution = 'NB',
                          Normalisation = "MR",
                          GeneFilter = 0.1,
                          SampleFilter = 3,
                          sigma = 1.96,
                          NCores = NULL,
                          verbose = TRUE)

# plot the results of estimation
plotParam(estparam, Annot = FALSE)
}
```