This function estimates and returns parameters needed for spike-in count simulations using supplementary code from Kim et al. 2016 (DOI: 10.1038/ncomms9687).

estimateSpike(spikeData,
spikeInfo,
MeanFragLengths = NULL,
batchData = NULL,
Normalisation=c('depth','none'),
SampleFilter = 3,
RNAseq = c("bulk", "singlecell"),
Protocol = c('UMI', 'Read'),
verbose = TRUE)

Arguments

spikeData

is a count matrix. Rows correspond to spike-ins, columns to samples. rownames should contain the spike-in names, colnames the sample names.

spikeInfo

is a molecule count matrix of spike-ins. Rows correspond to spike-ins. The order of rows should be the same as the rows in spikeData. The rownames should be the same as the rownames of spikeData. The first column must contain the molecule counts of spike-ins and be named 'SpikeInput'. The second column can contain the sequence lengths of spike-ins in bases and be named 'Lengths'.

MeanFragLengths

is a numeric vector of the mean fragment length.

batchData

is a data.frame for batch annotation. Rows correspond to samples. The order of rows should be the same as the columns in spikeData. The rownames should be the same as the column names of spikeData. The first column should contain the batch annotation, e.g. 'a' for batch 1, 'b' for batch 2.

Normalisation

is a character value: 'depth' or 'none'. For more information, please consult the details section.

SampleFilter

is a numeric vector indicating the minimal number of MADs (median absolute deviation) away from the median number of features detected as well as sequencing depth across all samples so that outlying samples are removed prior to normalisation and parameter estimation. The default is 5, i.e. at least 5 MADs away from the median. Choose higher values if you want to filter out less samples. This parameter is particularly important for single cells to ensure reliable parameter estimation. For more information, please consult isOutlier.

RNAseq

is a character value: "bulk" or "singlecell".

Protocol

is a character value defining the type of counts given in countData. Options are "UMI" (e.g. 10X Genomics, CEL-seq2) or "Read" (e.g. Smart-seq2).

verbose

Logical value to indicate whether to print function information. Default is TRUE.

Value

List object with the following entries:

normCounts

The normalised spike-in read counts data.frame.

normParams

The mean and standard deviation per spike-in using normalised read counts in a data.frame.

CaptureEfficiency

The ad-hoc estimated as well as fitted detection probabilities with confidence interval per spike-in using normalised read counts in a data.frame.

seqDepth

Library size, i.e. total number of reads per library

sf

The estimated library size factors.

EVGammaThetaEstimates

Estimation of the four parameters capturing technical variability, namely E[\(\gamma\)], Var[\(\gamma\)], E[\(\theta\)] and Var[\(\theta\)]. For more details, please consult supplementary information of Kim et al. 2016 (DOI: 10.1038/ncomms9687). These estimates are needed for simulating spike-in read counts.

Input

The input spike-in expression matrix, molecule counts data.frame and batch annotation data.frame, filtered so that only spike-ins with nonzero expression and samples with at least 100 reads are retained.

Settings

Reporting the chosen normalisation framework.

Details

Normalisation methods

'depth'

applies the depth normalization method as implemented in computeSpikeFactors.

'none'

No normalisation is applied. This approach can be used for prenormalized expression estimates, e.g. TPM/FPKM/RPKM estimated by RSEM, salmon, cufflinks etc.

Examples

if (FALSE) { data("SmartSeq2_SpikeIns_Read_Counts") data("SmartSeq2_SpikeInfo") Batches = data.frame(Batch = sapply(strsplit(colnames(SmartSeq2_SpikeIns_Read_Counts), "_"), "[[", 1), stringsAsFactors = F, row.names = colnames(SmartSeq2_SpikeIns_Read_Counts)) # estimation spike_param <- estimateSpike(spikeData = SmartSeq2_SpikeIns_Read_Counts, spikeInfo = SmartSeq2_SpikeInfo, MeanFragLength = NULL, batchData = Batches, Normalisation = 'depth') # plotting plotSpike(estSpike = spike_param, Annot = FALSE) }