What it does
scPipe is an R/Bioconductor package that integrates barcode demultiplexing, read alignment, UMI-aware gene-level quantification and quality control of raw sequencing data generated by multiple protocols, including CEL-seq, MARS-seq, Chromium 10X, Drop-seq and Smart-seq. scPipe produces a gene count matrix, QC metrics and an HTML report that summarises data quality. These results can be used as input for downstream analyses including normalization, visualization and statistical testing. The scPipe workflow is described in this vignette and examples of the report output can be found here. Note that outlier cells are detected and removed by default, but they can be kept if "Keep outliers?" is selected.
Inputs
Read Structure
The default read structure represents CEL-seq paired-end reads, with one cell barcode in Read 2 starting at position 6 and the UMI sequence in Read 2 starting at the first base. The read structure is therefore: bs1=-1, bl1=0, bs2=6, bl2=8, us=0, ul=6. Here bs1=-1, bl1=0 means there is no barcode in Read 1, so the start position is set to a negative value and the length to zero. bs2=6, bl2=8 means the barcode in Read 2 starts at position 6 and is 8bp long. us=0, ul=6 means the UMI starts at the beginning of Read 2 and is 6bp long. NOTE: positions are zero-based, so the first base of a read has position 0. For a typical Drop-seq experiment the setting is bs1=-1, bl1=0, bs2=0, bl2=12, us=12, ul=8, which means Read 1 contains only transcript sequence and the first 12bp of Read 2 are the barcode, followed by an 8bp UMI.
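To make the zero-based positions concrete, the following is a minimal Python sketch of how a read structure could be applied to a Read 2 sequence. It is an illustration only, not scPipe's implementation: the `ReadStructure` type, the `split_read2` helper and the example sequences are hypothetical, and only the parameter names (bs1, bl1, bs2, bl2, us, ul) come from the text above.

```python
from typing import NamedTuple, Tuple


class ReadStructure(NamedTuple):
    """Hypothetical container for the read structure parameters."""
    bs1: int  # barcode start in Read 1 (-1 if Read 1 has no barcode)
    bl1: int  # barcode length in Read 1
    bs2: int  # barcode start in Read 2 (zero-based)
    bl2: int  # barcode length in Read 2
    us: int   # UMI start in Read 2 (zero-based)
    ul: int   # UMI length


# The two settings quoted in the text.
CEL_SEQ = ReadStructure(bs1=-1, bl1=0, bs2=6, bl2=8, us=0, ul=6)
DROP_SEQ = ReadStructure(bs1=-1, bl1=0, bs2=0, bl2=12, us=12, ul=8)


def split_read2(read2: str, rs: ReadStructure) -> Tuple[str, str]:
    """Return (barcode, UMI) cut from Read 2 using zero-based positions."""
    # Python slices are zero-based and end-exclusive, matching start + length.
    barcode = read2[rs.bs2:rs.bs2 + rs.bl2] if rs.bs2 >= 0 else ""
    umi = read2[rs.us:rs.us + rs.ul]
    return barcode, umi


# Made-up sequences: 6bp UMI then 8bp barcode (CEL-seq layout),
# and 12bp barcode then 8bp UMI (Drop-seq layout).
cel_barcode, cel_umi = split_read2("AAAAAA" + "CCCCCCCC" + "TTTT", CEL_SEQ)
drop_barcode, drop_umi = split_read2("G" * 12 + "A" * 8 + "TTTT", DROP_SEQ)
print(cel_barcode, cel_umi)
print(drop_barcode, drop_umi)
```

With the CEL-seq setting, positions 0 to 5 of Read 2 are taken as the UMI and positions 6 to 13 as the barcode; with the Drop-seq setting, positions 0 to 11 are the barcode and positions 12 to 19 the UMI.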
Outputs
- Count matrix of genes in Tabular format
Optionally, you can also choose to output:
- PDF of QC Plots (default is Yes)
- QC metrics matrix
- HTML report (if FASTQs are input)
- Rscript
- RData