
scPipe (version 1.0.0+galaxy2)

FASTQs or BAM:

Select the format of the input sample

Reference genome FASTA:

Select a built-in FASTA:

If your genome of interest is not listed, contact your Galaxy administrator

Paired reads or Paired collection:

Input Read 1:

Read 1 should contain the transcripts in fastq.gz format

Input Read 2:

Read 2 should contain UMI and barcodes in fastq.gz format

Cell barcodes file:

Optional file of cell barcodes. If not provided, the barcodes will be detected from the reads. The file should contain at least two columns: the first column holds the cell id and the second column holds the barcode sequence.

Exon annotation GFF3 file:

ENSEMBL is currently the only supported annotation source

Species gene id:

This must be in biomaRt ENSEMBL listDatasets() format, e.g. hsapiens_gene_ensembl. See the biomaRt user guide: https://www.bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/doc/biomaRt.html

Barcode start Read 1:

Barcode start position in Read 1. Positions are 0-indexed, so the first base is base 0; -1 indicates no barcode. Default: -1

Barcode length Read 1:

Barcode length in Read 1, 0 if no barcode present. Default: 0

Barcode start Read 2:

Barcode start position in Read 2. Positions are 0-indexed, so the first base is base 0; -1 indicates no barcode. Default: 6

Barcode length Read 2:

Barcode length in Read 2, 0 if no barcode present. Default: 8

UMI start Read 2:

UMI start position in Read 2. Positions are 0-indexed, so the first base is base 0; -1 indicates no UMI. Default: 0

UMI length Read 2:

UMI length in Read 2, 0 if no UMI present. Default: 6

Keep outliers?:

If this option is set to Yes, outlier cells will not be removed from the gene count matrix. Default: No

Output Options


Advanced Options


What it does

scPipe is an R/Bioconductor package that integrates barcode demultiplexing, read alignment, UMI-aware gene-level quantification and quality control of raw sequencing data generated by multiple protocols, including CEL-seq, MARS-seq, Chromium 10X, Drop-seq and Smart-seq. scPipe produces the count matrix needed for downstream analysis, along with QC metrics and an HTML report that summarises data quality. These results can be used as input for downstream analyses such as normalization, visualization and statistical testing. The scPipe workflow is described in the package vignette, and examples of the report output are available online. Note that outlier cells are detected and removed by default, but they can be kept by setting "Keep outliers?" to Yes.

Inputs

Either

Reference genome in FASTA format
Paired-end FASTQ.GZ reads
Cell barcodes TAB-separated file (Optional)

OR

BAM file
Cell barcodes TAB-separated file

AND

Exon annotation in ENSEMBL GFF3 format
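
A cell barcodes file can be sanity-checked before upload. The following is a minimal Python sketch and is not part of scPipe or this Galaxy tool: the function name read_cell_barcodes is hypothetical, the file is assumed to have no header row, and rejecting duplicate barcode sequences is an assumption about what a well-formed file looks like.

```python
import csv
import io


def read_cell_barcodes(handle, delimiter="\t"):
    """Return a dict mapping barcode sequence -> cell id.

    Assumes a tab-separated file with no header row (an assumption,
    not documented behaviour of the tool). Extra columns are ignored.
    """
    barcodes = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter=delimiter), 1):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"line {line_no}: expected at least two columns")
        cell_id, sequence = row[0].strip(), row[1].strip().upper()
        if sequence in barcodes:
            # Assumption: each barcode sequence should identify one cell.
            raise ValueError(f"line {line_no}: duplicate barcode {sequence}")
        barcodes[sequence] = cell_id
    return barcodes


# Made-up example content: first column is the cell id, second the barcode.
example = io.StringIO("cell_1\tACGTACGT\ncell_2\tTTGGCCAA\n")
print(read_cell_barcodes(example))
```

For a real file, pass an open file handle instead of the in-memory example.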

Read Structure

The default read structure represents CEL-seq paired-end reads, with one cell barcode in Read 2 starting at position 6 and the UMI in Read 2 starting at the first base. The read structure is therefore: bs1=-1, bl1=0, bs2=6, bl2=8, us=0, ul=6.

bs1=-1, bl1=0 means there is no barcode (index) in Read 1, so the start position is set to a negative value and the length to zero. bs2=6, bl2=8 means the barcode in Read 2 starts at position 6 and is 8bp long. us=0, ul=6 means the UMI starts at the beginning of Read 2 and is 6bp long.

NOTE: positions are zero-based, so the first base of a read has index 0.

For a typical Drop-seq experiment the settings are bs1=-1, bl1=0, bs2=0, bl2=12, us=12, ul=8, which means Read 1 contains only the transcript and the first 12bp of Read 2 are the cell barcode, followed by an 8bp UMI.
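
To make the zero-based positions concrete, here is a small Python sketch of how the six read-structure values select the barcode and UMI from Read 2. It is an illustration only, not scPipe's implementation: the names ReadStructure, extract_region and split_read2 are hypothetical, the example reads are made up, and the rule that a start of -1 must be paired with a length of 0 is an interpretation of the parameter descriptions on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadStructure:
    """Zero-based positions; defaults mirror the CEL-seq defaults above."""
    bs1: int = -1  # barcode start in Read 1 (-1: no barcode)
    bl1: int = 0   # barcode length in Read 1 (0: no barcode)
    bs2: int = 6   # barcode start in Read 2
    bl2: int = 8   # barcode length in Read 2
    us: int = 0    # UMI start in Read 2
    ul: int = 6    # UMI length in Read 2

    def problems(self):
        """Return a list of inconsistencies (empty if none were found)."""
        issues = []
        regions = (("Read 1 barcode", self.bs1, self.bl1),
                   ("Read 2 barcode", self.bs2, self.bl2),
                   ("UMI", self.us, self.ul))
        for name, start, length in regions:
            if start < -1 or length < 0:
                issues.append(f"{name}: start must be >= -1 and length >= 0")
            elif (start == -1) != (length == 0):
                # Assumption: "absent" is expressed as start -1 AND length 0.
                issues.append(f"{name}: start -1 and length 0 go together")
        return issues


def extract_region(read, start, length):
    """Return read[start:start+length], or '' when the region is absent."""
    if start < 0 or length == 0:
        return ""
    if start + length > len(read):
        raise ValueError("read is shorter than the requested region")
    return read[start:start + length]


def split_read2(read2, structure):
    """Return (barcode, umi) taken from Read 2."""
    barcode = extract_region(read2, structure.bs2, structure.bl2)
    umi = extract_region(read2, structure.us, structure.ul)
    return barcode, umi


cel_seq = ReadStructure()
drop_seq = ReadStructure(bs2=0, bl2=12, us=12, ul=8)

# Made-up reads: 6bp UMI + 8bp barcode (CEL-seq), 12bp barcode + 8bp UMI (Drop-seq).
print(split_read2("AAAAAACCCCCCCCTTTT", cel_seq))
print(split_read2("G" * 12 + "T" * 8 + "ACGT", drop_seq))
```

Calling problems() on a ReadStructure before running the tool is a cheap way to catch a start position that was changed without updating the matching length.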

Outputs

Count matrix of genes in Tabular format

Optionally, you can also choose to output:

PDF of QC Plots (default is Yes)

QC metrics matrix

HTML report (if FASTQs are input)

Rscript

RData