Salmon is a lightweight method for quantifying transcript abundance from RNA–seq reads, combining a dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure.
The salmon package contains 4 tools:
- Index: creates a salmon index
- Quant: quantifies a sample (Reads or mapping-based)
- Alevin: Single-cell analysis
- Quantmerge: Merges multiple quantifications into a single file
Galaxy divides these four into three separate tools in the IUC toolshed:
- Salmon quant
- Salmon quantmerge
- Alevin
Alevin is a tool — integrated with the salmon software — that introduces a family of algorithms for quantification and analysis of 3’ tagged-end single-cell sequencing data. Currently alevin supports the following two major droplet based single-cell protocols:
- Drop-seq
- 10x-Chromium v1/2/3
Alevin works under the same indexing scheme (as salmon) for the reference, and consumes the set of FASTA/Q files(s) containing the Cellular Barcode(CB) + Unique Molecule identifier (UMI) in one read file and the read sequence in the other. Given just the transcriptome and the raw read files, alevin generates a cell-by-gene count matrix (in a fraction of the time compared to other tools).
Alevin works in two phases. In the first phase it quickly parses the read file containing the CB and UMI information to generate the frequency distribution of all the observed CBs, and creates a lightweight data-structure for fast-look up and correction of the CB. In the second round, alevin utilizes the read-sequences contained in the files to map the reads to the transcriptome, identify potential PCR/sequencing errors in the UMIs, and performs hybrid de-duplication while accounting for UMI collisions. Finally, a post-abundance estimation CB whitelisting procedure is done and a cell-by-gene count matrix is generated.
For further information regarding the tool and its optional parameters, visit the Alevin and Salmon wikis.