Freyja is a tool to recover relative lineage abundances in mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational "barcodes" derived from the UShER global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.
Freyja is intended as a post-processing step after primer trimming and variant calling in iVar (Grubaugh and Gangavarapu et al., 2019). From measurements of SNV frequency and sequencing depth at each position in the genome, Freyja returns an estimate of the true lineage abundances in the sample.
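To make the constrained de-mixing problem concrete, here is a minimal pure-Python sketch. It assumes a barcode matrix in which entry (i, k) is 1 if lineage k carries the mutation at site i, an observed mutation frequency per site, and per-site sequencing depths used as weights. Note that this swaps in a different technique: Freyja solves a weighted least absolute deviation problem, whereas this sketch minimizes a depth-weighted least squares objective by projected gradient descent onto the probability simplex (the unit-sum, non-negative set). The function names and the depth-proportional weighting are illustrative assumptions, not Freyja's API.

```python
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {x : x >= 0, sum(x) == 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:
            theta = t  # the last index that passes this test fixes the threshold
    return [max(x - theta, 0.0) for x in v]


def demix(barcodes: Sequence[Sequence[float]],
          freqs: Sequence[float],
          depths: Sequence[float],
          iters: int = 2000) -> List[float]:
    """Estimate lineage abundances x (x >= 0, sum(x) == 1) from per-site data.

    barcodes[i][k]: 1.0 if lineage k carries the mutation at site i, else 0.0
    freqs[i]:       observed frequency of the mutation at site i
    depths[i]:      sequencing depth at site i (assumed weighting, not Freyja's)
    Minimizes sum_i w_i * ((A x)_i - b_i)^2, a least squares stand-in for
    Freyja's weighted least absolute deviation objective.
    """
    n_sites, n_lin = len(barcodes), len(barcodes[0])
    total = float(sum(depths))
    w = [d / total for d in depths]  # normalized depth weights
    # Step size 1/L, where L bounds the Lipschitz constant of the gradient
    lip = 2.0 * sum(w[i] * sum(a * a for a in barcodes[i]) for i in range(n_sites))
    step = 1.0 / lip
    x = [1.0 / n_lin] * n_lin  # start from an even mixture
    for _ in range(iters):
        resid = [sum(barcodes[i][k] * x[k] for k in range(n_lin)) - freqs[i]
                 for i in range(n_sites)]
        grad = [2.0 * sum(w[i] * resid[i] * barcodes[i][k] for i in range(n_sites))
                for k in range(n_lin)]
        # Gradient step, then project back onto the unit-sum, non-negative set
        x = project_to_simplex([x[k] - step * grad[k] for k in range(n_lin)])
    return x


if __name__ == "__main__":
    # Toy data: 5 sites, 3 lineages; freqs generated from a 0.5/0.3/0.2 mixture
    A = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
    b = [0.5, 0.8, 0.3, 0.2, 0.5]
    print([round(v, 3) for v in demix(A, b, [100, 80, 120, 90, 110])])
```

On the toy input, where the frequencies are generated exactly from a 0.5/0.3/0.2 mixture, the estimate converges to that mixture. Projecting after every step keeps each iterate a valid abundance vector, which is what the unit-sum, non-negative constraints require.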
Freyja demix estimates lineage abundances in a potentially multi-lineage input sample.
The tool requires two inputs: a dataset of called variants and a dataset of genome-wide sequencing depth information. Both can be produced with Freyja call, but the tool also accepts variant calls in VCF format.
For a single sample, it is recommended to select "Specify sample name explicitly" under "Set sample name".
To use this tool on multiple samples in parallel, provide two collections with the samples in the same sort order (one containing the variant calls, the other containing the sequencing depths) and select "Autodetect sample name", which uses the collection element identifiers as sample names. This produces a new collection of demixing reports, with sample names preserved, that can be passed to Freyja: Aggregate and visualize.
Selecting multiple individual (non-collection) variant call and depth datasets is discouraged, because correct pairing of the datasets cannot be guaranteed.
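Because the two collections are paired by position, a mismatch in sort order pairs a sample's variant calls with another sample's depths. The hypothetical helper below is not part of Freyja or Galaxy; it represents each collection as a list of (element identifier, file path) tuples, an assumption made purely for illustration, and sketches positional pairing with an identifier check to show why the identifiers at each position must agree.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each collection is a list of
# (element_identifier, dataset_path) tuples, in collection order.
Collection = List[Tuple[str, str]]


def pair_by_position(variants: Collection, depths: Collection) -> Dict[str, Tuple[str, str]]:
    """Pair variant call and depth datasets element by element (by position),
    failing loudly if the identifiers at any position disagree."""
    if len(variants) != len(depths):
        raise ValueError(f"collection sizes differ: {len(variants)} vs {len(depths)}")
    pairs: Dict[str, Tuple[str, str]] = {}
    for (var_id, var_path), (dep_id, dep_path) in zip(variants, depths):
        if var_id != dep_id:
            raise ValueError(f"sort order mismatch: {var_id!r} paired with {dep_id!r}")
        pairs[var_id] = (var_path, dep_path)  # the identifier becomes the sample name
    return pairs


if __name__ == "__main__":
    calls = [("sampleA", "a_variants.tsv"), ("sampleB", "b_variants.tsv")]
    depths = [("sampleA", "a_depths.tsv"), ("sampleB", "b_depths.tsv")]
    print(pair_by_position(calls, depths))
```

Reversing either list makes the helper raise instead of silently mixing samples; plain datasets selected one by one carry no shared identifiers to check in this way.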
The tool produces tabular output that includes the lineages detected in the sample, their corresponding abundances, and a lineage summary by constellation.
Example output:
summarized | [('Delta', 0.65), ('Other', 0.25), ('Alpha', 0.1)] |
lineages | ['B.1.617.2' 'B.1.2' 'AY.6' 'Q.3'] |
abundances | "[0.5 0.25 0.15 0.1]" |
resid | 3.14159 |
coverage | 95.8 |
Here, summarized gives the summed abundances of all lineages sharing a WHO designation (in the example above, the B.1.617.2 and AY.6 abundances are summed into Delta); lineages without a WHO designation are grouped into "Other". The lineages array lists the identified lineages in descending order of abundance, and abundances contains the corresponding abundance estimates. The value of resid is the residual of the weighted least absolute deviation problem used to estimate the lineage abundances. The coverage value is the 10x coverage estimate: the percentage of sites covered by at least 10 reads (10 is the default threshold, which can be changed with the --covcut option).
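The summarized and coverage fields can be reproduced by hand from the lineage abundances and per-site depths. This sketch assumes a hypothetical, hand-written mapping from lineage to WHO designation (Freyja's own lineage grouping is not reproduced here). It sums abundances per designation, sending unmapped lineages to "Other", and computes the percentage of sites at or above a depth cutoff, with the covcut parameter mirroring the default of 10 described above. Applied to the example lineages and abundances, it regroups them into Delta (0.5 + 0.15), Other (0.25) and Alpha (0.1).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def summarize(lineages: Sequence[str], abundances: Sequence[float],
              designation_of: Dict[str, str]) -> List[Tuple[str, float]]:
    """Sum abundances per WHO designation; unmapped lineages go to "Other"."""
    totals: Dict[str, float] = defaultdict(float)
    for lineage, abundance in zip(lineages, abundances):
        totals[designation_of.get(lineage, "Other")] += abundance
    # Largest group first, matching the ordering in the example output
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def coverage_percent(depths: Sequence[int], covcut: int = 10) -> float:
    """Percentage of sites whose depth is at least covcut reads."""
    covered = sum(1 for d in depths if d >= covcut)
    return 100.0 * covered / len(depths)


if __name__ == "__main__":
    # Hypothetical lineage-to-designation mapping for the example output
    who = {"B.1.617.2": "Delta", "AY.6": "Delta", "Q.3": "Alpha"}
    print(summarize(["B.1.617.2", "B.1.2", "AY.6", "Q.3"], [0.5, 0.25, 0.15, 0.1], who))
    print(coverage_percent([0, 5, 10, 12, 30]))  # 3 of 5 sites have >= 10 reads
```

Raising covcut makes the coverage figure stricter, since fewer sites clear the threshold.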