lofreq call: call variants from BAM file

LoFreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. It makes full use of base-call qualities and other sources of errors inherent in sequencing, which are usually ignored by other methods or only used for filtering.

LoFreq can run on almost any type of aligned sequencing data since it uses no thresholds that depend on the machine or sequencing technology. It automatically adapts to changes in coverage and sequencing quality and can therefore be applied to a variety of datasets, e.g. viral/quasispecies, bacterial, metagenomics or somatic data.

While the tool will often give reasonable results with default settings, a variety of options let you control its exact behavior. These advanced options can be subdivided into those affecting variant calling and those affecting posterior filtering of the results.

Variant calling parameters

At the heart of LoFreq's variant caller is a joint quality score that is computed for every site in every read (that survives filtering) and that combines some or all of the following read and base quality measures:

Base/indel quality

For any read, this is the Phred-scaled likelihood that the base mapped to a given site does not represent a sequencing error. For every base, this score was computed by the base caller of your sequencing platform and was carried over into your input dataset during read alignment.
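As a rough illustration of what "Phred-scaled" means here, the sketch below converts between a quality score and the error probability it encodes, using the standard Phred convention Q = -10 * log10(P_error). The function names are made up for this example and are not part of lofreq.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability encoded by a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-scaled quality score for an error probability 0 < p <= 1."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # A base quality of 20 corresponds to roughly a 1 in 100 chance of error,
    # a base quality of 30 to roughly 1 in 1000.
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
```

Higher scores therefore mean more trustworthy bases, and every step of 10 lowers the assumed error probability by a factor of ten.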

For insertions/deletions this is defined analogously as the Phred-scaled likelihood that any inserted/deleted base is real. However, you are responsible for adding indel qualities, which are required for indel calling with lofreq, to your input.

To do so, you can use lofreq indelqual.

Base/indel alignment quality

For any read, this is the Phred-scaled likelihood that a base or indel of the read that is aligned to a given reference genome position is aligned to that position correctly.

The tool can calculate these scores for you on the fly. Alternatively, you can precalculate them using lofreq alnqual, which will incorporate them into your input dataset.

Mapping quality

The Phred-scaled likelihood that the read was mapped to the correct place in the reference genome. This score was written into your input dataset by the aligner you used to map your reads.

Source quality

This is the Phred-scaled likelihood that the given read comes from the reference genome. The tool can calculate this score for you.
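To give a feel for how several quality measures could be merged into one joint score, here is a minimal sketch. It assumes that the individual error sources are independent, so that a base is correct only if none of them went wrong. This page does not spell out the formula lofreq uses internally, so treat the combination rule and all names below as illustrative assumptions.

```python
import math
from typing import Iterable


def phred_to_error_prob(q: float) -> float:
    """Error probability encoded by a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def joint_error_prob(error_probs: Iterable[float]) -> float:
    """Probability that at least one of several independent error sources
    applies (assumption: independence; not necessarily lofreq's own formula)."""
    p_all_correct = 1.0
    for p in error_probs:
        p_all_correct *= 1.0 - p
    return 1.0 - p_all_correct


def joint_quality(phred_scores: Iterable[float]) -> float:
    """Phred-scaled joint quality for a set of Phred-scaled input scores.
    Scores of 0 or below would give a joint error probability of 1 or more
    and are not handled in this sketch."""
    p_joint = joint_error_prob(phred_to_error_prob(q) for q in phred_scores)
    return -10 * math.log10(p_joint)


if __name__ == "__main__":
    # Hypothetical base, alignment and mapping qualities for one read position.
    print(joint_quality([30, 40, 20]))
```

Under this assumption the joint quality can never exceed the weakest input score, which is why a single low mapping or base quality drags down the evidence contributed by that read position.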

Variant filter parameters

After generating a list of called variants, the tool can filter this list based on:

the statistical significance of the variant calls
strand-bias of reads supporting the variant
coverage of the variant site

While posterior filtering can help reduce false-positive variant calls, please note that the separate lofreq filter tool, which can be run on the output of lofreq call, has many more options for configuring filters.

These are the different filter settings supported by the tool:

Preset filtering on QUAL score + coverage + strand bias

For variants to pass this filter, the following is required:

Statistical significance of the variant call with a p-value < 0.01, based on the retransformed QUAL score of the variant and multiple-testing corrected using a dynamically determined Bonferroni factor (based on the number of overall variants considered during calling).
No strand bias in supporting reads, where strand bias means a bias that is significant at an FDR-corrected p-value of 0.001 combined with 85% of supporting reads mapped to the same strand of the genome.
A coverage of the variant site of at least 10x.
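The three criteria above can be sketched as a single pass/fail check. This is only an illustration under stated assumptions: the "retransformed" QUAL score is taken to be the p-value 10^(-QUAL/10), the Bonferroni correction is applied by multiplying that p-value by the factor, the strand-bias p-value is assumed to be FDR-corrected already, and the strand-bias criterion is read as a compound one (significant bias and at least 85% of supporting reads on one strand). All function and parameter names are hypothetical and do not correspond to lofreq options.

```python
def qual_to_pvalue(qual: float) -> float:
    """Turn a Phred-scaled QUAL score back into a p-value (assumed transform)."""
    return 10 ** (-qual / 10)


def passes_preset_filter(
    qual: float,
    bonf_factor: float,
    sb_pvalue: float,             # assumed to be FDR-corrected already
    same_strand_fraction: float,  # fraction of supporting reads on the majority strand
    coverage: int,
    sig: float = 0.01,
    sb_alpha: float = 0.001,
    sb_fraction: float = 0.85,
    min_cov: int = 10,
) -> bool:
    """Return True if a variant would pass all three preset criteria."""
    # 1. Bonferroni-corrected significance of the call.
    significant = qual_to_pvalue(qual) * bonf_factor < sig
    # 2. Compound strand-bias test: both conditions must hold to reject.
    strand_biased = sb_pvalue < sb_alpha and same_strand_fraction >= sb_fraction
    # 3. Minimum coverage at the variant site.
    covered = coverage >= min_cov
    return significant and not strand_biased and covered


if __name__ == "__main__":
    # Hypothetical variant: QUAL 60, 1000 tests, no strand bias, 50x coverage.
    print(passes_preset_filter(60, 1000, 0.5, 0.55, 50))
```

With a Bonferroni factor of 1000, for example, a variant needs a QUAL score above 50 to reach a corrected p-value below 0.01 under these assumptions.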

Preset QUAL score-based filtering

Same QUAL-based significance filter as the default, but without the strand-bias and coverage criteria.

Strictly no filtering

Do not apply any filters, but produce the original list of all called variants. You will almost always want to use lofreq filter to process the resulting output.

Custom filter settings/combinations

Lets you define your own QUAL-based significance filter and, optionally, combine it with the default strand-bias and coverage filters.