lofreq call: call variants from BAM file
LoFreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. It makes full use of base-call qualities and other sources of errors inherent in sequencing, which are usually ignored by other methods or only used for filtering.
LoFreq can run on almost any type of aligned sequencing data since it uses no thresholds that depend on the machine or sequencing technology. It automatically adapts to changes in coverage and sequencing quality and can therefore be applied to a variety of datasets, e.g. viral/quasispecies, bacterial, metagenomics or somatic data.
While the tool will often give reasonable results with default settings, a variety of options let you control its exact behavior. These advanced options can be subdivided into those affecting variant calling and those affecting posterior filtering of the results.
Variant calling parameters
At the heart of LoFreq's variant caller is a joint quality score that is computed for every site in every read (that survives filtering) and that combines some or all of the following read and base quality measures:
Base/indel quality
For any read, this is the Phred-scaled likelihood that the base mapped to a given site does not represent a sequencing error. For every base, this score was computed by the base caller of your sequencing platform and was incorporated into your input dataset during read alignment.
For insertions/deletions this is defined, analogously, as the Phred-scaled likelihood that any inserted/deleted base is real. However, you are responsible for adding indel qualities to your input; they are required for indel calling with lofreq.
To do so, you can use lofreq indelqual.
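Since all quality measures in this section are Phred-scaled, it can help to see the conversion spelled out. The sketch below uses the standard Phred definition, Q = -10 * log10(p_error); the function names are hypothetical and are not part of LoFreq.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score into an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) into a Phred-scaled score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Higher scores mean lower error probabilities.
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
```

Under this scaling, every increase of 10 in the score corresponds to a tenfold drop in the error probability.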
Base/indel alignment quality
For any read, this is the Phred-scaled likelihood that a base or indel of the read that is mapped to a given reference genome position is aligned to this position correctly.
The tool can calculate these scores for you on the fly. Alternatively, you can precalculate them using lofreq alnqual, which will incorporate them into your input dataset.
Mapping quality
The Phred-scaled likelihood that the read was mapped to the correct place in the reference genome. This score was incorporated into your input dataset by the aligner you used to map your reads.
Source quality
This is the Phred-scaled likelihood that the given read comes from the reference genome. The tool can calculate this score for you.
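The text above does not spell out how the individual measures are merged into the joint quality score. As a rough illustration only, the following sketch assumes the error sources are independent, so that a base counts as reliable only if none of the errors occurred. The function names are hypothetical, and this should not be read as LoFreq's exact implementation.

```python
from typing import Iterable


def phred_to_error_prob(q: float) -> float:
    """Standard Phred conversion: p_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def joint_error_prob(phred_scores: Iterable[float]) -> float:
    """Combine several Phred-scaled qualities into one error probability.

    Assumption (not taken from the source text): the error sources are
    independent, so P(no error at all) is the product of the individual
    P(no error) terms.
    """
    p_all_ok = 1.0
    for q in phred_scores:
        p_all_ok *= 1.0 - phred_to_error_prob(q)
    return 1.0 - p_all_ok


if __name__ == "__main__":
    # Example: base quality 30, alignment quality 40, mapping quality 20.
    # Leaving a measure out of the list corresponds to not using it.
    print(joint_error_prob([30, 40, 20]))
```

Under this assumption the joint error probability is never smaller than the largest individual error probability, so the weakest quality measure dominates the combined score.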
Variant filter parameters
After generating a list of called variants, the tool can filter this list based on variant quality (QUAL score), coverage and strand bias.
While posterior filtering can help reduce false-positive variant calls, please note that the separate lofreq filter tool, which can be run on the output of lofreq call, has many more options for configuring filters.
These are the different filter settings supported by the tool:
Preset filtering on QUAL score + coverage + strand bias
For variants to pass this filter, they must meet a QUAL-based significance criterion, a strand-bias criterion and a coverage criterion.
Preset QUAL score-based filtering
Same QUAL-based significance filter as the default, but without the strand-bias and coverage criteria
Strictly no filtering
Do not apply any filters, but produce the original list of all called variants. You will almost always want to use lofreq filter to process the resulting output.
Custom filter settings/combinations
Lets you define your own QUAL-based significance filter and, optionally, combine it with the default strand-bias and coverage filters.
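The relationship between the filter modes above can be sketched as a single predicate in which the coverage and strand-bias checks are optional. Everything here is illustrative: the Variant fields, the function name and the threshold values are hypothetical and do not reflect LoFreq's actual defaults or internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    pos: int
    qual: float         # variant quality (QUAL), Phred-scaled
    depth: int          # coverage at the site
    strand_bias: float  # Phred-scaled strand-bias score; higher = more bias


def passes(v: Variant, min_qual: float,
           min_depth: Optional[int] = None,
           max_strand_bias: Optional[float] = None) -> bool:
    """Return True if the variant survives all enabled filters."""
    if v.qual < min_qual:
        return False
    if min_depth is not None and v.depth < min_depth:
        return False
    if max_strand_bias is not None and v.strand_bias > max_strand_bias:
        return False
    return True


def filter_variants(variants: List[Variant], **thresholds) -> List[Variant]:
    """Keep only the variants that pass the configured filters."""
    return [v for v in variants if passes(v, **thresholds)]


if __name__ == "__main__":
    calls = [Variant(101, 80.0, 250, 3.0), Variant(202, 80.0, 6, 3.0)]
    # QUAL + coverage + strand bias (illustrative thresholds only)
    print(filter_variants(calls, min_qual=50, min_depth=10, max_strand_bias=60))
    # QUAL only
    print(filter_variants(calls, min_qual=50))
```

Passing only min_qual mirrors the QUAL-only mode, passing all three thresholds mirrors the combined mode, and skipping filter_variants altogether corresponds to no filtering.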