Galaxy | Tool Preview

Strelka Somatic (version 2.9.10+galaxy0)

What it does

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts (Strelka Germline) and somatic variation in tumor/normal sample pairs (Strelka Somatic).

Strelka accepts input read mappings from BAM or CRAM files, and optionally candidate and/or forced-call alleles from VCF. It reports all small variant predictions in VCF 4.1 format. Germline variant reporting uses the gVCF conventions to represent both variant and reference call confidence. For best somatic indel performance, Strelka is designed to be run with the Manta structural variant and indel caller, which provides additional indel candidates up to a given maximum indel size (49 by default). By design, Manta and Strelka run together with default settings provide complete coverage over all indel sizes (in addition to all SVs and SNVs) for clinical somatic and germline analysis scenarios.
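As a rough illustration of how Manta's indel candidates can be handed to the somatic workflow, the sketch below assembles a command line for Strelka's configureStrelkaSomaticWorkflow.py script. The helper name build_strelka_somatic_config and all file paths are hypothetical; the sketch only builds and prints the command, it does not run Strelka.

```python
import shlex
from typing import List, Optional


def build_strelka_somatic_config(
    normal_bam: str,
    tumor_bam: str,
    reference_fasta: str,
    run_dir: str,
    indel_candidates: Optional[str] = None,
    output_callable_regions: bool = False,
) -> List[str]:
    """Assemble the argument list for Strelka's somatic configuration script."""
    cmd = [
        "configureStrelkaSomaticWorkflow.py",
        "--normalBam", normal_bam,
        "--tumorBam", tumor_bam,
        "--referenceFasta", reference_fasta,
        "--runDir", run_dir,
    ]
    if indel_candidates is not None:
        # Candidate indels from a previous Manta run on the same sample pair.
        cmd += ["--indelCandidates", indel_candidates]
    if output_callable_regions:
        cmd.append("--outputCallableRegions")
    return cmd


# Hypothetical paths, for illustration only.
example = build_strelka_somatic_config(
    normal_bam="normal.bam",
    tumor_bam="tumor.bam",
    reference_fasta="genome.fa",
    run_dir="strelka_somatic_run",
    indel_candidates="manta_run/results/variants/candidateSmallIndels.vcf.gz",
)
print(shlex.join(example))
```

The resulting list can be passed to subprocess.run once the paths point at real files; the configured workflow is then started from the run directory.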

The somatic calling model improves on the original Strelka method for liquid and late-stage tumor analysis by accounting for possible tumor cell contamination in the normal sample. A final empirical variant re-scoring step using random forest models trained on various call quality features has been added to both callers to further improve precision.

Input

Sequencing Data

The input sequencing reads are expected to come from a paired-end sequencing assay. Any input other than paired-end reads is ignored by default, except to double-check for putative somatic variant evidence in the normal sample during somatic variant analysis. Read lengths above ~400 bases are not tested.
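To get a feel for whether a dataset meets these expectations before calling, one option is to screen a sample of alignment records in SAM text form. The sketch below is not part of Strelka: the function name summarize_sam_records is hypothetical, it relies only on the SAM convention that FLAG bit 0x1 marks a paired read, and it treats the ~400 base figure as a warning level rather than a hard limit.

```python
from typing import Dict, Iterable

PAIRED_FLAG = 0x1           # SAM FLAG bit: read is part of a pair
UNTESTED_READ_LENGTH = 400  # lengths above this are described as untested


def summarize_sam_records(lines: Iterable[str]) -> Dict[str, int]:
    """Count unpaired reads and reads longer than the tested length."""
    counts = {"total": 0, "unpaired": 0, "longer_than_tested": 0}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: FLAG
        seq = fields[9]        # column 10: SEQ ('*' when not stored)
        counts["total"] += 1
        if not flag & PAIRED_FLAG:
            counts["unpaired"] += 1
        if seq != "*" and len(seq) > UNTESTED_READ_LENGTH:
            counts["longer_than_tested"] += 1
    return counts
```

The input lines could come from a text dump of the first few thousand records of a BAM or CRAM file produced by an external tool.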

Alignment Files

All input sequencing reads should be mapped by an external tool and provided as input in BAM or CRAM format.

The following limitations apply to the input BAM/CRAM alignment records:

VCF Files

Input VCF files are accepted for a number of roles as described below. All input VCF records are checked for compatibility with the given reference genome, in addition to role-specific checks described below. If any VCF record's REF field is not compatible with the reference genome, a runtime error is triggered. 'Compatible with the reference genome' means that each base of the VCF record's REF field meets at least one of the following conditions: (1) it matches the corresponding reference genome base, (2) it is 'N', or (3) the corresponding reference genome base is any ambiguous IUPAC base code (all ambiguous base codes are converted to 'N' while importing the reference).
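The compatibility rule can be sketched in a few lines of Python. This is an illustration of the rule as stated above, not Strelka's own code: the function names are hypothetical, the reference is assumed to be a single in-memory sequence string, and a REF allele that runs past the end of the sequence is treated as incompatible (an assumption the text does not spell out).

```python
# IUPAC nucleotide codes that stand for more than one base.
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def normalize_reference_base(base: str) -> str:
    """Convert any ambiguous IUPAC code to 'N', as done on reference import."""
    base = base.upper()
    return "N" if base in IUPAC_AMBIGUOUS else base


def ref_is_compatible(reference: str, pos: int, ref_allele: str) -> bool:
    """Check a VCF REF allele at 1-based position pos against the reference."""
    start = pos - 1
    if start < 0 or not ref_allele:
        return False
    segment = reference[start:start + len(ref_allele)]
    if len(segment) != len(ref_allele):
        return False  # allele extends beyond the end of the sequence
    for vcf_base, genome_base in zip(ref_allele.upper(), segment):
        genome_base = normalize_reference_base(genome_base)
        if vcf_base == "N" or genome_base == "N" or vcf_base == genome_base:
            continue
        return False
    return True
```

For example, with the toy reference "ACGTRA", a record at position 5 with REF "T" passes because the reference base 'R' is ambiguous, while a record at position 2 with REF "G" fails because the reference base there is 'C'.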

Output

INDEL

All somatic indels inferred in the tumor sample in VCF format.

SNVs

All somatic SNVs inferred in the tumor sample in VCF format.

Callability

The somatic variant caller can be configured with the option --outputCallableRegions, which extends the somatic SNV quality model so that it is also applied as a test of somatic SNV callability at all positions in the genome. The outcome of this calculation is summarized in a BED-formatted callability track. The track contains the regions determined to be callable, meaning there is sufficient evidence either to call a somatic SNV or to assert the absence of a somatic SNV with a variant frequency of 10% or greater. A site, whether somatic or non-somatic, is determined to be 'callable' if its somatic or non-somatic quality score is at least 15.
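To make the relationship between per-site quality and the BED track concrete, the sketch below merges adjacent callable positions into BED intervals. It illustrates the rule described above and is not Strelka's internal implementation: the function name, the input tuple layout, and the example quality values are all hypothetical.

```python
from typing import Iterable, List, Tuple

CALLABLE_QUALITY_THRESHOLD = 15  # threshold stated in the text above


def callable_bed_intervals(
    chrom: str,
    site_qualities: Iterable[Tuple[int, int, int]],
    threshold: int = CALLABLE_QUALITY_THRESHOLD,
) -> List[Tuple[str, int, int]]:
    """Merge callable sites into BED intervals (0-based, half-open).

    site_qualities yields (pos, somatic_quality, non_somatic_quality)
    tuples for one chromosome, with 1-based pos in ascending order.
    """
    intervals: List[Tuple[str, int, int]] = []
    for pos, somatic_q, non_somatic_q in site_qualities:
        if max(somatic_q, non_somatic_q) < threshold:
            continue  # neither quality reaches the threshold: not callable
        start = pos - 1
        if intervals and intervals[-1][2] == start:
            # Site directly follows the previous interval: extend it.
            intervals[-1] = (chrom, intervals[-1][1], pos)
        else:
            intervals.append((chrom, start, pos))
    return intervals


# Hypothetical per-site qualities, for illustration only.
sites = [(1, 20, 0), (2, 0, 15), (3, 14, 14), (5, 30, 0), (6, 0, 40)]
for chrom, start, end in callable_bed_intervals("chr1", sites):
    print(f"{chrom}\t{start}\t{end}")
```

Positions 1 and 2 merge into one interval and positions 5 and 6 into another, while position 3 is dropped because both of its quality values fall below 15.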

References

More information is available on GitHub.