IMPORTANT: This wrapper is a copy of the one developed by the IUC (https://github.com/galaxyproject/tools-iuc/blob/master/tools/gatk4/gatk4_Mutect2.xml). We have introduced some changes to include the BAM files from both normal and tumor samples. Until these changes are approved, we use this tool through a copy of the original wrapper.
Call somatic short variants via local assembly of haplotypes. Short variants include single nucleotide (SNV) and insertion and deletion (indel) variants. The caller combines the DREAM challenge-winning somatic genotyping engine of the original MuTect (Cibulskis et al., 2013) with the assembly-based machinery of HaplotypeCaller.
This tool is featured in the Somatic Short Mutation calling Best Practice Workflow. See Tutorial#11136 for a step-by-step description of the workflow and Article#11127 for an overview of what traditional somatic calling entails. For the latest pipeline scripts, see the Mutect2 WDL scripts directory. Although we present the tool for somatic calling, it may apply to other contexts, such as mitochondrial variant calling.
Example commands show how to run Mutect2 for typical scenarios. The two modes are (i) somatic mode where a tumor sample is matched with a normal sample in analysis and (ii) tumor-only mode where a single sample's alignment data undergoes analysis.
Given a matched normal, Mutect2 is designed to call somatic variants only. The tool includes logic to skip emitting variants that are clearly present in the germline based on provided evidence, e.g. in the matched normal. This is done at an early stage to avoid spending computational resources on germline events. If the variant's germline status is borderline, then Mutect2 will emit the variant to the callset for subsequent filtering and review.
gatk Mutect2 -R reference.fa -I tumor.bam -tumor tumor_sample_name -I normal.bam -normal normal_sample_name --germline-resource af-only-gnomad.vcf.gz --af-of-alleles-not-in-resource 0.00003125 --panel-of-normals pon.vcf.gz -O somatic.vcf.gz
The --af-of-alleles-not-in-resource argument value should match expectations for alleles not found in the provided germline resource. Note that the tool requires neither a germline resource nor a panel of normals (PoN) to run. The tool prefilters sites for the matched normal and the PoN. For the germline resource, the tool prefilters on the allele. Below is an excerpt of a known variants resource with population allele frequencies:
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10067 . T TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC 30.35 PASS AC=3;AF=7.384E-5
1 10108 . CAACCCT C 46514.32 PASS AC=6;AF=1.525E-4
1 10109 . AACCCTAACCCT AAACCCT,* 89837.27 PASS AC=48,5;AF=0.001223,1.273E-4
1 10114 . TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTA *,CAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTA,T 36728.97 PASS AC=55,9,1;AF=0.001373,2.246E-4,2.496E-5
1 10119 . CT C,* 251.23 PASS AC=5,1;AF=1.249E-4,2.498E-5
1 10120 . TA CA,* 14928.74 PASS AC=10,6;AF=2.5E-4,1.5E-4
1 10128 . ACCCTAACCCTAACCCTAAC A,* 285.71 PASS AC=3,1;AF=7.58E-5,2.527E-5
1 10131 . CT C,* 378.93 PASS AC=7,5;AF=1.765E-4,1.261E-4
1 10132 . TAACCC *,T 18025.11 PASS AC=12,2;AF=3.03E-4,5.049E-5
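Because the germline resource is consulted per allele rather than per site, it can help to see how each ALT allele in a record pairs with its own AF value. The following Python snippet is a minimal sketch for illustration only, not part of Mutect2 or this wrapper. The function name allele_frequencies is hypothetical, and the sketch assumes whitespace-separated columns and an INFO field that always carries an AF key, as in the excerpt above.

```python
def allele_frequencies(vcf_line):
    """Map each ALT allele of one VCF record to its AF value from INFO.

    Assumes the eight standard columns are whitespace-separated and that
    INFO contains an AF key with one value per ALT allele.
    """
    fields = vcf_line.split()
    alts = fields[4].split(",")  # ALT column, one entry per alternate allele
    # INFO column: semicolon-separated key=value pairs
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    afs = [float(value) for value in info["AF"].split(",")]
    return dict(zip(alts, afs))


# One record taken from the excerpt above
record = "1 10109 . AACCCTAACCCT AAACCCT,* 89837.27 PASS AC=48,5;AF=0.001223,1.273E-4"
per_allele = allele_frequencies(record)
```

Here per_allele holds one frequency for the AAACCCT allele and a separate one for the spanning deletion (*), which is the per-allele granularity the resource provides.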
Tumor-only mode runs on a single sample, e.g. a single tumor or a single normal sample. To create a PoN, call on each normal sample in this mode, then use CreateSomaticPanelOfNormals to generate the PoN.
gatk Mutect2 -R reference.fa -I sample.bam -tumor sample_name -O single_sample.vcf.gz
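The two example commands differ only in whether the matched normal and the optional resources are supplied. As a rough sketch, assuming one wants to assemble the argument list programmatically (for example before passing it to subprocess.run), the Python helper below builds either command. The function mutect2_command and its parameter names are hypothetical; the flags themselves are only those that appear in the example commands of this help text.

```python
def mutect2_command(reference, tumor_bam, tumor_name, output,
                    normal_bam=None, normal_name=None,
                    germline_resource=None, af_not_in_resource=None,
                    panel_of_normals=None):
    """Build a Mutect2 argument list for tumor-only or matched-normal mode.

    All values are passed as strings so that numbers such as 0.00003125
    keep the exact formatting the caller provides.
    """
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam, "-tumor", tumor_name]
    if normal_bam is not None:
        # Somatic mode: the normal BAM and its sample name go together
        if normal_name is None:
            raise ValueError("normal_name is required when normal_bam is given")
        cmd += ["-I", normal_bam, "-normal", normal_name]
    if germline_resource is not None:
        cmd += ["--germline-resource", germline_resource]
    if af_not_in_resource is not None:
        cmd += ["--af-of-alleles-not-in-resource", af_not_in_resource]
    if panel_of_normals is not None:
        cmd += ["--panel-of-normals", panel_of_normals]
    cmd += ["-O", output]
    return cmd


tumor_only = mutect2_command("reference.fa", "sample.bam", "sample_name",
                             "single_sample.vcf.gz")
somatic = mutect2_command("reference.fa", "tumor.bam", "tumor_sample_name",
                          "somatic.vcf.gz",
                          normal_bam="normal.bam", normal_name="normal_sample_name",
                          germline_resource="af-only-gnomad.vcf.gz",
                          af_not_in_resource="0.00003125",
                          panel_of_normals="pon.vcf.gz")
```

Joining either list with spaces reproduces the corresponding example command; keeping the arguments as a list avoids shell quoting issues when file names contain spaces.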
Additional parameters that factor into filtering, including normal-artifact-lod (default threshold 0.0) and tumor-lod (default threshold 5.3), are available in FilterMutectCalls. While the tool calculates normal-lod assuming a diploid genotype, it calculates normal-artifact-lod with the same approach it uses for tumor-lod, i.e. with a variable ploidy assumption.
If a variant is absent from a given germline resource, then the value for --af-of-alleles-not-in-resource applies. For example, with gnomAD's 16,000 samples (~32,000 homologs per locus), an allele that is absent from the resource has a population allele frequency of one in 32,000 (0.00003125) or less. Thus, an allele's absence from the germline resource becomes evidence that it is not a germline variant.
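The arithmetic behind this value is simple enough to sketch in Python. The helper name af_of_alleles_not_in_resource is hypothetical, and the sketch assumes a diploid resource in which every sample contributes two homologs per locus.

```python
def af_of_alleles_not_in_resource(n_samples, ploidy=2):
    """Upper bound on the frequency of an allele never seen in the resource.

    With n_samples diploid samples there are ploidy * n_samples homologs
    per locus, so an unseen allele is at most one in that many.
    """
    return 1.0 / (ploidy * n_samples)


# 16,000 samples -> 32,000 homologs -> one in 32,000
default_af = af_of_alleles_not_in_resource(16000)
print(f"{default_af:.8f}")
```

For a resource of a different size, the same calculation gives a value to pass to --af-of-alleles-not-in-resource in place of the one used in the example command.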
Although GATK4 Mutect2 accommodates varying coverage depths, further optimization of parameters may improve calling for extremely high depths, e.g. 1000X.