Galaxy | Tool Preview

VarScan somatic (version 2.4.3.6)

VarScan Overview

VarScan performs variant detection for massively parallel sequencing data, such as exome, WGS, and transcriptome data. Full documentation of the command line package is available here.

The VarScan Somatic tool for Galaxy

This tool wraps the functionality of the varscan somatic and the varscan fpfilter command line tools.

The tool is designed to detect genetic variants in a pair of samples representing normal and tumor tissue from the same individual. It classifies the variants, according to their most likely origin, as somatic (variant is found in the tumor, but not in the normal sample, i.e., is the consequence of a somatic mutation event), germline (variant is found in both samples, i.e., is the consequence of a germline mutation event) and LOH (variant is found in both samples, but only the tumor sample appears to be homozygous for it, i.e., is the consequence of a loss of heterozygosity event). This classification is encoded in the INFO field of each variant record in the VCF output produced by the tool, in the form of a status code SS (somatic status), where SS=1 denotes a germline variant, SS=2 a somatic variant, and SS=3 an LOH event.

In addition, SS=0 indicates a possible variant for which the evidence is insufficient to call even a heterozygous state in either individual sample, and SS=5 is used for variants of unexplained origin (e.g., variants found in the normal, but not in the tumor tissue sample).
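
As a rough illustration of how the status code could be read back from the output, here is a minimal Python sketch. It assumes standard tab-separated VCF records with INFO as the eighth column, and the code meanings given in VarScan's VCF header (1 = germline, 2 = somatic, 3 = LOH) alongside the two special codes described above. The function name and the example record are made up for illustration and are not real tool output.

```python
# Human-readable labels for the SS (somatic status) codes.
SS_LABELS = {
    "0": "possible variant, insufficient evidence",
    "1": "germline",
    "2": "somatic",
    "3": "LOH",
    "5": "unexplained origin",
}


def somatic_status(vcf_line):
    """Return the SS label for one tab-separated VCF data line, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]  # INFO is the 8th column of a VCF record
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "SS":
            return SS_LABELS.get(value, "unrecognized code " + value)
    return None  # record carries no SS entry


# Hypothetical record for illustration only (not real VarScan output).
example = "chr1\t12345\t.\tA\tG\t.\tPASS\tDP=60;SS=2\tGT\t0/0\t0/1"
print(somatic_status(example))
```

For real analyses, a dedicated VCF library is likely a more robust choice than splitting lines by hand.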

In a second step, following variant calling, the tool can try to detect likely false-positive calls by re-inspecting the data at the variant sites more carefully and looking for signs that may indicate problems with the sequencing data or its mapping. If a called variant is deemed a possible false-positive at this step, this is indicated in the FILTER field of the variant record in the VCF output. For high-confidence variants that pass all posterior filters (those applied after variant calling), the value of the field will be PASS. For variants failing any of the posterior filters, the value will be a ;-separated list of the filters that were not passed.
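
The FILTER convention described above can be decoded with a few lines of Python. This is only a sketch: the function name is an assumption, and the filter names in the usage lines are hypothetical placeholders, not the names the tool actually writes.

```python
def failed_filters(filter_value):
    """Return the names of the filters a record failed (empty list if none).

    "PASS" means all filters were passed; "." is the VCF convention for
    "no filters applied", which is also treated as no failures here.
    """
    if filter_value in ("PASS", "."):
        return []
    return filter_value.split(";")


# Hypothetical FILTER values for illustration only.
print(failed_filters("PASS"))
print(failed_filters("filterA;filterB"))
```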

Input

The tool takes as input a reference genome (in FASTA format) and a pair of aligned-read datasets (in BAM format), one for the normal and one for the tumor sample.

Output

A VCF dataset of called variants. When asked to Generate separate output datasets for SNP and indel calls, the tool will behave like the varscan somatic command line tool and produce two VCF datasets: one with just the single nucleotide variants and one with the insertion/deletion variants.

Options

Estimated purity of normal sample / of tumor sample

Since, in practice, it is often impossible to isolate tissue samples without contamination from surrounding tissue or from invading cells, these two fields let you indicate your estimate of the purity of the two samples (as fractions between 0 and 1, where 1 would indicate a contamination-free sample and 0.5 a sample to which the desired tissue contributes only 50%, while the other 50% consist of cells from the other tissue type).
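
To give a feel for why purity matters, the following Python sketch computes the fraction of variant-supporting reads one might expect at a site under a deliberately simplified model. The assumptions are mine and not a description of VarScan's internal calculation: the site is diploid, the variant is present in every cell of the desired tissue at the given allele fraction, and the contaminating cells do not carry it. The function name is hypothetical.

```python
def expected_variant_fraction(purity, allele_fraction=0.5):
    """Expected fraction of variant reads in a contaminated sample.

    purity: fraction of cells from the desired tissue (0 to 1).
    allele_fraction: variant allele fraction within the pure tissue
        (0.5 for a heterozygous variant, 1.0 for a homozygous one).
    Assumes the contaminating cells carry only the reference allele.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return purity * allele_fraction


print(expected_variant_fraction(1.0))  # pure sample, heterozygous variant: 0.5
print(expected_variant_fraction(0.5))  # 50% purity, heterozygous variant: 0.25
```

Under this model, a heterozygous somatic variant in a tumor sample of 50% purity would be expected in only about a quarter of the reads, which is why an honest purity estimate can matter for calling.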

Settings for Variant Calling

Settings in this section will affect the steps of variant calling and classification. You can accept VarScan's default values for the corresponding parameters or customize them according to your needs.

Settings for Posterior Variant Filtering

Use the parameters in this section to configure the false-positive filtering step that follows variant calling and classification. These settings will not influence the number of variants detected nor their classification, but may change the FILTER field of variant records to indicate which variants failed to pass certain filters. You can use this information with downstream tools to exclude certain variants from further analysis steps or to include only high-confidence variants that passed all filters (those with PASS as their FILTER field value). You can accept the original filter defaults of the varscan fpfilter command line tool, use the settings established for the tool in the DREAM3 challenge, or choose to customize the settings. Alternatively, you can choose to skip posterior filtering entirely, in which case all variants will have their FILTER field set to PASS.
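
A downstream step that keeps only the variants that passed all filters could look roughly like the Python sketch below. It assumes plain, uncompressed, tab-separated VCF text with the filter status in the seventh column; the function name and the example lines are made up for illustration.

```python
def keep_pass_records(lines):
    """Yield header lines plus the data lines whose filter column is PASS."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) of a VCF record holds the filter status.
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


# Hypothetical input for illustration only (not real tool output).
vcf_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\tSS=2\n",
    "chr1\t200\t.\tC\tT\t.\tfilterA;filterB\tSS=2\n",
]
for kept in keep_pass_records(vcf_lines):
    print(kept, end="")
```

Because the function is a generator, it can also be fed an open file handle and will process the dataset line by line without loading it all into memory.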