Variant Annotator (version 1.3.2)

What it does

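This tool parses variant counts from a VCF file whose sample columns carry per-strand read counts (see Input Format). For each site it tallies the reads supporting each simple variant (A/C/G/T SNVs), counts the qualifying alleles, and calculates the minor allele frequency. It can also apply filters based on coverage, strand bias, and minor allele frequency cutoffs.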


Input Format

Note: variants that are not A/C/G/T SNVs will be ignored!

The input VCF should follow the format produced by the Naive Variant Detector tool with the stranded option enabled. The sample column(s) must give the read count for each variant on each strand. Below is an example of a valid sample column entry (the important part is everything after the last colon):

0/0:1:0.02:+T=27,+G=1,-T=22,
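
As an illustration of how such an entry can be read, here is a minimal Python sketch. It is not the tool's own code: the function name parse_sample_entry is hypothetical, and the sketch assumes that every usable item has the form strand, base, "=", count (as in the example above) and that anything else can be skipped.

```python
import re
from collections import defaultdict

def parse_sample_entry(entry):
    """Return {'+': {base: count}, '-': {base: count}} for A/C/G/T SNVs only."""
    counts = {'+': defaultdict(int), '-': defaultdict(int)}
    # The stranded counts are the text after the last colon.
    variant_field = entry.rsplit(':', 1)[-1]
    for item in variant_field.split(','):
        # Keep only items such as "+T=27". Anything else (non-SNV variants,
        # the empty string left by a trailing comma) is skipped.
        match = re.fullmatch(r'([+-])([ACGT])=(\d+)', item.strip())
        if match:
            strand, base, count = match.groups()
            counts[strand][base] += int(count)
    return {strand: dict(bases) for strand, bases in counts.items()}

print(parse_sample_entry('0/0:1:0.02:+T=27,+G=1,-T=22,'))
# {'+': {'T': 27, 'G': 1}, '-': {'T': 22}}
```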

Output

Each row represents one site in one sample. For unstranded output, 13 fields give information about that site:

1.  SAMPLE  - Sample name (from VCF sample column labels)
2.  CHR     - Chromosome of the site
3.  POS     - Chromosomal coordinate of the site
4.  A       - Number of reads supporting an 'A'
5.  C       - 'C' reads
6.  G       - 'G' reads
7.  T       - 'T' reads
8.  CVRG    - Total coverage (number of reads supporting one of the four bases above)
9.  ALLELES - Number of qualifying alleles
10. MAJOR   - Major allele
11. MINOR   - Minor allele (2nd most prevalent variant)
12. MAF     - Frequency of minor allele
13. BIAS    - Strand bias measure

For stranded output, instead of using 4 columns to report read counts per base, 8 are used to report the stranded counts per base:

1      2   3   4  5  6  7  8  9  10 11 12   13      14    15    16  17
SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MAF BIAS

Example

Below is a header line, followed by some example data lines. Since the input contained three samples, the data for each site is normally reported on three consecutive lines. However, if a sample falls below the coverage threshold at a site, its line is omitted, as with BLOOD_2 at position 100:

#SAMPLE  CHR    POS  A   C    G    T  CVRG  ALLELES  MAJOR  MINOR  MAF      BIAS
BLOOD_1  chr20  99   0   101  1    2  104   1        C      T      0.01923  0.33657
BLOOD_2  chr20  99   82  44   0    1  127   2        A      C      0.34646  0.07823
BLOOD_3  chr20  99   0   110  1    0  111   1        C      G      0.009    1.00909
BLOOD_1  chr20  100  3   5    100  0  108   1        G      C      0.0463   0.15986
BLOOD_3  chr20  100  1   118  11   0  130   0        C      G      0.08462  0.04154
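
The CVRG, MAJOR, MINOR and MAF columns can be derived from the four base counts alone. The following Python sketch shows one way to do that. The function name summarize_site is hypothetical, ties are broken by input order rather than at random, and treating a base with zero reads as "no minor allele" is an assumption.

```python
def summarize_site(base_counts):
    """Return (coverage, major, minor, maf) from a {base: reads} mapping."""
    coverage = sum(base_counts.values())
    # Rank bases by read count, highest first. Ties keep their input order
    # here; the tool describes a random choice for minor-allele ties.
    ranked = sorted(base_counts, key=lambda base: base_counts[base], reverse=True)
    major = ranked[0]
    # Assumption: a base with zero reads is never reported as the minor allele.
    minor = ranked[1] if base_counts[ranked[1]] > 0 else None
    maf = base_counts[minor] / coverage if minor is not None else None
    return coverage, major, minor, maf

# Base counts taken from the BLOOD_2 row above.
coverage, major, minor, maf = summarize_site({'A': 82, 'C': 44, 'G': 0, 'T': 1})
print(coverage, major, minor, round(maf, 5))
# 127 A C 0.34646
```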

Site printing and allele tallying requirements

Coverage threshold:

If a coverage threshold is used, the number of reads on each strand must be at or above the threshold. If either strand is below the threshold, the line is omitted. Note that this means the total coverage of every printed site will be at least twice the number you give in the "coverage threshold" option. Also, since only simple variants are counted, a site with 100 reads that all support a deletion variant would not be printed.
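
The per-strand check can be sketched in Python as follows. This is an illustration rather than the tool's code: the function name passes_coverage and the nested-dictionary layout of the counts are assumptions.

```python
def passes_coverage(stranded_counts, threshold):
    """True if both strands have at least `threshold` A/C/G/T reads."""
    # Only simple-variant reads are present in the mapping, so a site whose
    # reads all support a deletion has zero coverage on both strands.
    plus_reads = sum(stranded_counts.get('+', {}).values())
    minus_reads = sum(stranded_counts.get('-', {}).values())
    return plus_reads >= threshold and minus_reads >= threshold

site = {'+': {'T': 27, 'G': 1}, '-': {'T': 22}}
print(passes_coverage(site, 22))  # True: 28 plus-strand and 22 minus-strand reads
print(passes_coverage(site, 25))  # False: the minus strand has only 22 reads
```

In the second call the site has 50 reads in total, yet it is rejected because the threshold applies to each strand separately.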

Frequency threshold:

If a frequency threshold is used, alleles are only counted (in the ALLELES column) if they meet or exceed this minor allele frequency threshold.

Strand bias:

The sets of alleles passing the threshold on each strand must match (regardless of order), or the allele count will be 0. For example, a site with A, C, and G on the plus strand but only A and G on the minus strand gets an allele count of zero, though the (strand-independent) major allele, minor allele, and minor allele frequency are still reported. If there is a tie for the minor allele, one of the tied alleles is chosen at random.
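
A minimal Python sketch of this tallying rule is shown below. It rests on assumptions that the text does not confirm: that an allele's frequency is measured against the reads on its own strand, and that the threshold is passed as a fraction (0.1 for 10%). The function name count_alleles is hypothetical.

```python
def count_alleles(stranded_counts, freq_threshold):
    """Return the ALLELES value: shared qualifying alleles, or 0 on mismatch."""
    passing = {}
    for strand in ('+', '-'):
        bases = stranded_counts.get(strand, {})
        total = sum(bases.values())
        # Assumption: an allele qualifies when its share of this strand's
        # reads meets or exceeds the threshold (given as a fraction).
        passing[strand] = {
            base for base, reads in bases.items()
            if total and reads > 0 and reads / total >= freq_threshold
        }
    # Sets compare without regard to order.
    if passing['+'] == passing['-']:
        return len(passing['+'])
    return 0

mismatch = {'+': {'A': 50, 'C': 30, 'G': 20}, '-': {'A': 60, 'G': 40}}
print(count_alleles(mismatch, 0.1))  # 0: C passes on the plus strand only

agree = {'+': {'A': 90, 'G': 10, 'C': 1}, '-': {'A': 80, 'G': 20}}
print(count_alleles(agree, 0.05))    # 2: A and G pass on both strands
```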

Additionally, a measure of strand bias is given in the last column. This is calculated using the method of Guo et al., 2012. A value of "." is reported when the calculation has no valid result because a denominator is zero. This occurs when there are no reads on one of the strands, or when there is no minor allele.
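
For orientation, the sketch below computes a strand bias score of the form |b/(a+b) - d/(c+d)| / ((b+d)/(a+b+c+d)). This is an assumption about which measure from Guo et al. is meant, with the major and minor alleles standing in for the reference and non-reference alleles; the tool's actual calculation may differ. The function name strand_bias is hypothetical.

```python
def strand_bias(major_plus, minor_plus, major_minus, minor_minus):
    """Return the assumed strand bias score, or None if it is undefined."""
    a, b, c, d = major_plus, minor_plus, major_minus, minor_minus
    try:
        return abs(b / (a + b) - d / (c + d)) / ((b + d) / (a + b + c + d))
    except ZeroDivisionError:
        # No reads on one strand, or no minor allele reads at all.
        # A caller would print "." in this case.
        return None

print(strand_bias(40, 10, 40, 10))            # 0.0: minor allele balanced across strands
print(round(strand_bias(50, 10, 50, 0), 5))   # 1.83333: minor allele on one strand only
print(strand_bias(0, 0, 40, 10))              # None: no reads on the plus strand
print(strand_bias(50, 0, 50, 0))              # None: no minor allele
```

The two None cases correspond to the two zero-denominator situations described above.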