comparison allele-counts.xml @ 6:df3b28364cd2

allele-counts.{py,xml}: Add strand bias, documentation updates.
author nicksto <nmapsy@gmail.com>
date Wed, 09 Dec 2015 11:20:51 -0500
parents 31361191d2d2
children a72277535a2c
5:31361191d2d2 6:df3b28364cd2
1 <tool id="allele_counts_1" version="1.1" name="Variant Annotator"> 1 <tool id="allele_counts_1" version="1.2" name="Variant Annotator">
2 <description> process variant counts</description> 2 <description> process variant counts</description>
3 <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header $stranded $nofilt</command> 3 <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header $stranded $nofilt -r $seed</command>
4 <inputs> 4 <inputs>
5 <param name="input" type="data" format="vcf" label="Input variants from Naive Variants Detector"/> 5 <param name="input" type="data" format="vcf" label="Input variants from Naive Variants Detector"/>
6 <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold (in percent)"/> 6 <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold" help="in percent"/>
7 <param name="covg" type="integer" value="10" min="0" label="Coverage threshold (in reads per strand)"/> 7 <param name="covg" type="integer" value="10" min="0" label="Coverage threshold" help="in reads (per strand)"/>
8 <param name="nofilt" type="boolean" truevalue="-n" falsevalue="" checked="False" label="Do not filter sites or alleles" /> 8 <param name="nofilt" type="boolean" truevalue="-n" falsevalue="" checked="False" label="Do not filter sites or alleles" />
9 <param name="stranded" type="boolean" truevalue="-s" falsevalue="" checked="False" label="Output stranded base counts" /> 9 <param name="stranded" type="boolean" truevalue="-s" falsevalue="" checked="False" label="Output stranded base counts" />
10 <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" /> 10 <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" />
11 <param name="seed" type="text" value="" label="PRNG seed" />
11 </inputs> 12 </inputs>
12 <outputs> 13 <outputs>
13 <data name="output" format="tabular"/> 14 <data name="output" format="tabular"/>
14 </outputs> 15 </outputs>
15 <stdio> 16 <stdio>
16 <exit_code range="1:" err_level="fatal"/> 17 <exit_code range="1:" err_level="fatal"/>
17 <exit_code range=":-1" err_level="fatal"/> 18 <exit_code range=":-1" err_level="fatal"/>
18 </stdio> 19 </stdio>
20
21 <tests>
22 <test>
23 <param name="input" value="tests/artificial.vcf.in" />
24 <param name="freq" value="10" />
25 <param name="covg" value="10" />
26 <param name="seed" value="1" />
27 <output name="output" file="tests/artificial.csv.out" />
28 </test>
29 </tests>
19 30
20 <help> 31 <help>
21 32
22 .. class:: infomark 33 .. class:: infomark
23 34
43 54
44 .. class:: infomark 55 .. class:: infomark
45 56
46 **Output** 57 **Output**
47 58
48 Each row represents one site in one sample. For unstranded output, 12 fields give information about that site:: 59 Each row represents one site in one sample. For **unstranded** output, 13 fields give information about that site::
49 60
50 1. SAMPLE - Sample name (from VCF sample column labels) 61 1. SAMPLE - Sample name (from VCF sample column labels)
51 2. CHR - Chromosome of the site 62 2. CHR - Chromosome of the site
52 3. POS - Chromosomal coordinate of the site 63 3. POS - Chromosomal coordinate of the site
53 4. A - Number of reads supporting an 'A' 64 4. A - Number of reads supporting an 'A'
56 7. T - 'T' reads 67 7. T - 'T' reads
57 8. CVRG - Total (number of reads supporting one of the four bases above) 68 8. CVRG - Total (number of reads supporting one of the four bases above)
58 9. ALLELES - Number of qualifying alleles 69 9. ALLELES - Number of qualifying alleles
59 10. MAJOR - Major allele 70 10. MAJOR - Major allele
60 11. MINOR - Minor allele (2nd most prevalent variant) 71 11. MINOR - Minor allele (2nd most prevalent variant)
61 12. MINOR.FREQ.PERC. - Frequency of minor allele 72 12. MAF - Frequency of minor allele
73 13. BIAS - Strand bias measure
62 74
63 For stranded output, instead of using 4 columns to report read counts per base, 8 are used to report the stranded counts per base:: 75 For stranded output, instead of using 4 columns to report read counts per base, 8 are used to report the stranded counts per base::
64 76
65 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 77 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
66 SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC. 78 SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MAF BIAS
67 79
68 **Example** 80 **Example**
69 81
70 Below is a header line, followed by some example data lines. Since the input contained three samples, the data for each site is reported on three consecutive lines. However, if a sample fell below the coverage threshold at that site, the line will be omitted:: 82 Below is a header line, followed by some example data lines. Since the input contained three samples, the data for each site is reported on three consecutive lines. However, if a sample fell below the coverage threshold at that site, the line will be omitted::
71 83
72 #SAMPLE CHR POS A C G T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC. 84 #SAMPLE CHR POS A C G T CVRG ALLELES MAJOR MINOR MAF BIAS
73 BLOOD_1 chr20 99 0 101 1 2 104 1 C T 0.01923 85 BLOOD_1 chr20 99 0 101 1 2 104 1 C T 0.01923 0.33657
74 BLOOD_2 chr20 99 82 44 0 1 127 2 A C 0.34646 86 BLOOD_2 chr20 99 82 44 0 1 127 2 A C 0.34646 0.07823
75 BLOOD_3 chr20 99 0 110 1 0 111 1 C G 0.009 87 BLOOD_3 chr20 99 0 110 1 0 111 1 C G 0.009 1.00909
76 BLOOD_1 chr20 100 3 5 100 0 108 1 G C 0.0463 88 BLOOD_1 chr20 100 3 5 100 0 108 1 G C 0.0463 0.15986
77 BLOOD_3 chr20 100 1 118 11 0 130 0 C G 0.08462 89 BLOOD_3 chr20 100 1 118 11 0 130 0 C G 0.08462 0.04154
78 90
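The MAF values in the first two example rows are consistent with dividing the read count of the second most common base by CVRG, which gives a fraction rather than a percentage. The sketch below illustrates that reading. It is inferred from the example rows only; the function name `minor_allele_frequency` is hypothetical and is not taken from allele-counts.py.

```python
def minor_allele_frequency(counts):
    """Return (count of 2nd most common base) / (total reads), or None if no reads.

    `counts` maps each base to its read count, e.g. {'A': 0, 'C': 101, ...}.
    Assumption: CVRG is the sum of the four base counts, as the field
    description says, and ties are not treated specially here.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    ranked = sorted(counts.values(), reverse=True)
    return float(ranked[1]) / total


# First two example rows (BLOOD_1 and BLOOD_2 at chr20:99).
blood_1 = {'A': 0, 'C': 101, 'G': 1, 'T': 2}
blood_2 = {'A': 82, 'C': 44, 'G': 0, 'T': 1}
print(round(minor_allele_frequency(blood_1), 5))
print(round(minor_allele_frequency(blood_2), 5))
```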
79 ----- 91 -----
80 92
81 .. class:: warningmark 93 .. class:: warningmark
82 94
92 104
93 Strand bias: 105 Strand bias:
94 106
95 The alleles passing the threshold on each strand must match (though not in order), or the allele count will be 0. So a site with A, C, G on the plus strand and A, G on the minus strand will get an allele count of zero, though the (strand-independent) major allele, minor allele, and minor allele frequency will still be reported. If there is a tie for the minor allele, one will be randomly chosen. 107 The alleles passing the threshold on each strand must match (though not in order), or the allele count will be 0. So a site with A, C, G on the plus strand and A, G on the minus strand will get an allele count of zero, though the (strand-independent) major allele, minor allele, and minor allele frequency will still be reported. If there is a tie for the minor allele, one will be randomly chosen.
96 108
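The matching rule in the paragraph above can be sketched as follows. This is a minimal illustration, not the code in allele-counts.py: the function names are hypothetical, and applying the frequency threshold as a percentage of each strand's own reads with `>=` is an assumption.

```python
def qualifying_alleles(counts, freq_percent):
    """Return the set of bases on one strand that pass the frequency threshold.

    Assumption: a base passes if it has at least one read and makes up
    at least `freq_percent` percent of the reads on that strand.
    """
    total = sum(counts.values())
    if total == 0:
        return set()
    return {base for base, n in counts.items()
            if n > 0 and 100.0 * n / total >= freq_percent}


def allele_count(plus_counts, minus_counts, freq_percent):
    """Number of alleles if both strands agree on the allele set, else 0."""
    plus = qualifying_alleles(plus_counts, freq_percent)
    minus = qualifying_alleles(minus_counts, freq_percent)
    # Sets ignore order, so "match (though not in order)" is plain equality.
    return len(plus) if plus == minus else 0


# The case from the text: A, C, G on the plus strand but only A, G on the minus strand.
plus = {'A': 50, 'C': 10, 'G': 40, 'T': 0}
minus = {'A': 60, 'C': 0, 'G': 40, 'T': 0}
print(allele_count(plus, minus, 1.0))
```

With these inputs the two strands disagree about C, so the allele count is 0, as in the example given in the text.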
109 Additionally, a measure of strand bias is given in the last column. This is calculated using the method of Guo et al., 2012. A value of "." is given when there is no valid result of the calculation due to a zero denominator. This occurs when there are no reads on one of the strands, or when there is no minor allele.
110
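For orientation, the SB score from Guo et al., 2012 is usually written as SB = |b/(a+b) - d/(c+d)| / ((b+d)/(a+b+c+d)), where a and c are the forward and reverse read counts of one allele and b and d are those of the other. The sketch below applies that formula and returns `None` in the zero-denominator cases described above. Using the major allele for a, c and the minor allele for b, d is an assumption, and the function names are hypothetical, not taken from allele-counts.py.

```python
def strand_bias(major_plus, major_minus, minor_plus, minor_minus):
    """SB score in the form given by Guo et al. (2012), or None if undefined."""
    a, c = major_plus, major_minus  # forward / reverse reads, major allele (assumed mapping)
    b, d = minor_plus, minor_minus  # forward / reverse reads, minor allele (assumed mapping)
    plus_total = a + b
    minus_total = c + d
    minor_total = b + d
    # Zero denominators: no reads on one strand, or no minor-allele reads.
    if plus_total == 0 or minus_total == 0 or minor_total == 0:
        return None
    total = plus_total + minus_total
    difference = abs(float(b) / plus_total - float(d) / minus_total)
    return difference / (float(minor_total) / total)


def format_bias(value):
    """Render an undefined result as '.', as the help text describes."""
    return '.' if value is None else str(round(value, 5))


print(format_bias(strand_bias(40, 40, 10, 10)))  # minor allele balanced across strands
print(format_bias(strand_bias(10, 0, 2, 0)))     # no reads on the minus strand
```

A balanced minor allele gives a score of 0, and the score grows as the minor allele's reads concentrate on one strand.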
97 </help> 111 </help>
98 112
99 </tool> 113 </tool>