comparison allele-counts.xml @ 3:933a9435939c
Current xml
- includes documentation
author | nick |
---|---|
date | Fri, 31 May 2013 12:34:28 -0400 |
parents | 28c40f4b7d2b |
children | 898eb3daab43 |
2:318fdf77aa54 | 3:933a9435939c |
---|---|
1 <tool id="allele_counts_1" version="1.0" name="Count alleles"> | 1 <tool id="allele_counts_1" version="1.0" name="Count alleles"> |
2 <description>and minor allele frequencies</description> | 2 <description>and minor allele frequencies</description> |
3 <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header</command> | 3 <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header</command> |
4 <inputs> | 4 <inputs> |
5 <param name="input" type="data" format="vcf" label="Input variants from Naive Variants Detector"/> | 5 <param name="input" type="data" format="vcf" label="Input variants from Naive Variants Detector"/> |
6 <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold"/> | 6 <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold (in percent)"/> |
7 <param name="covg" type="integer" value="10" min="0" label="Coverage threshold (per strand)"/> | 7 <param name="covg" type="integer" value="10" min="0" label="Coverage threshold (per strand)"/> |
8 <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="False" label="Write header line" /> | 8 <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" /> |
9 </inputs> | 9 </inputs> |
10 <outputs> | 10 <outputs> |
11 <data name="output" format="tabular"/> | 11 <data name="output" format="tabular"/> |
12 </outputs> | 12 </outputs> |
13 <stdio> | 13 <stdio> |
14 <exit_code range="1:" err_level="fatal"/> | 14 <exit_code range="1:" err_level="fatal"/> |
15 <exit_code range=":-1" err_level="fatal"/> | 15 <exit_code range=":-1" err_level="fatal"/> |
16 </stdio> | 16 </stdio> |
17 | 17 |
18 <help> | 18 <help> |
19 This tool parses the output of Naive Variant Detector, counting variants, calculating numbers of alleles, and minor allele frequency. It applies filters based on coverage, strand bias, and minor allele frequency cutoffs. | |
20 | 19 |
21 **Note**: The VCF file from the Naive Variant Detector must include counts *per strand*. | 20 .. class:: warningmark |
21 | |
22 **Note** | |
23 | |
24 This will only process a special type of VCF file. The VCF must have a special genotype field in the sample columns, giving the number of each type of variant. Also, the variant data **must be stranded**. | |
25 | |
26 The Naive Variant Detector tool produces this VCF format, and is the normal upstream tool from this one. | |
27 | |
28 ----- | |
29 | |
30 .. class:: infomark | |
31 | |
32 **What it does** | |
33 | |
34 This tool parses variant counts from a special VCF file (normally the output of the Naive Variant Detector tool). It counts simple (ACGT) variants, calculates numbers of alleles, and calculates minor allele frequency. It applies filters based on coverage, strand bias, and minor allele frequency cutoffs. | |
35 | |
36 ----- | |
37 | |
38 .. class:: infomark | |
39 | |
40 **Output columns** | |
41 | |
42 Each row represents one site in one sample. 12 fields give information about that site:: | |
43 | |
44 1. SAMPLE - Sample names (from VCF sample column labels) | |
45 2. CHR - Chromosome of the site | |
46 3. POS - Chromosomal coordinate of the site | |
47 4. A - Number of reads supporting an 'A' | |
48 5. C - ditto, for 'C' | |
49 6. G - ditto, for 'G' | |
50 7. T - ditto, for 'T' | |
51 8. CVRG - Total (number of reads supporting one of the four bases above) | |
52 9. ALLELES - Number of qualifying alleles | |
53 10. MAJOR - Major allele base | |
54 11. MINOR - Minor allele base (2nd most prevalent variant) | |
55 12. MINOR.FREQ.PERC. - Frequency of minor allele | |
56 | |
57 **Example**:: | |
58 SAMPLE CHR POS A C G T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC. | |
59 BLOOD_3 chr7 10980 11 88 1 0 100 2 C A 0.11 | |
60 | |
61 ----- | |
62 | |
63 .. class:: warningmark | |
64 | |
65 **Site printing and allele tallying requirements** | |
66 | |
67 Each line is printed only when the site is covered by the threshold number of reads **on each strand**. If coverage of either strand is below the threshold, the line (sample + site combination) is omitted. | |
68 | |
69 **N.B.**: This means the total coverage for each printed site will be at least twice the number you give in the "coverage threshold" option. | |
70 | |
71 Also, reads supporting a variant outside the canonical 4 nucleotides will not count towards the coverage requirement. For instance, a site covered by 150 reads on each strand, the majority of which support an indel variant, will not be printed. | |
72 | |
73 Alleles are only counted in the column 9 tally if they meet or exceed the minor allele frequency threshold. In addition, the alleles passing the threshold on each strand have to match (though not in order). Otherwise, the allele count will be 0. The minor allele and its frequency, though, will always be reported. | |
74 | |
75 However, if there is a tie for the minor allele (between the 2nd and 3rd most common alleles), the minor allele will be reported as 'N', and the frequency as 0. | |
76 | |
77 ----- | |
78 | |
79 .. class:: infomark | |
80 | |
81 **Additional notes** | |
82 | |
83 | |
84 | |
22 </help> | 85 </help> |
23 | 86 |
24 </tool> | 87 </tool> |
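
The "Site printing and allele tallying requirements" in the new help text can be read as the following minimal Python sketch. It is not the code of `allele-counts.py`; the function name `summarize_site`, the per-strand dict input, and the returned keys are hypothetical. It also assumes that the frequency threshold is applied per strand against that strand's ACGT coverage, that a base needs at least one read to qualify, and that the major and minor alleles are ranked on the combined counts of both strands. The help text does not state any of these details.

```python
BASES = "ACGT"


def summarize_site(plus, minus, covg_threshold=10, freq_threshold=1.0):
    """Apply the documented filters to one site in one sample.

    plus / minus: dicts mapping a variant label to its read count on that
    strand (hypothetical input shape). Returns None when the site would be
    omitted from the output, otherwise a dict of summary values.
    """
    # Only reads supporting A, C, G or T count towards coverage.
    plus_cov = sum(plus.get(b, 0) for b in BASES)
    minus_cov = sum(minus.get(b, 0) for b in BASES)

    # The coverage threshold must be met on each strand separately.
    if plus_cov < covg_threshold or minus_cov < covg_threshold:
        return None

    def passing(strand, cov):
        # Bases at or above the frequency threshold (in percent).
        # Assumption: a base with zero reads never qualifies.
        if cov == 0:
            return set()
        return {
            b for b in BASES
            if strand.get(b, 0) > 0
            and 100.0 * strand.get(b, 0) / cov >= freq_threshold
        }

    plus_pass = passing(plus, plus_cov)
    minus_pass = passing(minus, minus_cov)
    # The qualifying alleles must match between strands (order is ignored),
    # otherwise the tally is 0.
    alleles = len(plus_pass) if plus_pass == minus_pass else 0

    # Assumption: major/minor alleles are ranked on combined strand counts.
    totals = {b: plus.get(b, 0) + minus.get(b, 0) for b in BASES}
    ranked = sorted(BASES, key=lambda b: totals[b], reverse=True)
    major, second, third = ranked[0], ranked[1], ranked[2]
    total_cov = plus_cov + minus_cov

    if totals[second] == totals[third]:
        # Tie for the minor allele: report 'N' with frequency 0.
        minor, minor_freq = "N", 0.0
    else:
        minor = second
        minor_freq = 100.0 * totals[second] / total_cov

    return {
        "coverage": total_cov,
        "alleles": alleles,
        "major": major,
        "minor": minor,
        "minor_freq_percent": minor_freq,
    }
```

Under these assumptions, a site whose reads mostly support an indel returns `None` even at high depth, because non-ACGT labels are ignored when coverage is computed. That matches the 150-reads-per-strand example in the help text.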