diff allele-counts.xml @ 6:df3b28364cd2

allele-counts.{py,xml}: Add strand bias, documentation updates.
author nicksto <nmapsy@gmail.com>
date Wed, 09 Dec 2015 11:20:51 -0500
parents 31361191d2d2
children a72277535a2c
--- a/allele-counts.xml	Thu Sep 12 11:34:23 2013 -0400
+++ b/allele-counts.xml	Wed Dec 09 11:20:51 2015 -0500
@@ -1,13 +1,14 @@
-<tool id="allele_counts_1" version="1.1" name="Variant Annotator">
+<tool id="allele_counts_1" version="1.2" name="Variant Annotator">
   <description> process variant counts</description>
-  <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header $stranded $nofilt</command>
+  <command interpreter="python">allele-counts.py -i $input -o $output -f $freq -c $covg $header $stranded $nofilt -r $seed</command>
   <inputs>
     <param name="input" type="data" format="vcf" label="Input variants from Naive Variants Detector"/>
-    <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold (in percent)"/>
-    <param name="covg" type="integer" value="10" min="0" label="Coverage threshold (in reads per strand)"/>
+    <param name="freq" type="float" value="1.0" min="0" max="100" label="Minor allele frequency threshold" help="in percent"/>
+    <param name="covg" type="integer" value="10" min="0" label="Coverage threshold" help="in reads (per strand)"/>
     <param name="nofilt" type="boolean" truevalue="-n" falsevalue="" checked="False" label="Do not filter sites or alleles" />
     <param name="stranded" type="boolean" truevalue="-s" falsevalue="" checked="False" label="Output stranded base counts" />
     <param name="header" type="boolean" truevalue="-H" falsevalue="" checked="True" label="Write header line" />
+    <param name="seed" type="text" value="" label="PRNG seed" />
   </inputs>
   <outputs>
     <data name="output" format="tabular"/>
@@ -17,6 +18,16 @@
     <exit_code range=":-1" err_level="fatal"/>
   </stdio>
 
+  <tests>
+    <test>
+      <param name="input" value="tests/artificial.vcf.in" />
+      <param name="freq" value="10" />
+      <param name="covg" value="10" />
+      <param name="seed" value="1" />
+      <output name="output" file="tests/artificial.csv.out" />
+    </test>
+  </tests>
+
   <help>
 
 .. class:: infomark
@@ -45,7 +56,7 @@
 
 **Output**
 
-Each row represents one site in one sample. For unstranded output, 12 fields give information about that site::
+Each row represents one site in one sample. For **unstranded** output, 13 fields give information about that site::
 
     1.  SAMPLE  - Sample name (from VCF sample column labels)
     2.  CHR     - Chromosome of the site
@@ -58,23 +69,24 @@
     9.  ALLELES - Number of qualifying alleles
     10. MAJOR   - Major allele
     11. MINOR   - Minor allele (2nd most prevalent variant)
-    12. MINOR.FREQ.PERC. - Frequency of minor allele
+    12. MAF     - Frequency of minor allele
+    13. BIAS    - Strand bias measure
 
 For stranded output, instead of using 4 columns to report read counts per base, 8 are used to report the stranded counts per base::
 
-    1       2   3   4  5  6  7  8  9 10 11  12    13     14    15         16
-    SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MINOR.FREQ.PERC.
+    1       2   3   4  5  6  7  8  9 10 11  12    13     14    15   16   17
+    SAMPLE CHR POS +A +C +G +T -A -C -G -T CVRG ALLELES MAJOR MINOR MAF BIAS
 
 **Example**
 
 Below is a header line, followed by some example data lines. Since the input contained three samples, the data for each site is reported on three consecutive lines. However, if a sample fell below the coverage threshold at that site, the line will be omitted::
 
-    #SAMPLE  CHR    POS  A   C    G    T  CVRG  ALLELES  MAJOR  MINOR  MINOR.FREQ.PERC.
-    BLOOD_1  chr20  99   0   101  1    2  104   1        C      T      0.01923
-    BLOOD_2  chr20  99   82  44   0    1  127   2        A      C      0.34646
-    BLOOD_3  chr20  99   0   110  1    0  111   1        C      G      0.009
-    BLOOD_1  chr20  100  3   5    100  0  108   1        G      C      0.0463
-    BLOOD_3  chr20  100  1   118  11   0  130   0        C      G      0.08462
+    #SAMPLE  CHR    POS  A   C    G    T  CVRG  ALLELES  MAJOR  MINOR  MAF      BIAS
+    BLOOD_1  chr20  99   0   101  1    2  104   1        C      T      0.01923  0.33657
+    BLOOD_2  chr20  99   82  44   0    1  127   2        A      C      0.34646  0.07823
+    BLOOD_3  chr20  99   0   110  1    0  111   1        C      G      0.009    1.00909
+    BLOOD_1  chr20  100  3   5    100  0  108   1        G      C      0.0463   0.15986
+    BLOOD_3  chr20  100  1   118  11   0  130   0        C      G      0.08462  0.04154
 
 -----
 
@@ -94,6 +106,8 @@
 
 The alleles passing the threshold on each strand must match (though not in order), or the allele count will be 0. So a site with A, C, G on the plus strand and A, G on the minus strand will get an allele count of zero, though the (strand-independent) major allele, minor allele, and minor allele frequency will still be reported. If there is a tie for the minor allele, one will be randomly chosen.
 
+Additionally, a measure of strand bias is given in the last column. This is calculated using the method of Guo et al., 2012. A value of "." is given when there is no valid result of the calculation due to a zero denominator. This occurs when there are no reads on one of the strands, or when there is no minor allele.
+
   </help>
 
-</tool>
\ No newline at end of file
+</tool>
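The help text defines MAJOR, MINOR and MAF for each site and says that a tie for the minor allele is broken at random. The new `seed` parameter (passed as `-r`) is what makes that choice repeatable.

The Python sketch below is not the code in allele-counts.py. It rests on three assumptions:

- MAF is the minor allele's read count divided by CVRG. This reproduces the MAF values of the BLOOD_1 and BLOOD_2 rows at chr20:99.
- The seed initialises a `random.Random` instance.
- A site with only one observed base has no minor allele.

The name `major_minor_maf` is hypothetical.

```python
import random
from typing import Dict, Optional, Tuple

BASES = ("A", "C", "G", "T")


def major_minor_maf(counts: Dict[str, int],
                    rng: random.Random) -> Tuple[str, Optional[str], float]:
    """Return (major, minor, maf) for one site in one sample.

    Hypothetical helper: the minor allele is the second most common base,
    ties for it are broken with `rng`, and MAF is minor count / coverage.
    """
    coverage = sum(counts.get(base, 0) for base in BASES)
    # Stable sort: bases with equal counts keep A, C, G, T order.
    ranked = sorted(BASES, key=lambda base: counts.get(base, 0), reverse=True)
    major = ranked[0]
    second = counts.get(ranked[1], 0)
    if second == 0:
        # Only one base observed: no minor allele (assumed representation).
        return major, None, 0.0
    # Every base sharing the second-highest count is a minor candidate.
    tied = [base for base in ranked[1:] if counts.get(base, 0) == second]
    minor = rng.choice(tied)
    return major, minor, counts[minor] / coverage
```

Here is what the sketch returns for the BLOOD_2 row at chr20:99. Calling `major_minor_maf({"A": 82, "C": 44, "G": 0, "T": 1}, random.Random(1))` gives major "A" and minor "C", with a MAF of 44/127, which rounds to 0.34646.

Passing the generator in as an argument, rather than seeding the global `random` module, keeps the tie-breaking reproducible for a given seed. It also leaves the rest of the program free to use other randomness.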
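The help text's rule that the alleles passing the threshold must match on both strands, regardless of order, amounts to a set comparison. The sketch below makes three assumptions:

- The threshold is the minor allele frequency threshold (`freq`, in percent, default 1.0), applied to each strand's own reads.
- A base qualifies when its frequency is at least that threshold.
- The per-strand coverage filter is left out.

`passing_alleles` and `allele_count` are hypothetical names, not functions taken from allele-counts.py.

```python
from typing import Dict, Set

BASES = ("A", "C", "G", "T")


def passing_alleles(strand_counts: Dict[str, int], freq_percent: float) -> Set[str]:
    """Bases whose share of one strand's reads meets the frequency threshold."""
    total = sum(strand_counts.get(base, 0) for base in BASES)
    if total == 0:
        return set()
    return {base for base in BASES
            if 100.0 * strand_counts.get(base, 0) / total >= freq_percent}


def allele_count(plus: Dict[str, int], minus: Dict[str, int],
                 freq_percent: float = 1.0) -> int:
    """Number of qualifying alleles, or 0 if the two strands disagree."""
    plus_alleles = passing_alleles(plus, freq_percent)
    minus_alleles = passing_alleles(minus, freq_percent)
    # Sets ignore order, so {A, G} on one strand matches {G, A} on the other.
    if plus_alleles != minus_alleles:
        return 0
    return len(plus_alleles)
```

The help text's own example works out as expected. With A, C and G passing on the plus strand and only A and G on the minus strand, the two sets differ and `allele_count` returns 0.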
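The help text describes the BIAS column only as "the method of Guo et al., 2012". The sketch below assumes this is the SB score from that paper:

SB = |b/(a+b) - d/(c+d)| / ((b+d)/(a+b+c+d))

Here a and b are the major and minor allele read counts on the plus strand, and c and d are the same counts on the minus strand. Its three zero-denominator cases are exactly the ones the help text lists for ".": no plus-strand reads, no minus-strand reads, or no minor allele reads.

This is not the tool's own code. `strand_bias` and `format_bias` are hypothetical names, and rounding to five decimal places only imitates the example rows.

```python
from typing import Optional


def strand_bias(a: int, b: int, c: int, d: int) -> Optional[float]:
    """SB score as defined by Guo et al. (2012), assumed to be the BIAS column.

    a, b: major and minor allele read counts on the plus strand
    c, d: major and minor allele read counts on the minus strand

        SB = |b/(a+b) - d/(c+d)| / ((b+d)/(a+b+c+d))

    Returns None when any denominator is zero.
    """
    plus_total = a + b
    minus_total = c + d
    minor_total = b + d
    # No reads on one strand, or no minor allele reads: SB is undefined.
    if plus_total == 0 or minus_total == 0 or minor_total == 0:
        return None
    overall_minor_freq = minor_total / (plus_total + minus_total)
    return abs(b / plus_total - d / minus_total) / overall_minor_freq


def format_bias(value: Optional[float]) -> str:
    """Print "." for an undefined score, else the value rounded to 5 places."""
    return "." if value is None else str(round(value, 5))
```

As a consistency check, consider the BLOOD_3 row at chr20:99 (C 110, G 1). Suppose a hypothetical split puts 109 C and 1 G on the plus strand and 1 C on the minus strand; the example does not show the actual stranded counts. Then `strand_bias(109, 1, 1, 0)` is 111/110, about 1.00909, which is the BIAS printed for that row.

Returning None instead of raising ZeroDivisionError leaves the choice of output token (".") to the caller.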