comparison aggregate.xml @ 1:a66c287c1864 draft

Uploaded
author kaymccoy
date Thu, 11 Aug 2016 18:08:35 -0400
0:0890f73e463c 1:a66c287c1864
1 <tool id="aggregate" name="Aggregate">
2 <description>fitness calculations by gene</description>
3 <requirements>
4 <requirement type="package" version="1.64">biopython</requirement>
5 </requirements>
6 <command interpreter="python">
7 aggregate.py
8 #if $mark.certain == "yes":
9 -m $mark.genes
10 #end if
11 #if $weighted.algorithms == "yes":
12 -w 1
13 #end if
14 -x $cutoff
15 -l $weightceiling
16 #if $blank.count == "yes":
17 -b $blank.custom_blanks
18 #end if
19 #if $blank.count == "no":
20 -f $blank.txt_blanks
21 #end if
22 -c $ref
23 -o $output
24 $input
25 #for $a in $additionalcsv
26 ${a.input2}
27 #end for
28 </command>
29 <inputs>
30 <param name="input" type="data" label="csv fitness file"/>
31 <repeat name="additionalcsv" title="Additional csv fitness file(s)">
32 <param name="input2" type="data" label="Select" />
33 </repeat>
34 <param name="ref" type="data" label="GenBank reference genome"/>
35 <conditional name="mark">
36 <param name="certain" type="select" label="Mark certain genes?">
37 <option value="no">No</option>
38 <option value="yes">Yes</option>
39 </param>
40 <when value="no">
41 <!-- do nothing -->
42 </when>
43 <when value="yes">
44 <param name="genes" type="data" label="Genes to mark" />
45 </when>
46 </conditional>
47 <conditional name="weighted">
48 <param name="algorithms" type="select" label="Use weighted algorithms?">
49 <option value="no">No</option>
50 <option value="yes">Yes</option>
51 </param>
52 <when value="no"/>
53 <when value="yes"/>
54 </conditional>
55 <param name="weightceiling" type="float" value="50.0" label="Weight ceiling"/>
56 <param name="cutoff" type="float" value="10.0" label="Cutoff"/>
57 <conditional name="blank">
58 <param name="count" type="select" label="Enter custom blank count?">
59 <option value="no">No</option>
60 <option value="yes">Yes</option>
61 </param>
62 <when value="no">
63 <param name="txt_blanks" type="data" label="txt output from Calc_fit or Consol_fit"/>
64 </when>
65 <when value="yes">
66 <param name="custom_blanks" type="float" value="0.0" label="blank count (a number from 0.0 to 1.0)"/>
67 </when>
68 </conditional>
69 </inputs>
70 <outputs>
71 <data name="output" format="csv"/>
72 </outputs>
73 <help>
74
75 **What it does**
76
77 This tool calculates the aggregate fitness values of mutations by gene.
78
79 **The options explained**
80
81 The csv fitness file(s): These are the csv (comma separated values) files containing the fitness values you want to aggregate by gene. They should have been produced by the "Calculate Fitness" tool, so each line besides the header represents the following information for an insertion location: position,strand,count_1,count_2,ratio,mt_freq_t1,mt_freq_t2,pop_freq_t1,pop_freq_t2,gene,D,W,nW
82
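As a rough illustration of the input layout, the sketch below reads such a file and groups one fitness column by gene. The column names come from the header listed above; the data rows, the gene names, the helper name fitness_by_gene and the choice of nW as the value to aggregate are assumptions made for the example, not a description of what aggregate.py does internally.

```python
import csv
import io
from collections import defaultdict

# Header copied from the help text; the three data rows are invented.
SAMPLE = """position,strand,count_1,count_2,ratio,mt_freq_t1,mt_freq_t2,pop_freq_t1,pop_freq_t2,gene,D,W,nW
1045,+,30,24,0.8,0.0003,0.00024,0.9997,0.99976,SP_0001,0,0.95,0.96
1102,-,12,20,1.67,0.00012,0.0002,0.99988,0.9998,SP_0001,0,1.08,1.09
2310,+,50,0,0.0,0.0005,0.0,0.9995,1.0,SP_0002,0,0.0,0.0
"""

def fitness_by_gene(handle):
    """Collect the nW value of every insertion under its gene (hypothetical helper)."""
    by_gene = defaultdict(list)
    for row in csv.DictReader(handle):
        by_gene[row["gene"]].append(float(row["nW"]))
    return dict(by_gene)

groups = fitness_by_gene(io.StringIO(SAMPLE))
```

Here groups maps each gene to the list of fitness values found for its insertion locations.
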
83 GenBank reference genome: The reference genome of the organism you're working with, which needs to be in standard GenBank format. For more on that format, see the GenBank website.
84
85 Marking certain genes: If you choose to mark certain genes, those genes will have an "M" in the M column of the resulting aggregate file.
86
87 Using weighted algorithms: Recommended. If you choose to use weighted algorithms, each score is weighted by the number of reads at its insertion location, since insertions with more reads tend to give more accurate fitness values.
88
89 Weight ceiling: The maximum weight that any single fitness value can be given. It's only relevant if you're using weighted algorithms.
90
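To make the interaction between weighting and the ceiling concrete, here is a minimal sketch. It assumes that the weight of a score is the read count of its insertion location, capped at the ceiling, and that the aggregate is an ordinary weighted mean; the function name weighted_mean and both assumptions are illustrative and may differ from the actual script.

```python
def weighted_mean(scores, read_counts, ceiling=50.0):
    """Weighted mean of scores, with each weight capped at the ceiling (assumed scheme)."""
    # Cap every weight so that very deep insertions cannot dominate the gene.
    weights = [min(count, ceiling) for count in read_counts]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in zip(weights, scores)) / total

capped = weighted_mean([1.0, 0.5], [100.0, 50.0], ceiling=50.0)
uncapped = weighted_mean([1.0, 0.5], [100.0, 50.0], ceiling=200.0)
```

With the ceiling at 50 both insertions count equally; with a ceiling of 200 the insertion backed by 100 reads pulls the mean toward its own score.
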
91 Cutoff: This value lets you ignore the fitness scores of any insertion locations whose average count (the sum of the counts at t1 and t2, divided by 2) is below the cutoff.
92
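The cutoff rule amounts to a one-line check, sketched here. The function name passes_cutoff is hypothetical; count_1 and count_2 refer to the columns of the csv fitness file, and the default of 10.0 mirrors the default of the Cutoff field.

```python
def passes_cutoff(count_1, count_2, cutoff=10.0):
    """Keep an insertion only if its average count reaches the cutoff."""
    average = (count_1 + count_2) / 2.0
    return average >= cutoff

kept = passes_cutoff(12, 8)      # average 10.0, not below the cutoff
dropped = passes_cutoff(5, 10)   # average 7.5, below the cutoff
```
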
93 Blanks: This value lets you exclude a percentage of blank fitness scores (scores with a fitness of 0) from your calculations. If entered by hand, it should be a float between 0.0 and 1.0 (e.g. 0.10 would be 10%). Alternatively, you can use the blank percentage calculated from the normalization genes by Calc_fit by supplying its txt output file.
94
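One possible reading of blank removal is sketched below. It assumes the blank fraction is applied to the total number of scores for a gene and that at most that many zero scores are dropped; whether aggregate.py counts against the total or against the blanks only is not stated here, so treat remove_blanks as a hypothetical helper rather than the tool's exact rule.

```python
def remove_blanks(scores, blank_fraction):
    """Drop up to int(blank_fraction * len(scores)) zero scores (assumed rule)."""
    blanks = sum(1 for s in scores if s == 0)
    to_remove = min(int(blank_fraction * len(scores)), blanks)
    kept = []
    for s in scores:
        if s == 0 and to_remove > 0:
            # This blank is one of the excluded ones.
            to_remove -= 1
            continue
        kept.append(s)
    return kept

cleaned = remove_blanks([0.0, 1.0, 0.0, 0.9, 1.1, 0.0, 1.0, 0.95], 0.25)
```

In this example two of the three blanks are excluded, because 25% of eight scores is two.
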
95 The name of your output file: self-explanatory. Remember to have it end in ".csv".
96
97 **Additional notes**
98
99 Each line of the output file (besides the header) represents the following information for a particular gene: locus,mean,var,sd,se,gene,Total,Blank,Not Blank,Blank Removed,M
100
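For readers unsure how the mean, var, sd and se columns relate to one another, the following sketch computes them for one gene from a list of scores. It assumes the unweighted case, the sample variance (n - 1 in the denominator) and se defined as sd divided by the square root of n; the script itself may use different conventions, and summarize is a hypothetical name.

```python
import math
import statistics

def summarize(scores):
    """Return mean, sample variance, standard deviation and standard error."""
    n = len(scores)
    mean = statistics.mean(scores)
    # With fewer than two scores the sample variance is undefined; report 0.0.
    var = statistics.variance(scores) if n >= 2 else 0.0
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)
    return {"mean": mean, "var": var, "sd": sd, "se": se}

summary = summarize([1.0, 0.8, 1.2])
```
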
101 </help>
102 </tool>