comparison calc_fitness.xml @ 12:ca6b343dd70e draft default tip

Uploaded
author kaymccoy
date Sun, 11 Dec 2016 17:08:14 -0500
11:e398b4ccbf7d 12:ca6b343dd70e
<tool id="calc_fitness" name="Calculate Fitness">
    <description>of transposon insertion locations</description>
    <requirements>
        <requirement type="package" version="1.64">biopython</requirement>
    </requirements>
    <command interpreter="python">
        calc_fitness.py
        -ef $ef
        -el $el
        -wig $output3
        -t1 $t1
        -t2 $t2
        -ref $ref
        -out $output
        -out2 $output2
        -expansion $expansion
        -maxweight $maxweight
        -cutoff $cutoff
        -cutoff2 $cutoff2
        -strand $strand
        #if $normalization.calculations == "yes":
            -normalize $normalization.genes
        #end if
        #if $multiply.choice == "yes":
            -multiply $multiply.factor
        #end if
        #if $reads.uncol == "yes":
            -uncol 1
        #end if
        #if $bottle.all == "yes":
            -b 1
        #end if
    </command>
    <inputs>
        <param name="t1" type="data" label="Map files from t1"/>
        <param name="t2" type="data" label="Map files from t2"/>
        <param name="ref" type="data" label="GenBank reference genome"/>
        <conditional name="normalization">
            <param name="calculations" type="select" label="Normalize fitness calculations?">
                <option value="no">No</option>
                <option value="yes">Yes</option>
            </param>
            <when value="no">
                <!-- do nothing -->
            </when>
            <when value="yes">
                <param name="genes" type="data" label="Genes to normalize by" />
            </when>
        </conditional>
        <param name="strand" type="select" label="Use reads from which strands?">
            <option value="both">both</option>
            <option value="+">Watson (+)</option>
            <option value="-">Crick (-)</option>
        </param>
        <param name="expansion" type="float" value="250" label="Expansion factor"/>
        <param name="cutoff" type="float" value="0.0" label="Cutoff1"/>
        <param name="cutoff2" type="float" value="0.0" label="Cutoff2"/>
        <param name="ef" type="float" value="0.0" label="Exclude first %"/>
        <param name="el" type="float" value="0.0" label="Exclude last %"/>
        <param name="maxweight" type="float" value="75" label="Maximum weight of a transposon gene in normalization calculations"/>
        <conditional name="multiply">
            <param name="choice" type="select" label="Multiply fitness scores by a certain value?">
                <option value="no">No</option>
                <option value="yes">Yes</option>
            </param>
            <when value="no">
                <!-- do nothing -->
            </when>
            <when value="yes">
                <param name="factor" type="float" value="0.0" label="Multiply by" />
            </when>
        </conditional>
        <conditional name="bottle">
            <param name="all" type="select" label="Calculate bottleneck value from all genes (rather than only normalization genes)?">
                <option value="no">No</option>
                <option value="yes">Yes</option>
            </param>
            <when value="no">
                <!-- do nothing -->
            </when>
            <when value="yes">
                <!-- do nothing -->
            </when>
        </conditional>
        <conditional name="reads">
            <param name="uncol" type="select" label="Were reads uncollapsed when mapped?">
                <option value="no">No</option>
                <option value="yes">Yes</option>
            </param>
            <when value="no">
                <!-- do nothing -->
            </when>
            <when value="yes">
                <!-- do nothing -->
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data format="csv" name="output" />
        <data format="txt" name="output2" />
        <data format="wig" name="output3" />
    </outputs>
    <help>

**What it does**

This tool calculates the fitness values of transposon insertion mutations generated by Tn-seq, by analyzing Illumina sequencing reads from two time points, t1 and t2.

**The options explained**

Map files from t1: a bowtie mapfile containing the mapped flanking reads from t1

Map files from t2: a bowtie mapfile containing the mapped flanking reads from t2

GenBank reference genome: the reference genome of the organism you are working with, in standard GenBank format. For more on that format, see the GenBank website.

Normalizing fitness calculations: normalization relies on the fitness scores of insertions within transposon genes, which ought to have a neutral fitness of 1. The file of normalization genes should list one gene locus per line, such as "SP_0017".

Expansion factor: the expansion factor of the bacterial culture your reads came from. You should measure this while growing the bacteria from t1 to t2. The default expansion factor of 250 gives only very rough fitness calculations, so relying on it is not recommended.
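
To show why a measured expansion factor matters, here is a minimal sketch. It assumes the fitness formula commonly used for Tn-seq, W = ln(N2 * d / N1) / ln((1 - N2) * d / (1 - N1)), where N1 and N2 are the mutant's frequencies at t1 and t2 and d is the expansion factor. The function name and arguments are illustrative, and the calculation in calc_fitness.py may differ in detail.

```python
import math

def fitness(mt_freq_t1, mt_freq_t2, expansion):
    """Fitness W of one insertion mutant relative to the rest of the population.

    Assumed formula (not taken from calc_fitness.py):
    W = ln(N2 * d / N1) / ln((1 - N2) * d / (1 - N1))
    """
    pop_freq_t1 = 1.0 - mt_freq_t1  # frequency of everything else at t1
    pop_freq_t2 = 1.0 - mt_freq_t2  # frequency of everything else at t2
    mutant_growth = math.log(mt_freq_t2 * expansion / mt_freq_t1)
    population_growth = math.log(pop_freq_t2 * expansion / pop_freq_t1)
    return mutant_growth / population_growth

# A mutant whose frequency is unchanged comes out neutral whatever d is.
neutral = fitness(0.001, 0.001, 250)
# A mutant that halves in frequency: the result depends on the expansion factor.
w_250 = fitness(0.001, 0.0005, 250)
w_1000 = fitness(0.001, 0.0005, 1000)
print(neutral, w_250, w_1000)
```

Under this assumed formula the same read counts give a different fitness for d = 250 than for d = 1000, which is why the default is only a rough stand-in for a measured value.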

Cutoff1: the cutoff for all genes. Insertion locations with an average count below this number are disregarded, because fitness values calculated from few reads can be inaccurate, for the same reason that studies with small sample sizes can be.

Cutoff2: the cutoff for the normalization genes; it only has an effect if it is larger than Cutoff1.
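
The two cutoffs can be sketched as follows. This assumes that "average count" means the mean of the t1 and t2 read counts for an insertion location; the function names are hypothetical and are not taken from calc_fitness.py.

```python
def passes_cutoff(count_1, count_2, cutoff):
    """True if the average of the t1 and t2 read counts reaches the cutoff."""
    return (count_1 + count_2) / 2.0 >= cutoff

def keep_for_normalization(count_1, count_2, cutoff1, cutoff2):
    """An insertion in a normalization gene has to clear both cutoffs,
    so Cutoff2 only changes the outcome when it is larger than Cutoff1."""
    return passes_cutoff(count_1, count_2, max(cutoff1, cutoff2))

print(passes_cutoff(12, 4, 10))                # False: average count is 8
print(keep_for_normalization(12, 10, 10, 5))   # True: Cutoff2 below Cutoff1 changes nothing
print(keep_for_normalization(12, 10, 10, 20))  # False: average count 11 is under Cutoff2
```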

Exclude first %: insertions at the very beginning of a gene sometimes do not interfere with its function, so you can exclude insertions in the first % of a gene from being counted as within that gene. This mostly affects the aggregate calculations downstream.

Exclude last %: similarly, insertions at the very end of a gene sometimes do not interfere with its function, so you can exclude insertions in the last % of a gene. This also mostly affects the aggregate calculations downstream.
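
As an illustration of the two exclusion options, the sketch below trims a gene by the given percentages before testing whether an insertion counts as inside it. It assumes the values are percentages of gene length (10 means 10%) and that "first" means the start of the gene on its own strand; both are assumptions of this example, and the function name is hypothetical.

```python
def counted_in_gene(position, gene_start, gene_end, strand, ef=0.0, el=0.0):
    """True if an insertion at `position` lies inside the trimmed gene.

    gene_start and gene_end are genome coordinates, with gene_start the
    lower of the two; ef and el are percentages of the gene length.
    """
    length = gene_end - gene_start
    first = length * ef / 100.0  # bases trimmed from the beginning of the gene
    last = length * el / 100.0   # bases trimmed from the end of the gene
    if strand == "+":
        low, high = gene_start + first, gene_end - last
    else:
        # On the minus strand the gene begins at its higher coordinate.
        low, high = gene_start + last, gene_end - first
    return high >= position >= low

print(counted_in_gene(1060, 1000, 2000, "+", ef=10, el=5))  # False: within the first 10%
print(counted_in_gene(1060, 1000, 2000, "-", ef=10, el=5))  # True: outside the last 5%
print(counted_in_gene(1500, 1000, 2000, "+", ef=10, el=5))  # True: mid-gene
```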

Maximum weight of a transposon gene in normalization calculations: in the normalization calculations, fitness values within transposon genes are weighted by their number of reads, because fitness values calculated from more reads tend to be more accurate. To keep fitness values with very large numbers of reads from vastly outweighing the others, you can limit the maximum weight.
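
The effect of the maximum weight can be seen in a small sketch of a read-weighted mean with capped weights. How calc_fitness.py turns this average into a normalization factor is not described here, so the function below (a hypothetical name) only illustrates the capping itself.

```python
def capped_weighted_mean(fitnesses, read_counts, maxweight=75):
    """Mean of the fitness values, each weighted by its read count,
    but never by more than maxweight."""
    weights = [min(count, maxweight) for count in read_counts]
    total = sum(weights)
    if total == 0:
        raise ValueError("no reads to weight by")
    return sum(w * f for w, f in zip(weights, fitnesses)) / total

fitnesses = [1.0, 0.5]
read_counts = [10, 1000]
# With an effectively unlimited weight, the 1000-read insertion dominates.
print(capped_weighted_mean(fitnesses, read_counts, maxweight=1000))
# With the default cap of 75, the 10-read insertion keeps more influence.
print(capped_weighted_mean(fitnesses, read_counts, maxweight=75))
```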

Multiplying fitness scores by a certain value: multiplies the normalized fitness scores by the value you supply. This can be helpful for genetic interaction screens, where Tn-seq is performed as usual except that all the mutants share one background knockout. A combination of independent mutations should have a fitness equal to the product of their individual fitness values, whereas related mutations will deviate from that. To find those deviations, multiply all the fitness values of mutants from a normal library by the fitness of the background knockout, and compare the results to the fitness values found from the knockout library.
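
The comparison described above can be written out as a short sketch. The function names and the numbers are made up for illustration; only the multiplicative expectation comes from the text.

```python
def expected_double_fitness(w_background, w_insertion):
    """Expected fitness if the knockout and the insertion act independently."""
    return w_background * w_insertion

def interaction_deviation(w_observed, w_background, w_insertion):
    """Observed fitness in the knockout library minus the expectation.
    Values near 0 suggest independence; larger deviations suggest an interaction."""
    return w_observed - expected_double_fitness(w_background, w_insertion)

# Background knockout fitness 0.8, insertion fitness 0.9 in the normal library.
print(expected_double_fitness(0.8, 0.9))     # about 0.72
# The same insertion measured at 0.5 in the knockout library.
print(interaction_deviation(0.5, 0.8, 0.9))  # about -0.22
```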

Calculate bottleneck value from all genes (rather than only normalization genes): the bottleneck value is an approximation of the percentage of insertions that are randomly lost, estimated either from the normalization genes or from all genes.

Were reads uncollapsed when mapped: select "yes" only if reads were never collapsed upstream.

Output: a csv (comma-separated values) file containing the calculated fitness values. Each line after the header gives the following information for one insertion location: position, strand, count_1, count_2, ratio, mt_freq_t1, mt_freq_t2, pop_freq_t1, pop_freq_t2, gene, D, W, nW
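
If you want to inspect the csv output outside Galaxy, something like the following should work. It assumes the header row uses exactly the column names listed above and that nW holds the normalized fitness; the two sample rows, including the gene "SP_0042", are made up and were not produced by the tool.

```python
import csv
import io

# Two made-up rows in the column order listed above; real values will differ.
sample = io.StringIO(
    "position,strand,count_1,count_2,ratio,mt_freq_t1,mt_freq_t2,"
    "pop_freq_t1,pop_freq_t2,gene,D,W,nW\n"
    "1021,+,40,38,0.95,0.0004,0.00038,0.9996,0.99962,SP_0017,250,0.99,1.0\n"
    "5310,-,25,5,0.2,0.00025,0.00005,0.99975,0.99995,SP_0042,250,0.71,0.72\n"
)

rows = list(csv.DictReader(sample))
# Collect insertion locations whose normalized fitness is under 0.8.
low_fitness = [(r["gene"], int(r["position"])) for r in rows if 0.8 > float(r["nW"])]
print(low_fitness)
```

For a real file, replace the `io.StringIO` object with `open("your_output.csv", newline="")`.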

Output2: a txt file containing the percent blanks and other information used by the Aggregate tool for normalization

Output3: a wig file that can be used to visualize the fitness values; each line after the header is an insertion location and its (possibly normalized) fitness.

    </help>
</tool>