comparison codeml.xml @ 0:961a712f9743 draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/codeml commit 5e46bc6be71912be9b1982b3c2f0d30a36d9b3a8
author iuc
date Tue, 29 Aug 2017 19:12:01 -0400
parents
children ba71e26d5bdc
-1:000000000000 0:961a712f9743
1 <tool name="codeML" id="codeml" version="1.0">
2 <description>
3 Detects positive selection (paml package)
4 </description>
5
6 <macros>
7 <import>macros.xml</import>
8 </macros>
9
10 <requirements>
11 <requirement type="package" version="4.9">paml</requirement>
12 </requirements>
13
14 <version_command><![CDATA[ codeml /dev/null 2>&1 | tail -1 ]]></version_command>
15
16 <command><![CDATA[
17
18 codeml '$codeml_ctl'
19 &&
20 mv '$codeml_ctl' '$ctl'
21
22 ]]></command>
23
24 <configfiles>
25 <configfile name="codeml_ctl"><![CDATA[
26 seqfile = $concat_nuc * sequence data file name
27 outfile = run_codeml * main result file name
28 treefile = $tree * tree structure file name
29 noisy = 9 * 0,1,2,3,9: how much rubbish on the screen
30 verbose = $adv.verbose * 1: detailed output, 0: concise output
31 runmode = $adv.runmode * 0: user tree; 1: semi-automatic; 2: automatic
32 * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise
33 seqtype = $adv.seqtype * 1:codons; 2:AAs; 3:codons-->AAs
34 CodonFreq = $adv.CodonFreq * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
35 clock = $adv.clock * 0:no clock, 1:clock; 2:local clock
36 aaDist = $adv.aaDist * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
37 * 7:AAClasses
38 aaRatefile = $adv.aaRateFile * only used for aa seqs with model=empirical(_F)
39 * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own
40 model = $compat_model.brmodel
41 * models for codons:
42 * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
43 * models for AAs or codon-translated AAs:
44 * 0:poisson, 1:proportional,2:Empirical,3:Empirical+F
45 * 6:FromCodon, 8:REVaa_0, 9:REVaa(nr=189)
46 NSsites = $compat_model.NSsites * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
47 * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
48 * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
49 * 13:3normal>0
50 icode = $adv.icode * 0:universal code; 1:mammalian mt; 2-11:see below
51 Mgene = $adv.Mgene * 0:rates, 1:separate;
52 fix_kappa = $adv.fix_kappa * 1: kappa fixed, 0: kappa to be estimated
53 kappa = $adv.kappa * initial or fixed kappa
54 fix_omega = $adv.fix_omega * 1: omega or omega_1 fixed, 0: estimate
55 omega = $adv.omega * initial or fixed omega, for codons or codon-based AAs
56 fix_alpha = $adv.fix_alpha * 0: estimate gamma shape parameter; 1: fix it at alpha
57 alpha = $adv.alpha * initial or fixed alpha, 0:infinity (constant rate)
58 Malpha = $adv.Malpha * 1: different alphas for genes, 0 : one alpha
59 ncatG = $adv.ncatG * # of categories in dG of NSsites models
60 fix_rho = $adv.fix_rho * 0: estimate rho; 1: fix it at rho
61 rho = $adv.rho * initial or fixed rho, 0:no correlation
62 getSE = $adv.getSE * 0: don't want them, 1: want S.E.s of estimates
63 RateAncestor = $adv.RateAncestor * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)
64 Small_Diff = $adv.Small_Diff
65 cleandata = $adv.cleandata * remove sites with ambiguity data (1:yes, 0:no)?
66 fix_blength = $adv.fix_blength * 0: ignore, -1: random, 1: initial, 2: fixed
67 method = $adv.method * 0: simultaneous; 1: one branch at a time
68
69 ]]></configfile>
70 </configfiles>
71
72 <inputs>
73
74 <param name="concat_nuc" type="data" format="fasta" label="Sequences file" help="The fasta file with the sequences to be analyzed" />
75 <param name="tree" type="data" format="nhx" label="tree file" help="Tree file in Newick format" />
76
77 <conditional name="compat_model" >
78 <param argument="brmodel" type="select" label="Branch model ; for tree file editing in model 2 and 3, see paml manual (chap.3)" >
79 <option value="0" selected="true">'0' : one dN/dS ratio for all branches (e.g. basic model if NSsites=0)</option>
80 <option value="1">'1' : one dN/dS ratio for each branch ("free-ratio model") ; its use is discouraged</option>
81 <option value="2">'2' : arbitrary number of ratios ; requires manually editing your tree file</option>
82 <option value="3">'3' : clade-model ; requires manually editing your tree file</option>
83 </param>
84
85 <when value="0" >
86 <expand macro="FSsites_br0" />
87 </when>
88 <when value="1">
89 <expand macro="FSsites_br1" />
90 </when>
91 <when value="2" >
92 <expand macro="FSsites_br2_and_3" />
93 </when>
94 <when value="3" >
95 <expand macro="FSsites_br2_and_3" />
96 </when>
97 </conditional>
98
99 <!-- advanced parameters -->
100 <section name="adv" title="Advanced Options" expanded="False" >
101 <param argument="verbose" type="select" label="Set the level of detail in the log file">
102 <option value="0" selected="true">0 : concise output</option>
103 <option value="1">1 : detailed output</option>
104 </param>
105
106 <param argument="runmode" type="select" label="Tree analysis mode" >
107 <option value="0" selected="true">0 : user tree</option>
108 <option value="1">1 : heuristic tree search starting from a multifurcating tree from the tree structure file</option>
109 <option value="2">2 : heuristic tree search starting from the tree file</option>
110 <option value="3">3 : StepwiseAddition</option>
111 <option value="4">4 : PerturbationNNI with the starting tree obtained by a parsimony algorithm</option>
112 <option value="5">5 : PerturbationNNI with the starting tree read from the tree structure file</option>
113 <option value="-2">-2 : ML estimation of dS and dN in pairwise comparisons of protein-coding sequences</option>
114 </param>
115
116 <param argument="seqtype" type="select" label="Sequences format in the fasta file" >
117 <option value="1" selected="true">1 : codons</option>
118 <option value="2">2 : Amino acids (only compatible with NSsites=0)</option>
119 <option value="3">3 : codons--&gt;amino acids (only compatible with NSsites=0)</option>
120 </param>
121
122 <param argument="CodonFreq" type="select" label="Equilibrium codon frequencies in codon substitution model">
123 <option value="0">0 : 1/61 each</option>
124 <option value="1" selected="true">1 : F1X4</option>
125 <option value="2">2 : F3X4</option>
126 </param>
127
128 <param argument="clock" type="select" label="Specifies models concerning rate constancy or variation among lineages" >
129 <option value="0" selected="true">0 : no clock ; An unrooted tree should be used under this model</option>
130 <option value="1">1 : clock</option>
131 <option value="2">2 : local clock (needed : branch labels in the tree)</option>
132 </param>
133
134 <param argument="aaDist" type="select" label="Amino acid distances" >
135 <option value="0" selected="true">0 : equal (warning : the only one compatible with NSsites and seqtype=codons)</option>
136 <option value="+">+ : geometric</option>
137 <option value="-">- : linear</option>
138 <option value="1">1 : G1974</option>
139 <option value="2">2 : Miyata</option>
140 <option value="3">3 : c</option>
141 <option value="4">4 : p</option>
142 <option value="5">5 : v</option>
143 <option value="6">6 : a</option>
144 <option value="7">7 : AAClasses</option>
145 </param>
146
147 <param argument="aaRateFile" type="select" label="Amino acid substitution rate matrix" >
148 <option value="wag.dat" selected="true">wag.dat</option>
149 <option value="dayhoff.dat">dayhoff.dat</option>
150 <option value="jones.dat">jones.dat</option>
151 <option value="mtmam.dat">mtmam.dat</option>
152 </param>
153
154 <param argument="icode" type="select" label="Icode : specifies the genetic code" >
155 <option value="0" selected="true">0 : universal code</option>
156 <option value="1">1 : mammalian mt</option>
157 <option value="3">3 : mold mt</option>
158 <option value="4">4 : invertebrate mt</option>
159 <option value="5">5 : ciliate nuclear</option>
160 <option value="6">6 : echinoderm mt</option>
161 <option value="7">7 : euplotid mt</option>
162 <option value="8">8 : alternative yeast nuclear</option>
163 <option value="9">9 : ascidian mt</option>
164 <option value="10">10 : blepharisma nuclear</option>
165 <option value="11">11 : Yang's regularized code</option>
166 </param>
167
168 <param argument="Mgene" type="select" value="0" label="Multiple genes"
169 help="Used in combination with option G in the sequence data file, for combined analysis of data from multiple genes or multiple site partitions" >
170 <option value="0">0 : complete homogeneity among genes </option>
171 <option value="1">1 : equivalent to a separate analysis</option>
172 <option value="2">2 : different frequency parameters for different genes but the same rate ratio parameters</option>
173 <option value="3">3 : different rate ratio parameters and the same frequency parameters</option>
174 <option value="4">4 : both different rate ratio parameters and different frequency parameters for different genes </option>
175 </param>
176
177 <param argument="fix_kappa" type="select" label="Specifies whether kappa in K80, F84, or HKY85 is fixed or estimated">
178 <option value="0" selected="true">0 : estimated</option>
179 <option value="1">1 : fixed (the next parameter below)</option>
180 </param>
181
182 <param argument="kappa" type="float" value="2" label="Initial or fixed value of kappa" help="kappa refers to the transition/transversion rate ratio"/>
183
184 <param argument="fix_omega" type="select" label="Fixed or estimated omega" >
185 <option value="0" selected="true">0 : estimated</option>
186 <option value="1">1 : fixed</option>
187 </param>
188
189 <param argument="omega" type="float" value="0.2" label="Initial or fixed omega (according to your choice for fix_omega), for codons or codon-based AAs"/>
190
191 <param argument="fix_alpha" type="select" label="Estimated or fixed gamma shape parameter" >
192 <option value="0">0 : estimate gamma shape parameter. Not recommended</option>
193 <option value="1" selected="true">1 : fix it at alpha (the next parameter below)</option>
194 </param>
195
196 <param argument="alpha" type="float" value="0" label="Initial or fixed value of alpha (gamma shape parameter)"
197 help="0: constant rate. fix_alpha !=1 and alpha !=0 are not compatible with NSsites !=0"/>
198
199 <param argument="Malpha" type="select" label="Different alphas for genes" >
200 <option value="0">0 : one gamma distribution will be applied across all sites (one alpha)</option>
201 <option value="1">1 : different gamma distribution is used for each gene or codon position (different alphas for genes)</option>
202 </param>
203
204 <param argument="ncatG" type="integer" value="3" label="# of categories in dG of NSsites models" />
205
206 <param argument="fix_rho" type="select" label="Independence or correlation of rates at adjacent sites" >
207 <option value="0">0 : estimate rho</option>
208 <option value="1" selected="true">1 : fix it at rho (the next parameter below)</option>
209 </param>
210
211 <param argument="rho" type="float" value="0" label="Initial or fixed rho" help="fix_rho=1 and rho=0 : independent rates" />
212
213 <param argument="getSE" type="select" label="Estimates of the standard errors of estimated parameters." >
214 <option value="0" selected="true">0 : don't want them</option>
215 <option value="1">1 : want S.E.s of estimates</option>
216 </param>
217
218 <param argument="RateAncestor" type="select" label="RateAncestor ; set 1 to force the program to do two additional analyses" >
219 <option value="0" selected="true">0 : usually use 0</option>
220 <option value="1">1 : model of variable rates across site + empirical Bayesian reconstruction of ancestral sequences</option>
221 </param>
222
223 <param argument="Small_Diff" type="float" value=".5e-6" label="Value used in the difference approximation of derivatives"/>
224
225 <param argument="cleandata" type="select" label="Remove sites with ambiguity data" help="Warning : choosing 'yes' may remove a lot (possibly all) of data.">
226 <option value="0" selected="true">0 : no (don't remove ambiguous data)</option>
227 <option value="1">1 : yes (remove ambiguous data)</option>
228 </param>
229
230 <param argument="fix_blength" type="select" label="Handling of branch lengths from the tree file" >
231 <option value="0" selected="true">0 : ignore branch lengths</option>
232 <option value="-1">-1 : start from random starting points</option>
233 <option value="1">1 : initial values as written in the tree file</option>
234 <option value="2">2 : fixed at values in the tree file</option>
235 </param>
236
237 <param argument="method" type="select" label="Controls the iteration algorithm for estimating branch lengths" >
238 <option value="0" selected="true">0 : simultaneous (old paml algorithm)</option>
239 <option value="1">1 : one branch at a time (newly implemented in paml ; does not work with clock=1,2,3)</option>
240 </param>
241 </section>
242 </inputs>
243
244 <outputs>
245 <data format="txt" name="ctl" label="${tool.name} on ${on_string}: codeml.ctl" />
246 <data format="txt" name="2ngdn" from_work_dir="2NG.dN" label="${tool.name} on ${on_string}: 2NG.dN" >
247 <filter>adv['seqtype']=="1"</filter>
248 </data>
249 <data format="txt" name="2ngds" from_work_dir="2NG.dS" label="${tool.name} on ${on_string}: 2NG.dS" >
250 <filter>adv['seqtype']=="1"</filter>
251 </data>
252 <data format="txt" name="2ngt" from_work_dir="2NG.t" label="${tool.name} on ${on_string}: 2NG.t" >
253 <filter>adv['seqtype']=="1"</filter>
254 </data>
255 <data format="txt" name="lnf" from_work_dir="lnf" label="${tool.name} on ${on_string}: lnf"/>
256 <data format="txt" name="rst" from_work_dir="rst" label="${tool.name} on ${on_string}: rst"/>
257 <data format="txt" name="rst1" from_work_dir="rst1" label="${tool.name} on ${on_string}: rst1"/>
258 <data format="txt" name="rub" from_work_dir="rub" label="${tool.name} on ${on_string}: rub"/>
259 <data format="txt" name="run" from_work_dir="run_codeml" label="${tool.name} on ${on_string}: run_codeml"/>
260 <data format="txt" name="4fold" from_work_dir="4fold.nuc" label="${tool.name} on ${on_string}: 4fold.nuc">
261 <filter>adv['verbose']=="1"</filter>
262 </data>
263 </outputs>
264
265 <tests>
266 <test>
267 <conditional name="compat_model" >
268 <param name="brmodel" value="0" />
269 <param name="NSsites" value="0" />
270 </conditional>
271 <param name="adv.fix_omega" value="0" />
272 <param name="adv.omega" value="0.2" />
273 <param name="RateAncestor" value="1" />
274 <param name="concat_nuc" ftype="fasta" value="concat.fasta" />
275 <param name="tree" ftype="txt" value="RAxML_bestTree" />
276 <output name="2ngdn" value="1_2ngdn" />
277 <output name="2ngds" value="1_2ngds" />
278 <output name="2ngt" value="1_2ngt" />
279 <output name="run" value="1_run_codeml" lines_diff="20"/>
280 <output name="ctl" value="1_codeml.ctl" lines_diff="4" />
281 </test>
282 <test>
283 <conditional name="compat_model" >
284 <param name="brmodel" value="2" />
285 <param name="NSsites" value="0" />
286 </conditional>
287 <param name="adv.fix_omega" value="0" />
288 <param name="adv.omega" value="0.2" />
289 <param name="RateAncestor" value="1" />
290 <param name="concat_nuc" ftype="fasta" value="concat.fasta" />
291 <param name="tree" ftype="txt" value="tree_model2" />
292 <output name="2ngdn" value="2_2ngdn" />
293 <output name="2ngds" value="2_2ngds" />
294 <output name="2ngt" value="2_2ngt" />
295 <output name="run" value="2_run_codeml" lines_diff="20" />
296 <output name="ctl" value="2_codeml.ctl" lines_diff="4" />
297 </test>
298 <test>
299 <conditional name="compat_model" >
300 <param name="brmodel" value="3" />
301 <param name="NSsites" value="2" />
302 </conditional>
303 <param name="adv.fix_omega" value="0" />
304 <param name="adv.omega" value="0.2" />
305 <param name="RateAncestor" value="1" />
306 <param name="concat_nuc" ftype="fasta" value="concat.fasta" />
307 <param name="tree" ftype="txt" value="tree_model3" />
308 <output name="2ngdn" value="3_2ngdn" />
309 <output name="2ngds" value="3_2ngds" />
310 <output name="2ngt" value="3_2ngt" />
311 <output name="run" value="3_run_codeml" lines_diff="20"/>
312 <output name="ctl" value="3_codeml.ctl" lines_diff="4" />
313 </test>
314 </tests>
315
316 <help><![CDATA[
317
318 .. class:: infomark
319
320 **Galaxy integration** Victor Mataigne and ABIMS TEAM.
321
322 Contact support.abims@sb-roscoff.fr for any questions or concerns about the Galaxy implementation of this tool.
323
324 ----------
325
326 **CompCodeML (from paml package)**
327
328 Brief help is given below; the full codeml documentation can be found on the PAML website_
329
330 .. _website: http://abacus.gene.ucl.ac.uk/software/paml.html
331
332 .. class:: warningmark
333
334 Because codeml has many parameters, some parameter incompatibilities may remain unhandled.
335
336 This Galaxy implementation:
337 - handles incompatibilities between branch and site models (the tool CANNOT be run with incompatible models).
338 - warns the user in a help section when an advanced parameter has known incompatibilities (the tool CAN be run, but the output files will be empty).
339
340 We recommend reading the full PAML manual before changing the advanced parameters, in order to spot parameter incompatibilities and to understand what each model does. If you select incompatible parameters by mistake, the output files will be empty, except for the log file ("run_codeml" output), which normally reports the error.
341
342 .. class:: infomark
343
344 Known incompatibilities:
345 - 'seqtype' = 3 : only compatible with 'NSsites' = 0.
346 - 'clock' = 2 : needs branch labels in the tree.
347 - fix_alpha != 1 and alpha != 0 are not compatible with NSsites != 0.
348 - 'aaDist' = 0 is the only value compatible with 'NSsites' other than 0 and 'seqtype' = 1.
349 - 'method' = 1 : does not work with 'clock' other than 0.
350
351 ----------
352
353 **Description**
354
355 .. class:: infomark
356
357 codeML detects positive selection on branches or at codon sites, given a tree and a set of sequences.
358
359 ----------
360
361 **Input files**
362
363 - a tree file in Newick format (with or without branch lengths).
364 - a fasta file with sequences from the species of the tree file (one header/sequence per species).
365
366 ----------
367
368 **Parameters**
369
370 Several models are available:
371 - branch models ("model" parameter).
372 - site models ("NSsites" parameter, with model left at 0).
373 - branch-site models (model = 2 with NSsites = 2 or 3).
374 - clade models (model = 3 with NSsites = 2 or 3).
375 In short, this tool writes a control file called codeml.ctl with the specified parameters and then launches codeml.
376
377 .. class:: infomark
378
379 Branch models allow the omega ratio to vary among branches in the phylogeny and are useful for detecting positive selection acting on particular lineages. Sites models allow the omega ratio to vary among sites (codons or amino acids).
380
381 Two pairs of models appear to be particularly useful, forming two likelihood ratio tests of positive selection. The first compares M1a ('NearlyNeutral', NSsites=1) and M2a ('PositiveSelection', NSsites=2), while the second compares M7 ('beta', NSsites=7) and M8 ('beta&ω', NSsites=8).
382
383 **Other model examples**
384
385 How to run the branch-site models (A & B in Yang & Nielsen 2002 MBE)?
386 The options are:
387 Model A: (model=2, NSsites=2).
388 Model B: (model=2, NSsites=3).
389
390 How to run the M0 (one-ratio) model:
391 model = 0, NSsites = 0.
392
393 ----------
394
395 **Advanced Parameters**
396
397 .. class:: infomark
398
399 See the complete PAML manual and FAQ on the PAML website_
400
401 .. _website: http://abacus.gene.ucl.ac.uk/software/paml.html
402
403 **Details of some parameters :**
404
405 - 'kappa' denotes the transition/transversion rate ratio.
406 - 'fix_kappa' specifies whether kappa in K80, F84, or HKY85 is given at a fixed value or is to be estimated by iteration from the data.
407 -> If fix_kappa = 1 (fixed), the value of kappa is the given value
408 -> If fix_kappa = 0 (estimated) the value of kappa is used as the initial estimate for iteration.
409
410 - 'alpha' refers to the shape parameter alpha of the gamma distribution for variable substitution rates across sites (Yang 1994a).
411 - 'fix_alpha' works in the same way as fix_kappa.
412 -> The model of a single rate for all sites is specified as fix_alpha = 1 and alpha = 0 (0 means infinity)
413 -> The (discrete-) gamma model is specified by a positive value for alpha, and 'ncatG' is then the number of categories for the discrete-gamma model. Values such as 5, 4, 8, or 10 are reasonable.
414
415 - fix_rho and rho work in a similar way and concern independence or correlation of rates at adjacent sites, where rho is the correlation parameter of the auto-discrete-gamma model (Yang 1995).
416 -> The model of independent rates for sites is specified as fix_rho = 1 and rho = 0; choosing alpha = 0 further means a constant rate for all sites.
417 -> The auto-discrete-gamma model is specified by positive values for both alpha and rho.
418 -> The model of a constant rate for sites is a special case of the (discrete) gamma model with alpha = 0 (means infinity).
419 -> The model of independent rates for sites is a special case of the auto-discrete-gamma model with rho = 0.
420
421 ----------
422
423 **Output files**
424
425 - codeml.ctl : a copy of the control file (list of all the parameters used for the codeml run).
426 - run_codeml : main result file name.
427 - 2NG.dN and 2NG.dS : the Nei and Gojobori (1986) dN and dS values.
428 - lnf, rst and rst1: Supplemental results.
429 - rub : records of the iteration progress (i.e. the minimization of the negative log-likelihood).
430
431 ----------
432
433 **How to manually edit the tree file: branch or node labels**
434
435 Some models implemented in codeml allow several groups of branches on the tree, which are assigned different parameters of interest.
436
437 - For example, in the local clock models (clock = 2 or 3), you can have, say, 3 branch rate groups, with low, medium, and high rates respectively.
438
439 - Likewise, the branch-specific codon models (model = 2 or 3 in codeml) allow different branch groups to have different ω values, leading to the so-called “two-ratios” and “three-ratios” models.
440
441 - All those models require branches or nodes in the tree to be labeled. Branch labels are specified in the same way as branch lengths except that the symbol “#” is used rather than “:”. The branch labels are consecutive integers starting from 0, which is the default and does not have to be specified.
442
443 In ((Hsa_Human, Hla_gibbon) #1, ((Cgu/Can_colobus, Pne_langur), Mmu_rhesus), (Ssc_squirrelM, Cja_marmoset)); :
444 The internal branch ancestral to human and gibbon has the ratio ω1, while all other branches (with the default label #0) have the background ratio ω0.
445
446 The following trees are equivalent :
447 (((rabbit, rat) $1, human), goat_cow, marsupial);
448 (((rabbit #1, rat #1) #1, human), goat_cow, marsupial);
449
450 $ is the symbol for clade labels.
451
452 Rules concerning nested clade labels : The symbol # takes precedence over the symbol $, and clade labels close to the tips take precedence over clade labels for ancestral nodes close to the root.
453
454 In the tree ((((rabbit, rat) $2, human #3), goat_cow) $1, marsupial); :
455 $1 is first applied to the whole clade of placental mammals (except for the human lineage), and then $2 is applied to the rabbit-rat clade.
456 Equivalent tree with only '#' :
457 ((((rabbit #2, rat #2) #2, human #3) #1, goat_cow #1) #1, marsupial);
458
459
460 ]]></help>
461
462 <citations>
463 <citation type="doi">10.1093/molbev/msm088</citation>
464 </citations>
465 </tool>
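The help text above describes two likelihood ratio tests of positive selection (M1a against M2a, and M7 against M8). As a minimal sketch of how such a test could be evaluated once the two log-likelihoods have been read from the "run_codeml" outputs, the following Python snippet assumes the usual chi-square approximation with 2 degrees of freedom, which is the convention commonly applied to both pairs. The function name `lrt_pvalue_df2` and the log-likelihood values are hypothetical and only serve as an illustration; they are not part of the tool.

```python
import math
from typing import Tuple


def lrt_pvalue_df2(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Likelihood ratio test assuming a chi-square distribution with 2 df.

    lnl_null: log-likelihood of the null model (e.g. M1a or M7).
    lnl_alt:  log-likelihood of the alternative model (e.g. M2a or M8).
    Returns the test statistic 2*(lnl_alt - lnl_null) and its p-value.
    """
    # A negative difference can occur when the optimisation did not converge;
    # it is clamped to zero here, which gives a p-value of 1.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For 2 degrees of freedom the chi-square survival function
    # has the closed form exp(-x / 2), so no external library is needed.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods, NOT taken from a real codeml run.
    stat, p_value = lrt_pvalue_df2(lnl_null=-1234.56, lnl_alt=-1229.56)
    print(f"2*delta_lnL = {stat:.2f}, p = {p_value:.4g}")
```

Each model of a pair has to be run separately with this tool (for instance NSsites=1 and then NSsites=2), and the log-likelihoods are then compared by hand or with a helper such as this one.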