<!-- poppunk_cluster.xml, revision 0:d3c2de4d003a (draft), uploaded by johnlees on Wed, 15 May 2019 14:44:11 -0400 -->
<tool id="poppunk_cluster" name="PopPUNK (cluster)" version="1.1.6">
    <description>Cluster bacterial genomes</description>

    <requirements>
        <requirement type="package" version="1.1.6">poppunk</requirement>
    </requirements>

    <version_command><![CDATA[
poppunk --version
    ]]></version_command>

    <command detect_errors="exit_code"><![CDATA[
## Set up input files
##echo "$input_assemblies" | tr ',' '\n' > r_files.txt
#for $input in $input_assemblies
ln -s '$input' '$input.element_identifier' && echo '$input.element_identifier' >> r_files.txt;
#end for

## command line 1 (create db)
echo "
poppunk
--create-db
--r-files r_files.txt
--output poppunk_db
--threads \${GALAXY_SLOTS:-1}
--min-k $min_k
--max-k $max_k
--k-step $k_step
--sketch-size $sketch_size
--max-a-dist $max_a_dist
#if $ignore_length
--ignore-length
#end if
--no-stream
" > poppunk_1.sh

&& sh poppunk_1.sh

## command line 2 (fit model)
&& echo "
poppunk
--fit-model
--distances poppunk_db/poppunk_db.dists
--ref-db poppunk_db
--output poppunk_db

## mode
#if str( $model.model_mode ) == "gmm":
--K ${model.K}
#end if
#if str( $model.model_mode ) == "dbscan":
--dbscan
--D $model.D
--min-cluster-prop $model.min_cluster_prop
#end if

#if not $no_full_db or $refine.refine_model
--full-db
#end if

#if not $refine.refine_model:
#if $external_clusters
--external-clustering $external_clusters
#end if

## viz
#if $cytoscape:
--cytoscape
#end if
#if $viz.microreact:
--microreact
--rapidnj rapidnj
--perplexity $viz.perplexity
#end if
#if ($cytoscape or $viz.microreact) and $info_csv:
--info-csv $info_csv
#end if
#end if

" > poppunk_2.sh

&& sh poppunk_2.sh

## command line 3 (refine)
#if $refine.refine_model:
&& echo "
poppunk
--refine-model
--distances poppunk_db/poppunk_db.dists
--ref-db poppunk_db
--output poppunk_db
--threads \${GALAXY_SLOTS:-1}
--pos-shift $refine.pos_shift
--neg-shift $refine.neg_shift

#if not $no_full_db
--full-db
#end if

#if $external_clusters
--external-clustering $external_clusters
#end if

## viz
#if $cytoscape:
--cytoscape
#end if
#if $viz.microreact:
--microreact
--rapidnj rapidnj
--perplexity $viz.perplexity
#end if
#if ($cytoscape or $viz.microreact) and $info_csv:
--info-csv $info_csv
#end if

" > poppunk_3.sh

&& sh poppunk_3.sh;
#end if

    ]]></command>

    <inputs>

        <!-- input files -->
        <param name="input_assemblies" type="data" format="fasta" multiple="true" label="FASTA datasets (assemblies)">
        </param>

        <!-- model type -->
        <conditional name="model">
            <param name="model_mode" type="select" label="Choose a model to use" help="If the default does not work, see the documentation or the description below for advice." display="radio">
                <option value="gmm" selected="true">GMM</option>
                <option value="dbscan">DBSCAN</option>
            </param>
            <!-- model options -->
            <when value="gmm">
                <param name="K" type="integer" value="3" min="2" max="10" label="Maximum number of mixture components">
                </param>
            </when>
            <when value="dbscan">
                <param name="D" type="integer" value="100" min="2" max="500" label="Maximum number of spatial clusters">
                </param>
                <param name="min_cluster_prop" type="float" value="0.0001" min="0.00001" max="0.1" label="Minimum proportion of points in a cluster">
                </param>
            </when>
        </conditional>

        <!-- refine model options -->
        <conditional name="refine">
            <param name="refine_model" type="boolean" checked="false" label="Run model refinement after model fit">
            </param>
            <when value="true">
                <param name="pos_shift" type="float" value="0.2" min="0.0" max="1.0" label="Maximum amount to move the boundary away from the origin">
                </param>
                <param name="neg_shift" type="float" value="0.4" min="0.0" max="1.0" label="Maximum amount to move the boundary towards the origin">
                </param>
            </when>
            <when value='false' />
        </conditional>

        <!-- further analysis options -->
        <conditional name="viz">
            <param name="microreact" type="boolean" checked="true" label="Make visualisations for microreact (recommended)">
            </param>
            <when value="true">
                <param name="perplexity" type="float" value="20.0" min="5.0" max="100.0" label="Perplexity parameter for accessory plot t-SNE">
                </param>
            </when>
            <when value='false' />
        </conditional>
        <param name="cytoscape" type="boolean" checked="false" label="Make visualisations for cytoscape">
        </param>

        <!-- output options -->
        <param name="no_full_db" type="boolean" checked="true" label="Select representative references">
        </param>

        <param name="external_clusters" type="data" format="csv" optional="true" label="External cluster labels to add (e.g. MLST/serotype)">
        </param>
        <param name="info_csv" type="data" format="csv" optional="true" label="Epidemiological information CSV formatted for microreact">
        </param>

        <!-- kmer comparison options -->
        <param name="min_k" type="integer" value="13" min="7" max="31" label="Minimum k-mer length">
        </param>
        <param name="max_k" type="integer" value="29" min="7" max="31" label="Maximum k-mer length">
        </param>
        <param name="k_step" type="integer" value="3" min="2" max="5" label="Step size between k-mer lengths">
        </param>
        <param name="sketch_size" type="integer" value="10000" min="1000" max="2000000" label="Sketch size">
        </param>

        <!-- quality control options -->
        <param name="max_a_dist" type="float" value="0.5" min="0.0" max="1.0" label="Maximum accessory distance to permit">
        </param>
        <param name="ignore_length" type="boolean" checked="false" label="Ignore assembly length outliers">
        </param>

    </inputs>

    <outputs>
        <data name="clusters" format="csv" from_work_dir="poppunk_db/poppunk_db_clusters.csv" label="${tool.name} on ${on_string} (cluster assignment)" />
        <data name="distances" format="png" from_work_dir="poppunk_db/poppunk_db_distanceDistribution.png" label="${tool.name} on ${on_string} (distance plot)" />
        <data name="references" format="txt" from_work_dir="poppunk_db/poppunk_db.refs" label="${tool.name} on ${on_string} (selected references)">
            <filter>no_full_db == True</filter>
        </data>
        <data name="gmm_plot" format="png" from_work_dir="poppunk_db/poppunk_db_DPGMM_fit.png" label="${tool.name} on ${on_string} (GMM cluster plot)">
            <filter>model['model_mode'] == 'gmm'</filter>
        </data>
        <data name="gmm_contours" format="pdf" from_work_dir="poppunk_db/poppunk_db_DPGMM_fit_contours.pdf" label="${tool.name} on ${on_string} (GMM contour plot)">
            <filter>model['model_mode'] == 'gmm'</filter>
        </data>
        <data name="dbscan_plot" format="png" from_work_dir="poppunk_db/poppunk_db_dbscan.png" label="${tool.name} on ${on_string} (DBSCAN cluster plot)">
            <filter>model['model_mode'] == 'dbscan'</filter>
        </data>
        <data name="refine_plot" format="png" from_work_dir="poppunk_db/poppunk_db_refined_fit.png" label="${tool.name} on ${on_string} (refine model plot)">
            <filter>refine['refine_model'] == True</filter>
        </data>
        <data name="cytoscape_network" format="data" from_work_dir="poppunk_db/poppunk_db_cytoscape.graphml" label="${tool.name} on ${on_string} (cytoscape network)">
            <filter>cytoscape == True</filter>
        </data>
        <data name="cytoscape_clusters" format="csv" from_work_dir="poppunk_db/poppunk_db_cytoscape.csv" label="${tool.name} on ${on_string} (cytoscape csv)">
            <filter>cytoscape == True</filter>
        </data>
        <data name="microreact_clusters" format="csv" from_work_dir="poppunk_db/poppunk_db_microreact_clusters.csv" label="${tool.name} on ${on_string} (microreact csv)">
            <filter>viz['microreact'] == True</filter>
        </data>
        <data name="microreact_tree" format="newick" from_work_dir="poppunk_db/poppunk_db_core_NJ.nwk" label="${tool.name} on ${on_string} (microreact tree)">
            <filter>viz['microreact'] == True</filter>
        </data>
        <data name="microreact_dot" format="graph_dot" from_work_dir="poppunk_db/poppunk_db_perplexity${perplexity}_accessory_tsne.dot" label="${tool.name} on ${on_string} (microreact dot)">
            <filter>viz['microreact'] == True</filter>
        </data>
    </outputs>

    <tests>
        <test>
            <param name='input_assemblies' value='12673_8_24.contigs_velvet.fa,12673_8_34.contigs_velvet.fa,12673_8_43.contigs_velvet.fa,12754_4_71.contigs_velvet.fa,12754_4_77.contigs_velvet.fa' />
            <param name='model_mode' value='gmm' />
            <param name='K' value='4' />
            <param name='microreact' value='false' />
            <param name='no_full_db' value='false' />
            <output name="clusters" ftype='csv' file="clusters.csv" />
            <output name="distances" ftype='png' file="distances.png" />
            <output name="references" file="refs.txt" />
            <output name="gmm_plot" ftype='png' file="gmm_fit.png" />
            <output name="gmm_contours" ftype='pdf' file="gmm_contours.pdf" />
        </test>
    </tests>

    <help><![CDATA[
**What it does**

PopPUNK calculates core and accessory distances between the input assemblies using variable-length k-mers, then fits a model to all of these distances to determine genetic clusters for all inputs.
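
As a rough illustration of the distance step (a minimal sketch, not PopPUNK's own code): the PopPUNK paper cited below models the proportion of k-mers of length k shared by two genomes as pr(k) = (1 - a)(1 - π)^k, where π is the core distance and a is the accessory distance. Taking logs makes this linear in k, so a straight-line fit of log pr(k) against k recovers both distances. The function name ``core_accessory_distance`` and the input proportions are hypothetical, and the k-mer lengths 13, 16, ..., 28 are simply chosen to match the spacing of this tool's defaults (``min_k`` 13, ``k_step`` 3, ``max_k`` 29). PopPUNK's own fit may differ, for example in how it constrains the estimates.

```python
import math


def core_accessory_distance(kmer_lengths, match_proportions):
    """Estimate (core, accessory) distances from k-mer match proportions.

    Assumes pr(k) = (1 - a) * (1 - pi) ** k, so that
    log pr(k) = log(1 - a) + k * log(1 - pi) is a straight line in k.
    The line is fitted by ordinary least squares (a simplification).
    """
    if len(kmer_lengths) != len(match_proportions) or len(kmer_lengths) < 2:
        raise ValueError("need at least two (k, proportion) pairs of equal length")
    xs = [float(k) for k in kmer_lengths]
    ys = [math.log(p) for p in match_proportions]  # proportions must be > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Undo the log transform: slope = log(1 - pi), intercept = log(1 - a)
    core = 1.0 - math.exp(slope)
    accessory = 1.0 - math.exp(intercept)
    return core, accessory


if __name__ == "__main__":
    # Hypothetical noise-free proportions generated from pi = 0.01, a = 0.1
    ks = [13, 16, 19, 22, 25, 28]
    props = [(1 - 0.1) * (1 - 0.01) ** k for k in ks]
    core, acc = core_accessory_distance(ks, props)
    print(f"core={core:.4f} accessory={acc:.4f}")  # core=0.0100 accessory=0.1000
```

With noise-free inputs the fit returns the generating values exactly; with real sketches the match proportions are noisy estimates, so the recovered distances are approximate.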

------

**Description**

The most important thing to check is that the component (blob) closest to the origin has been correctly identified; check this in the cluster/model plot output. If it has not, you may wish to try another model. Some broad advice:

* DBSCAN is a good default, but may lead to unclassified points (black). If there are many of these, consider another model.
* GMM works well when the components are well separated and K is chosen appropriately (consider increasing K to match the number of components visible in the plot).
* Add model refinement for recombining species. These can be recognised in the output plots when the coloured components overlap, or when there is a blur of points rather than discrete blobs.
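
Why the component nearest the origin matters, sketched under stated assumptions: in the PopPUNK approach, pairs of genomes whose distances fall in that component are treated as within-strain links, and the clusters are the connected components of the resulting network. The sketch below covers only that last step. It assumes the component means and the per-pair component labels are already known (in PopPUNK they come from the fitted GMM, DBSCAN or refined model); the helper names ``within_strain_component`` and ``clusters_from_links`` are hypothetical, and numbering clusters largest-first is a choice made for this example.

```python
import math


def within_strain_component(means):
    """Return the index of the (core, accessory) mean closest to the origin."""
    return min(range(len(means)), key=lambda i: math.hypot(means[i][0], means[i][1]))


def clusters_from_links(samples, pair_labels, within_label):
    """Assign cluster numbers as connected components of within-strain links.

    pair_labels maps (sample_a, sample_b) -> component index; only pairs
    labelled within_label become edges. Uses union-find with path halving.
    """
    parent = {s: s for s in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), label in pair_labels.items():
        if label == within_label:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    # Largest cluster first; ties broken by the alphabetically first member
    ordered = sorted(groups.values(), key=lambda g: (-len(g), min(g)))
    return {s: number for number, group in enumerate(ordered, start=1) for s in group}


if __name__ == "__main__":
    # Hypothetical fitted means: the first is nearest the origin
    means = [(0.001, 0.01), (0.02, 0.2), (0.03, 0.4)]
    pairs = {("A", "B"): 0, ("B", "C"): 0, ("A", "D"): 1, ("D", "E"): 0, ("C", "E"): 2}
    within = within_strain_component(means)
    print(clusters_from_links(["A", "B", "C", "D", "E"], pairs, within))
    # {'A': 1, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
```

If a component further from the origin were mistaken for the within-strain one, many more pairs would become links and clusters would tend to merge, which is why the plots should be checked.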
    ]]></help>

    <citations>
        <citation type='bibtex'>
@article{Lees01022019,
  author = {Lees, John A. and Harris, Simon R. and Tonkin-Hill, Gerry and Gladstone, Rebecca A. and Lo, Stephanie W. and Weiser, Jeffrey N. and Corander, Jukka and Bentley, Stephen D. and Croucher, Nicholas J.},
  title = {Fast and flexible bacterial genomic epidemiology with PopPUNK},
  volume = {29},
  number = {2},
  pages = {304-316},
  year = {2019},
  doi = {10.1101/gr.241455.118},
  abstract = {The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (Population Partitioning Using Nucleotide K-mers), a software implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. Variable-length k-mer comparisons are used to distinguish isolates’ divergence in shared sequence and gene content, which we demonstrate to be accurate over multiple orders of magnitude using data from both simulations and genomic collections representing 10 taxonomically widespread species. Connections between closely related isolates of the same strain are robustly identified, despite interspecies variation in the pairwise distance distributions that reflects species’ diverse evolutionary patterns. PopPUNK can process $10^3$--$10^4$ genomes in a single batch, with minimal memory use and runtimes up to 200-fold faster than existing model-based methods. Clusters of strains remain consistent as new batches of genomes are added, which is achieved without needing to reanalyze all genomes de novo. This facilitates real-time surveillance with consistent cluster naming between studies and allows for outbreak detection using hundreds of genomes in minutes. Interactive visualization and online publication is streamlined through the automatic output of results to multiple platforms. PopPUNK has been designed as a flexible platform that addresses important issues with currently used whole-genome clustering and typing methods, and has potential uses across bacterial genetics and public health research.},
  URL = {http://genome.cshlp.org/content/29/2/304.abstract},
  eprint = {http://genome.cshlp.org/content/29/2/304.full.pdf+html},
  journal = {Genome Research}
}</citation>
    </citations>
</tool>