comparison clustering.xml @ 0:76c750c5f0d1 draft default tip

planemo upload for repository https://github.com/oinizan/FROGS-wrappers commit 0b900a51e220ce6f17c1e76292c06a5f4d934055-dirty
author frogs
date Thu, 25 Oct 2018 05:01:13 -0400
<?xml version="1.0"?>
<!--
# Copyright (C) 2015 INRA
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
-->
<tool id="FROGS_clustering" name="FROGS Clustering swarm" version="2.3.0">
    <description>Step 2 in metagenomics analysis: clustering.</description>
    <requirements>
        <requirement type="package" version="2.0.1">frogs</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" />
        <exit_code range=":-1" />
    </stdio>
    <command>
        clustering.py
            --nb-cpus $nb_cpus
            --distance $maximal_distance
            --input-fasta $sequence_file
            --input-count $count_file
            --output-biom $abundance_biom
            --output-fasta $seed_file
            --output-compo $swarms_composition
            $denoising
    </command>
    <inputs>
        <!-- Files -->
        <param format="fasta" name="sequence_file" type="data" label="Sequences file" help="The sequences file (format: fasta)." optional="false" />
        <param format="tabular" name="count_file" type="data" label="Count file" help="It contains the count by sample for each sequence (format: TSV)." optional="false" />
        <!-- Parameters -->
        <param name="nb_cpus" type="hidden" label="CPU number" help="The maximum number of CPUs used." value="1" />
        <param name="maximal_distance" type="integer" label="Aggregation distance" help="Maximum number of differences between sequences in each aggregation step." value="3" min="1" max="15" optional="false" />
        <param name="denoising" type="boolean" checked="true" truevalue="--denoising" falsevalue="" label="Perform denoising clustering step?" help="If checked, clustering will be performed in two steps: first with distance = 1, then with your input distance."/>
    </inputs>
    <outputs>
        <data format="fasta" name="seed_file" label="${tool.name}: seed_sequences.fasta" from_work_dir="seeds.fasta"/>
        <data format="biom1" name="abundance_biom" label="${tool.name}: abundance.biom" from_work_dir="abundance.biom" />
        <data format="tabular" name="swarms_composition" label="${tool.name}: swarms_composition.tsv" from_work_dir="swarms.tsv"/>
    </outputs>
    <tests>
        <test>
            <param name="sequence_file" value="references/01-prepro.fasta"/>
            <param name="count_file" value="references/01-prepro.tsv"/>
            <param name="maximal_distance" value="3"/>
            <param name="denoising" value="true"/>
            <output name="seed_file" file="references/02-clustering.fasta" compare="sim_size" delta="0" />
            <!-- output name="abundance_biom" file="references/02-clustering.biom" -->
            <!-- Here are two other ways to compare the files. They do not work in this case because the files generated by planemo differ substantially from those of our manual tests (cf. /tmp/frogs-test-result-olivier and /tmp/frogs-test-result-olivier2). -->
            <!-- cf. https://docs.galaxyproject.org/en/latest/dev/schema.html#tool-tests-test-output -->
            <!--output name="abundance_biom" file="references/02-clustering.biom" compare="sim_size" delta="10"/-->
            <!--output name="abundance_biom" file="references/02-clustering.biom" compare="diff" lines_diff="1"/-->
            <output name="swarms_composition" file="references/02-clustering_compo.tsv" compare="sim_size" delta="0" />
        </test>
    </tests>
    <help>

.. image:: static/images/FROGS_logo.png
   :height: 144
   :width: 110


.. class:: infomark page-header h2

What it does

Single-linkage clustering on sequences.


.. class:: infomark page-header h2

Inputs/Outputs

.. class:: h3

Inputs

**Sequences file**:

The sequence file containing the sequences of all samples (format `FASTA &lt;https://en.wikipedia.org/wiki/FASTA_format&gt;`_). These sequences are dereplicated: strictly identical sequences are represented only once and the initial count is kept in the count file.

The sequence ID must be "sequenceID;size=X" with X equal to the total abundance among all samples.

*It corresponds to one output of the FROGS Pre-process tool.*
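
As an illustration of the "sequenceID;size=X" convention, here is a minimal Python sketch that splits such an ID into its name and its total abundance. The function name ``parse_sequence_id`` and the sample ID are hypothetical and are not part of FROGS; the ID is assumed to be given without the leading FASTA marker.

```python
def parse_sequence_id(header):
    """Split 'sequenceID;size=X' into (sequence ID, total abundance X)."""
    # rpartition splits on the last ';size=' so earlier ';' in the ID are kept
    seq_id, separator, size = header.strip().rstrip(";").rpartition(";size=")
    if not separator or not size.isdigit():
        raise ValueError("expected 'sequenceID;size=X', got: " + header)
    return seq_id, int(size)


print(parse_sequence_id("seq1;size=4190"))  # ('seq1', 4190)
```

An ID without the size annotation raises a ``ValueError``, which makes a malformed input file visible before clustering starts.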

**Count file**:

This file contains the count of each unique sequence in each sample (format `TSV &lt;https://en.wikipedia.org/wiki/Tab-separated_values&gt;`_).

Example::

   #id     splA    splB
   seq1    1289    2901
   seq2    3415    0
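
The link between the count file and the "size=X" value of the sequence IDs can be checked with a short Python sketch such as the one below. It assumes a tab-separated file whose first line is the header shown in the example; the function name ``total_abundances`` is hypothetical and is not part of FROGS.

```python
import csv
import io


def total_abundances(count_handle):
    """Return {sequence ID: total count over all samples} from a count TSV."""
    reader = csv.reader(count_handle, delimiter="\t")
    next(reader)  # skip the '#id splA splB' header line
    return {row[0]: sum(int(count) for count in row[1:]) for row in reader if row}


# Same content as the example above, written with explicit tabs.
example = "#id\tsplA\tsplB\nseq1\t1289\t2901\nseq2\t3415\t0\n"
totals = total_abundances(io.StringIO(example))
print(totals)  # {'seq1': 4190, 'seq2': 3415}
```

With these counts, the sequence IDs expected in the sequences file would be "seq1;size=4190" and "seq2;size=3415".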


.. class:: h3

Outputs

**Abundance file** (abundance.biom):

The abundance of each cluster in each sample (format `BIOM &lt;http://biom-format.org/&gt;`_). This format is widely used in metagenomics software.


**Clusters seeds** (seed_sequences.fasta):

The representative sequence of each cluster (format `FASTA &lt;https://en.wikipedia.org/wiki/FASTA_format&gt;`_).


**Clusters composition** (swarms_composition.tsv):

A text file describing the read composition of each cluster (format txt). Each line represents one cluster and lists its read identifiers separated by spaces.
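
A composition file laid out as described (one cluster per line, identifiers separated by whitespace) can be read with a few lines of Python. This is only a sketch: the function name ``read_composition`` and the read identifiers are hypothetical.

```python
import io


def read_composition(handle):
    """Return one list of read identifiers per non-empty line (one line = one cluster)."""
    # split() without argument accepts spaces as well as tabs
    return [line.split() for line in handle if line.strip()]


example = io.StringIO("read_a read_b read_c\nread_d\n")
clusters = read_composition(example)
print([len(members) for members in clusters])  # [3, 1]
```

Splitting on any whitespace keeps the sketch usable whether the identifiers are separated by spaces or by tabs.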


.. class:: infomark page-header h2

How it works

.. csv-table::
   :header: "Steps", "With denoising", "Without denoising"
   :widths: 5, 150, 150
   :class: table table-striped

   "1", "Sort the reads by abundance", "Sort the reads by abundance"
   "2", "Cluster the reads (`Swarm &lt;https://github.com/torognes/swarm&gt;`_) with a distance of 1", "/"
   "3", "Sort the pre-clusters by abundance", "/"
   "4", "Cluster the pre-clusters (`Swarm &lt;https://github.com/torognes/swarm&gt;`_) with the distance you specify", "Cluster the reads (`Swarm &lt;https://github.com/torognes/swarm&gt;`_) with the distance you specify"

**Swarm focus**

Swarm uses an iterative growth process and sequence abundance values to delineate OTUs.

.. image:: static/images/FROGS_cluster_swarm.png
   :height: 223
   :width: 666

In each growth step, the sequences added in the previous step are used to find other sequences that differ from them by at most the "Aggregation distance".

After aggregation, Swarm refines the clusters by looking at the abundances along the connections. Theoretically, abundances must decrease as you move away from the seed (which is often the most abundant sequence). If the abundance rises again, two different clusters are connected by a few low-abundance sequences, so Swarm cuts the connection.
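
The growth and refinement ideas described above can be illustrated with a toy Python sketch. It is not Swarm's implementation: differences are counted as mismatches between sequences of equal length (an assumption made for brevity, whereas Swarm also handles insertions and deletions), and the refinement is folded into the growth by refusing any link along which the abundance would rise. The names ``differences`` and ``grow_clusters`` and the sample data are hypothetical.

```python
def differences(seq_a, seq_b):
    """Toy measure: number of mismatches between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        # assumption of this sketch: unequal lengths count as very distant
        return max(len(seq_a), len(seq_b))
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def grow_clusters(abundances, distance):
    """Greedy single-linkage growth starting from the most abundant sequence.

    abundances maps each sequence to its total abundance. A sequence joins a
    cluster when it is within `distance` differences of a member added in the
    previous growth step and is not more abundant than that member.
    """
    remaining = sorted(abundances, key=lambda seq: (-abundances[seq], seq))
    clusters = []
    while remaining:
        seed = remaining.pop(0)  # most abundant unassigned sequence
        cluster, frontier = [seed], [seed]
        while frontier:
            next_frontier = []
            for parent in frontier:
                for candidate in list(remaining):
                    if (distance >= differences(parent, candidate)
                            and abundances[parent] >= abundances[candidate]):
                        remaining.remove(candidate)
                        cluster.append(candidate)
                        next_frontier.append(candidate)
            frontier = next_frontier
        clusters.append(cluster)
    return clusters


example = {"AAAA": 100, "AAAT": 10, "AATT": 2, "ATTT": 60}
print(grow_clusters(example, 1))  # [['AAAA', 'AAAT', 'AATT'], ['ATTT']]
```

In this example "ATTT" is only one difference away from "AATT", but its abundance (60) is higher than that of "AATT" (2), so the chain is cut and "ATTT" seeds its own cluster.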


.. class:: infomark page-header h2

Advice

The denoising step builds very fine clusters with minimal differences: the number of differences between the sequences of each crown is equal to 1. This first clustering is extremely quick. After the denoising, a second Swarm run is performed on the seeds of this first clustering, with the aggregation distance you have configured (greater than 1).

To get some metrics on your clusters, you can use the tool **FROGS Clusters Stat**.


----

**Contact**

Contacts: frogs@inra.fr

Repository: https://github.com/geraldinepascal/FROGS

Please cite the FROGS Publication: *Escudie F., Auer L., Bernard M., Cauquil L., Vidal K., Maman S., Mariadassou M., Hernandez-Raquet G., Pascal G., 2015. FROGS: Find Rapidly OTU with Galaxy Solution. In: The environmental genomic Conference, Montpellier, France,* http://bioinfo.genotoul.fr/fileadmin/user_upload/FROGS_2015_GE_Montpellier_poster.pdf

Depending on the help provided you can cite us in acknowledgements, references or both.
    </help>
    <citations>
        <citation type="doi">10.7287/peerj.preprints.386v1</citation>
    </citations>
</tool>