annotate abims_hclustering.xml @ 0:2f7381ee5235 draft

Uploaded
author lecorguille
date Tue, 30 Jun 2015 06:36:09 -0400
parents
children 36fc0a87d7fb
<tool id="abims_hclustering" name="Hierarchical Clustering" version="1.1">

    <description>using ctc R package for java-treeview</description>

    <command interpreter="Rscript">
        abims_hclustering.r file "$input" method $method link $link keep.hclust FALSE normalization $normalization sep "$sep" dec "$dec"
    </command>

    <inputs>
        <param name="input" type="data" label="Data Matrix file" format="tabular" help="Matrix of numeric data with headers." />
        <param name="method" type="select" label="Distance measure method" help="the distance measure to be used">
            <option value="pearson" selected="true">pearson</option>
            <option value="euclidean">euclidean</option>
            <option value="maximum">maximum</option>
            <option value="manhattan">manhattan</option>
            <option value="canberra">canberra</option>
            <option value="binary">binary</option>
            <option value="correlation">correlation</option>
            <option value="spearman">spearman</option>
        </param>
        <param name="link" type="select" label="Agglomeration/Link method" help="the agglomeration method to be used">
            <option value="ward" selected="true">ward</option>
            <option value="single">single</option>
            <option value="complete">complete</option>
            <option value="average">average</option>
            <option value="mcquitty">mcquitty</option>
            <option value="median">median</option>
            <option value="centroid">centroid</option>
        </param>
        <param name="normalization" type="select" label="Normalization by center and scale" help="Centering is done by subtracting the column means, and scaling is done by dividing the (centered) columns by their standard deviations.">
            <option value="T" selected="true">TRUE</option>
            <option value="F">FALSE</option>
        </param>

        <param name="sep" type="select" format="text" optional="true">
            <label>Separator of columns</label>
            <option value="tabulation">tabulation</option>
            <option value="semicolon">;</option>
            <option value="comma">,</option>
        </param>
        <param name="dec" type="text" label="Decimal separator" value="." help="" />

        <!--<param name="nr_col_names" type="integer" label="names" value="2" help="number of the column containing the metabolite names" />
        <param name="from" type="integer" label="from" value="15" help="number of the first column of peak value data (to exclude all metadata)" />
        <param name="to" type="integer" label="to" value="30" help="number of the last column of peak value data (to exclude all metadata)" />
        <param name="gr_number" type="integer" label="gr_number" value="2" help="number of groups (conditions)" />
        <param name="nb_col_gr" type="text" label="nb_col_gr" value="8,8" help="number of columns in each group, separated by commas as indicated; the first position corresponds to the first group, etc." />
        <param name="threshold" type="float" label="threshold" value="0.01" help="maximum accepted adjusted p-value" />-->

    </inputs>

    <outputs>
        <data name="hclust_zip" format="zip" from_work_dir="hclust.zip" label="${input.name[:-4]}.hclust.zip" />
    </outputs>

    <stdio>
        <exit_code range="1:" level="fatal" />
    </stdio>

    <help>

.. class:: infomark

**Authors** Gildas Le Corguille ABiMS - UPMC/CNRS - Station Biologique de Roscoff - gildas.lecorguille|at|sb-roscoff.fr

---------------------------------------------------

=======================
Hierarchical Clustering
=======================

-----------
Description
-----------

This tool computes a hierarchical clustering with the hcluster function
and exports the result in the Java TreeView file format: jtreeview.sourceforge.net.

This function performs a **hierarchical cluster analysis** using a set
of dissimilarities for the n objects being clustered. Initially,
each object is assigned to its own cluster and then the algorithm
proceeds iteratively, at each stage joining the two most similar
clusters, continuing until there is just a single cluster. At
each stage, distances between clusters are recomputed by the
Lance-Williams dissimilarity update formula according to the
particular clustering method being used.

A number of different **clustering methods** are provided. **Ward's**
minimum variance method aims at finding compact, spherical
clusters. The **complete linkage** method finds similar clusters.
The **single linkage** method (which is closely related to the
minimal spanning tree) adopts a ‘friends of friends’ clustering
strategy. The other methods can be regarded as aiming for
clusters with characteristics somewhere between the single and
complete link methods. Note, however, that the **median** and
**centroid** methods do not lead to a monotone distance measure,
or equivalently, the resulting dendrograms can have so-called
inversions (which are hard to interpret).
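
The iterative procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (``coefficients``, ``update``, ``agglomerate``) and the toy dissimilarities are hypothetical, only four of the link methods are covered, and it is not the code run by the hcluster function.

```python
# Sketch only: a tiny agglomerative clustering loop driven by the
# Lance-Williams update. It is NOT the implementation used by hcluster.

def coefficients(link, n_i, n_j):
    """Return (alpha_i, alpha_j, beta, gamma) for a few link methods."""
    if link == "single":
        return 0.5, 0.5, 0.0, -0.5
    if link == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if link == "average":
        total = n_i + n_j
        return n_i / total, n_j / total, 0.0, 0.0
    if link == "mcquitty":
        return 0.5, 0.5, 0.0, 0.0
    raise ValueError("link method not covered by this sketch: " + link)


def update(d_ik, d_jk, d_ij, n_i, n_j, link):
    """Dissimilarity between the merged cluster (i, j) and cluster k."""
    a_i, a_j, beta, gamma = coefficients(link, n_i, n_j)
    return a_i * d_ik + a_j * d_jk + beta * d_ij + gamma * abs(d_ik - d_jk)


def agglomerate(dissimilarities, link="average"):
    """Merge clusters until one remains; return (left, right, height) steps.

    dissimilarities maps frozenset({a, b}) to the dissimilarity of a and b.
    """
    d = dict(dissimilarities)
    sizes = {label: 1 for pair in d for label in pair}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)  # the two most similar clusters
        i, j = sorted(pair, key=str)
        height = d.pop(pair)
        merged = (i, j)
        for k in list(sizes):
            if k == i or k == j:
                continue
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((merged, k))] = update(
                d_ik, d_jk, height, sizes[i], sizes[j], link)
        sizes[merged] = sizes.pop(i) + sizes.pop(j)
        merges.append((i, j, height))
    return merges


# Hypothetical toy dissimilarities between three objects.
toy = {
    frozenset(("a", "b")): 1.0,
    frozenset(("a", "c")): 4.0,
    frozenset(("b", "c")): 5.0,
}
for link in ("single", "complete", "average"):
    print(link, agglomerate(toy, link))
```

With these toy values, objects a and b are joined first at height 1.0 under every link method; only the height of the final merge with c changes (4.0 for single, 5.0 for complete, 4.5 for average), which is the effect of the link choice on the dendrogram.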


-----------
Input files
-----------

+---------------------------+------------+
| Parameter : num + label   | Format     |
+===========================+============+
| 1 : Data Matrix file      | Tabular    |
+---------------------------+------------+


----------
Parameters
----------


**Agglomeration or Link method:**

A number of different clustering methods are provided. Ward's minimum variance method aims at finding compact, spherical clusters.
The complete linkage method finds similar clusters. The single linkage method (which is closely related to the minimal spanning tree) adopts a ‘friends of friends’ clustering strategy.
The other methods can be regarded as aiming for clusters with characteristics somewhere between the single and complete link methods.
Note, however, that the median and centroid methods do not lead to a monotone distance measure, or equivalently, the resulting dendrograms can have so-called inversions (which are hard to interpret).


------------
Output files
------------

***.tab.hclust.zip**

| A zip file containing three files (hclust.atr, hclust.cdt and hclust.gtr) in Treeview format. For more information, or to download Treeview, visit the website:
| http://jtreeview.sourceforge.net


------

.. class:: infomark

You can continue your analysis using Treeview (outside of Galaxy) with the three files (atr, cdt and gtr) within the **xset.tab.hclust.zip** output.


---------------------------------------------------

---------------
Working example
---------------


Input files
-----------

**Part of an example Data Matrix input file**


+--------+------------------+----------------+
| Name   | Bur-eH_FSP_102   | Bur-eH_FSP_22  |
+========+==================+================+
|M202T601| 91206595.7559783 |106808979.08546 |
+--------+------------------+----------------+
|M234T851| 27249137.275504  |28824971.3177926|
+--------+------------------+----------------+


Parameters
----------

| Distance measure method -> **pearson**
| Agglomeration/Link method -> **ward**
| Normalization by center and scale -> **TRUE**
| Separator of columns -> **tabulation**
| Decimal separator -> **.**
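
As a rough illustration of what the "Normalization by center and scale" option means, the sketch below subtracts each column mean and divides by the column standard deviation. It rests on assumptions: the function name ``center_and_scale`` and the toy matrix are hypothetical, the sample standard deviation (n - 1 denominator) is assumed, and the tool's R script may differ in detail.

```python
import statistics


def center_and_scale(rows):
    """Center and scale each column of a numeric matrix given as rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Assumption of this sketch: sample standard deviation (n - 1).
    sds = [statistics.stdev(col) for col in columns]
    return [
        [(value - mean) / sd for value, mean, sd in zip(row, means, sds)]
        for row in rows
    ]


# Hypothetical toy matrix: three rows, two columns on different scales.
matrix = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
for row in center_and_scale(matrix):
    print(row)
```

After this transformation both toy columns have mean 0 and standard deviation 1, so a column with large raw values no longer dominates the distance computation.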


Output files
------------

**Example of a dendrogram/heatmap generated by the Treeview tool**:

.. image:: hclust.png


    </help>

</tool>