comparison abims_hclustering.xml @ 0:2f7381ee5235 draft

Uploaded
author lecorguille
date Tue, 30 Jun 2015 06:36:09 -0400
parents
children 36fc0a87d7fb
-1:000000000000 0:2f7381ee5235
1 <tool id="abims_hclustering" name="Hierarchical Clustering" version="1.1">
2
3 <description>using ctc R package for java-treeview</description>
4
5 <command interpreter="Rscript">
6 abims_hclustering.r file "$input" method $method link $link keep.hclust FALSE normalization $normalization sep "$sep" dec "$dec"
7 </command>
8
9 <inputs>
10 <param name="input" type="data" label="Data Matrix file" format="tabular" help="Matrix of numeric data with headers." />
11 <param name="method" type="select" label="Distance measure method" help="the distance measure to be used">
12 <option value="pearson" selected="true">pearson</option>
13 <option value="euclidean" >euclidean</option>
14 <option value="maximum" >maximum</option>
15 <option value="manhattan" >manhattan</option>
16 <option value="canberra" >canberra</option>
17 <option value="binary" >binary</option>
18 <option value="correlation" >correlation</option>
19 <option value="spearman" >spearman</option>
20 </param>
21 <param name="link" type="select" label="Agglomeration/Link method" help="the agglomeration method to be used">
22 <option value="ward" selected="true">ward</option>
23 <option value="single" >single</option>
24 <option value="complete" >complete</option>
25 <option value="average" >average</option>
26 <option value="mcquitty" >mcquitty</option>
27 <option value="median" >median</option>
28 <option value="centroid" >centroid</option>
29 </param>
30 <param name="normalization" type="select" label="Normalization by center and scale" help="Centering is done by subtracting the column means and scaling is done by dividing the (centered) columns by their standard deviations">
31 <option value="T" selected="true">TRUE</option>
32 <option value="F" >FALSE</option>
33 </param>
34
35 <param name="sep" type="select" format="text" optional="true">
36 <label>Separator of columns</label>
37 <option value="tabulation">tabulation</option>
38 <option value="semicolon">;</option>
39 <option value="comma">,</option>
40 </param>
41 <param name="dec" type="text" label="Decimal separator" value="." help="" />
42
43 <!--<param name="nr_col_names" type="integer" label="names" value="2" help="number of the column with names of metabolites" />
44 <param name="from" type="integer" label="from" value="15" help="number of the column starting peak values data (to exclude all metadata)" />
45 <param name="to" type="integer" label="to" value="30" help="number of the column finishing peak values data (to exclude all metadata)" />
46 <param name="gr_number" type="integer" label="gr_number" value="2" help="number of groups (conditions)" />
47 <param name="nb_col_gr" type="text" label="nb_col_gr" value="8,8" help="number of columns in each group; separate with commas as indicated; the first position corresponds to the first group, etc." />
48 <param name="threshold" type="float" label="threshold" value="0.01" help="max adjusted p.value accepted" />-->
49
50 </inputs>
51
52 <outputs>
53 <data name="hclust_zip" format="zip" from_work_dir="hclust.zip" label="${input.name[:-4]}.hclust.zip" />
54 </outputs>
55
56 <stdio>
57 <exit_code range="1:" level="fatal" />
58 </stdio>
59
60 <help>
61
62
63
64 .. class:: infomark
65
66 **Authors** Gildas Le Corguille ABiMS - UPMC/CNRS - Station Biologique de Roscoff - gildas.lecorguille|at|sb-roscoff.fr
67
68 ---------------------------------------------------
69
70 =======================
71 Hierarchical Clustering
72 =======================
73
74 -----------
75 Description
76 -----------
77
78 This function computes a hierarchical clustering with the function
79 hcluster and exports the clusters to the Java TreeView file format: jtreeview.sourceforge.net.
80
81 This function performs a **hierarchical cluster analysis** using a set
82 of dissimilarities for the n objects being clustered. Initially,
83 each object is assigned to its own cluster and then the algorithm
84 proceeds iteratively, at each stage joining the two most similar
85 clusters, continuing until there is just a single cluster. At
86 each stage distances between clusters are recomputed by the
87 Lance-Williams dissimilarity update formula according to the
88 particular clustering method being used.
89
90 A number of different **clustering methods** are provided. **Ward's**
91 minimum variance method aims at finding compact, spherical
92 clusters. The **complete linkage** method finds similar clusters.
93 The **single linkage** method (which is closely related to the
94 minimal spanning tree) adopts a ‘friends of friends’ clustering
95 strategy. The other methods can be regarded as aiming for
96 clusters with characteristics somewhere between the single and
97 complete link methods. Note, however, that the **median** and
98 **centroid** methods do not lead to a monotone distance measure;
99 equivalently, the resulting dendrograms can have so-called
100 inversions (which are hard to interpret).
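
As an illustration of the iterative joining described above, here is a minimal Python sketch. It is not the code this tool runs (the tool calls R), and the function name ``agglomerate`` and the 1-D sample points are hypothetical. For brevity it recomputes cluster distances directly from the points instead of applying the Lance-Williams update, and it covers only single and complete linkage.

```python
from itertools import combinations


def agglomerate(points, linkage="single"):
    """Naive agglomerative clustering of 1-D points; returns the merge history."""
    # Initially, each object is assigned to its own cluster (lists of point indices).
    clusters = [[i] for i in range(len(points))]
    # single linkage uses the closest pair of members, complete linkage the farthest.
    pick = {"single": min, "complete": max}[linkage]

    def cluster_distance(a, b):
        return pick(abs(points[i] - points[j]) for i in a for j in b)

    merges = []
    while len(clusters) > 1:
        # At each stage, join the two most similar clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(a), sorted(b), cluster_distance(a, b)))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return merges


if __name__ == "__main__":
    for step in agglomerate([0.0, 1.0, 5.0, 7.0], linkage="complete"):
        print(step)
```

On the sample points, both linkages merge the same pairs, but the height of the final merge differs: single linkage reports the closest members of the two clusters, complete linkage the farthest.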
101
102
103
104
105 -----------
106 Input files
107 -----------
108
109 +---------------------------+------------+
110 | Parameter : num + label   | Format     |
111 +===========================+============+
112 | 1 : Data Matrix file      | Tabular    |
113 +---------------------------+------------+
114
115
116 ----------
117 Parameters
118 ----------
119
120
121 **Agglomeration or Link method:**
122
123 A number of different clustering methods are provided. Ward's minimum variance method aims at finding compact, spherical clusters.
124 The complete linkage method finds similar clusters. The single linkage method (which is closely related to the minimal spanning tree) adopts a ‘friends of friends’ clustering strategy.
125 The other methods can be regarded as aiming for clusters with characteristics somewhere between the single and complete link methods.
126 Note, however, that the median and centroid methods do not lead to a monotone distance measure; equivalently, the resulting dendrograms can have so-called inversions (which are hard to interpret).
127
128
129 ------------
130 Output files
131 ------------
132
133 ***.tab.hclust.zip**
134
135 | A zip file containing three files (hclust.atr, hclust.cdt and hclust.gtr) in Treeview format. For more information, or to download Treeview, visit the website:
136 | http://jtreeview.sourceforge.net
137
138
139
140 ------
141
142 .. class:: infomark
143
144 You can continue your analysis using Treeview (outside of Galaxy) with the three files (atr, cdt and gtr) within the **xset.tab.hclust.zip** output.
145
146
147
148
149 ---------------------------------------------------
150
151 ---------------
152 Working example
153 ---------------
154
155
156 Input files
157 -----------
158
159 **>Part of an example Data Matrix input file**
160
161
162 +--------+------------------+----------------+
163 | Name   | Bur-eH_FSP_102   | Bur-eH_FSP_22  |
164 +========+==================+================+
165 |M202T601| 91206595.7559783 |106808979.08546 |
166 +--------+------------------+----------------+
167 |M234T851| 27249137.275504  |28824971.3177926|
168 +--------+------------------+----------------+
169
170
171 Parameters
172 ----------
173
174 | Distance measure method -> **pearson**
175 | Agglomeration/Link method -> **ward**
176 | Normalization by center and scale -> **TRUE**
177 | Separator of columns -> **tabulation**
178 | Decimal separator -> **.**
179
180
181
182 Output files
183 ------------
184
185 **Example of a dendrogram/heatmap generated by the Treeview tool**:
186
187 .. image:: hclust.png
188
189
190 </help>
191
192 </tool>