<!--
  univariate_config.xml @ 0:ef64d3752050 (draft)
  planemo upload for repository https://github.com/workflow4metabolomics/univariate.git commit ca0e312e1c986c45310f37effe031f60009fbcab
  author: ethevenot
  date: Wed, 27 Jul 2016 11:44:34 -0400
-->
<tool id="Univariate" name="Univariate" version="2.1.1">
  <description>Univariate statistics</description>

  <requirements>
    <requirement type="package" version="3.2.2">R</requirement>
    <requirement type="package">r-batch</requirement>
    <requirement type="package">r-pmcmr</requirement>
  </requirements>

  <command><![CDATA[
    $__tool_directory__/univariate_wrapper.R
    dataMatrix_in "$dataMatrix_in"
    sampleMetadata_in "$sampleMetadata_in"
    variableMetadata_in "$variableMetadata_in"

    facC "$facC"
    tesC "$tesC"
    adjC "$adjC"
    thrN "$thrN"

    variableMetadata_out "$variableMetadata_out"
    information "$information"
  ]]></command>

  <inputs>
    <param name="dataMatrix_in" label="Data matrix file" type="data" format="tabular" help="variable x sample, decimal: '.', missing: NA, mode: numerical, sep: tabular" />
    <param name="sampleMetadata_in" label="Sample metadata file" type="data" format="tabular" help="sample x metadata, decimal: '.', missing: NA, mode: character and numerical, sep: tabular" />
    <param name="variableMetadata_in" label="Variable metadata file" type="data" format="tabular" help="variable x metadata, decimal: '.', missing: NA, mode: character and numerical, sep: tabular" />
    <param name="facC" label="Factor of interest" type="text" help="Name of the sample metadata column containing the qualitative factor or the quantitative variable to be tested"/>
    <param name="tesC" label="Test" type="select" help="">
      <option value="ttest">t-test (qualitative, 2 levels)</option>
      <option value="wilcoxon">Wilcoxon test (qualitative, 2 levels)</option>
      <option value="anova">Analysis of variance (qualitative, more than 2 levels)</option>
      <option value="kruskal">Kruskal-Wallis rank test (qualitative, more than 2 levels)</option>
      <option value="pearson">Pearson correlation test (quantitative)</option>
      <option value="spearman">Spearman rank correlation test (quantitative)</option>
    </param>
    <param name="adjC" label="Method for multiple testing correction" type="select" help="">
      <option value="fdr">fdr</option>
      <option value="BH">BH</option>
      <option value="bonferroni">bonferroni</option>
      <option value="BY">BY</option>
      <option value="hochberg">hochberg</option>
      <option value="holm">holm</option>
      <option value="hommel">hommel</option>
      <option value="none">none</option>
    </param>
    <param name="thrN" type="float" value="0.05" label="(Corrected) p-value significance threshold" help="Must be between 0 and 1"/>
  </inputs>

  <outputs>
    <data name="variableMetadata_out" label="${tool.name}_${variableMetadata_in.name}" format="tabular" ></data>
    <data name="information" label="${tool.name}_information.txt" format="txt"/>
  </outputs>

  <tests>
    <test>
      <param name="dataMatrix_in" value="dataMatrix.tsv"/>
      <param name="sampleMetadata_in" value="sampleMetadata.tsv"/>
      <param name="variableMetadata_in" value="variableMetadata.tsv"/>
      <param name="facC" value="ageGroup"/>
      <param name="tesC" value="kruskal"/>
      <param name="adjC" value="fdr"/>
      <param name="thrN" value="0.05"/>
      <output name="variableMetadata_out" file="variableMetadata-output.tsv"/>
    </test>
  </tests>

  <help>

.. class:: infomark

| **Tool update: See the 'NEWS' section at the bottom of the page**

---------------------------------------------------

.. class:: infomark

**Authors**

| **Marie Tremblay-Franco (marie.tremblay-franco@toulouse.inra.fr)** and **Etienne Thevenot (etienne.thevenot@cea.fr)** wrote this wrapper of R univariate statistical tests.
| MetaboHUB: The French National Infrastructure for Metabolomics and Fluxomics (http://www.metabohub.fr/en)

---------------------------------------------------

.. class:: infomark

**Please cite**

R Core Team (2013). R: A Language and Environment for Statistical Computing. http://www.r-project.org

---------------------------------------------------

.. class:: infomark

**References**

| Benjamini Y. and Hochberg Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57:289-300.
| Dalgaard P. (2002). Introductory statistics with R. Springer.
| Kvam P. and Vidakovic B. (2007). Nonparametric statistics with applications to science and engineering. Wiley.
| Van Belle G., Fisher L., Heagerty P. and Lumley T. (2004). Biostatistics: a methodology for the health sciences. Wiley.
| Pohlert T. (2015). PMCMR: Calculate pairwise multiple comparisons of mean rank sums. R package on CRAN.

---------------------------------------------------

=====================
Univariate statistics
=====================

-----------
Description
-----------

| The module performs two-sample tests (t-test and Wilcoxon rank-sum test), analysis of variance and Kruskal-Wallis rank test (more than two classes), and correlation tests (Pearson or Spearman correlation).

-----------------
Workflow position
-----------------

.. image:: univariate_workflowPositionImage.png
        :width: 584

-----------
Input files
-----------

+------------------------------+------------+
| File                         | Format     |
+==============================+============+
| 1) Data matrix               | tabular    |
+------------------------------+------------+
| 2) Sample metadata           | tabular    |
+------------------------------+------------+
| 3) Variable metadata         | tabular    |
+------------------------------+------------+

----------
Parameters
----------

Data matrix file
    | variable x sample **dataMatrix** tab-separated file of the numeric data matrix, with '.' as decimal separator and NA for missing values; the table must not contain metadata apart from row and column names; the row and column names must be identical to the row names of the variable and sample metadata, respectively (see below)
    |

Sample metadata file
    | sample x metadata **sampleMetadata** tab-separated file of the numeric and/or character sample metadata, with '.' as decimal separator and NA for missing values
    |

Variable metadata file
    | variable x metadata **variableMetadata** tab-separated file of the numeric and/or character variable metadata, with '.' as decimal separator and NA for missing values
    |

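The naming rule given for the data matrix file (rows matching the variable metadata, columns matching the sample metadata) can be checked before running the tool. The following is a minimal Python sketch and is not part of the Univariate wrapper. 'read_tabular' and 'check_names' are hypothetical helpers. The sketch assumes the tables are already available as tab-separated strings. Names are compared in order, which may be stricter than the wrapper's own check.

```python
# Minimal sketch (assumption): check that the three W4M tables share their
# identifiers. Helper names are hypothetical, not part of the wrapper.


def read_tabular(text):
    """Return (column names, row names) of a tab-separated W4M table."""
    rows = [line.split("\t") for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    # The first header cell holds the table name (e.g. 'dataMatrix'), not a column.
    return header[1:], [row[0] for row in body]


def check_names(data_matrix, sample_metadata, variable_metadata):
    """Return a list of problems; an empty list means the identifiers match."""
    data_cols, data_rows = read_tabular(data_matrix)
    sample_ids = read_tabular(sample_metadata)[1]
    variable_ids = read_tabular(variable_metadata)[1]
    problems = []
    # dataMatrix is variable x sample: rows are variables, columns are samples.
    if data_rows != variable_ids:
        problems.append("dataMatrix row names differ from variableMetadata row names")
    if data_cols != sample_ids:
        problems.append("dataMatrix column names differ from sampleMetadata row names")
    return problems


dm = "dataMatrix\tHU_017\tHU_021\nHMDB01032\t2569204.92420381\t6222035.77434915"
sm = "sampleMetadata\tage\nHU_017\t41\nHU_021\t34"
vm = "variableMetadata\tname\nHMDB01032\tDehydroepiandrosterone sulfate"
print(check_names(dm, sm, vm))  # prints []
```
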
Factor of interest
    | Column of the sample metadata table to be used as the qualitative factor (t-test, Wilcoxon test, analysis of variance, Kruskal-Wallis test) or as the quantitative variable (correlation tests)
    |

Test
    | Choose one of the 6 available tests. The choice depends on the factor of interest (qualitative with 2 or more levels, or quantitative). It also depends on the normality of the sample values, which determines whether a parametric or a nonparametric test is appropriate:


+---------------------------+------------------+----------------------+----------------------+
| Factor to be tested       | Number of levels | Parametric test      | Nonparametric test   |
+===========================+==================+======================+======================+
| Qualitative               | 2                | t-test               | Wilcoxon test        |
|                           +------------------+----------------------+----------------------+
|                           | > 2              | Analysis of variance | Kruskal-Wallis test  |
+---------------------------+------------------+----------------------+----------------------+
| Quantitative              |                  | Pearson correlation  | Spearman correlation |
+---------------------------+------------------+----------------------+----------------------+

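The table above can be read as a small decision procedure, and the Python sketch below encodes it. 'choose_test' is a hypothetical helper; the tool itself simply uses the 'Test' value selected by the user. The returned strings are the values of the 'Test' option. Missing values are assumed to be coded as 'NA'. The caller decides whether the factor is quantitative and whether a parametric test is appropriate.

```python
# Minimal sketch (assumption): the decision table as a function returning the
# value of the 'Test' option. The function name is hypothetical.


def choose_test(factor_values, quantitative, parametric):
    """Map factor type, number of levels and test family to a 'Test' value."""
    if quantitative:
        return "pearson" if parametric else "spearman"
    # Qualitative factor: count the distinct classes, ignoring missing values.
    levels = {value for value in factor_values if value != "NA"}
    if len(levels) == 2:
        return "ttest" if parametric else "wilcoxon"
    if len(levels) > 2:
        return "anova" if parametric else "kruskal"
    raise ValueError("a qualitative factor needs at least 2 levels")


age_groups = ["experienced", "junior", "experienced", "senior", "senior"]
print(choose_test(age_groups, quantitative=False, parametric=False))  # prints kruskal
```
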
Method for multiple testing correction
    | All the methods implemented in the 'p.adjust' R function are available ('fdr' being an alias of 'BH'). The R help documents them as follows:
    | "The adjustment methods include the Bonferroni correction ("bonferroni") in which the p-values are multiplied by the number of comparisons. Less conservative corrections are also included by Holm (1979) ("holm"), Hochberg (1988) ("hochberg"), Hommel (1988) ("hommel"), Benjamini and Hochberg (1995) ("BH" or its alias "fdr"), and Benjamini and Yekutieli (2001) ("BY"), respectively. A pass-through option ("none") is also included. The set of methods are contained in the p.adjust.methods vector for the benefit of methods that need to have the method as an option and pass it on to p.adjust. The first four methods are designed to give strong control of the family-wise error rate. There seems no reason to use the unmodified Bonferroni correction because it is dominated by Holm's method, which is also valid under arbitrary assumptions. Hochberg's and Hommel's methods are valid when the hypothesis tests are independent or when they are non-negatively associated (Sarkar, 1998; Sarkar and Chang, 1997). Hommel's method is more powerful than Hochberg's, but the difference is usually small and the Hochberg p-values are faster to compute. The "BH" (aka "fdr") and "BY" method of Benjamini, Hochberg, and Yekutieli control the false discovery rate, the expected proportion of false discoveries amongst the rejected hypotheses. The false discovery rate is a less stringent condition than the family-wise error rate, so these methods are more powerful than the others."
    |

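To make the quoted descriptions concrete, here is a minimal Python sketch of a subset of the 'p.adjust' methods: 'bonferroni', 'holm', 'hochberg', 'BH'/'fdr', 'BY' and 'none'. Hommel's method is omitted. The sketch follows our reading of the R definitions. It is not the code run by the tool, which calls 'p.adjust' directly. 'p_adjust' is a hypothetical name. Missing p-values are represented by None. As with the default behaviour of 'p.adjust', they are excluded from the number of tests and returned unchanged.

```python
import math

METHODS = ("bonferroni", "holm", "hochberg", "BH", "BY", "none")


def p_adjust(pvalues, method="BH"):
    """Adjust p-values (subset of R's p.adjust); None entries stand for NA."""
    if method == "fdr":  # 'fdr' is an alias of 'BH'
        method = "BH"
    if method not in METHODS:
        raise ValueError("method not covered by this sketch: " + method)
    kept = [i for i, p in enumerate(pvalues) if p is not None]
    p = [pvalues[i] for i in kept]
    n = len(p)  # number of tests = number of non-missing p-values
    adj = list(p)  # 'none' leaves the p-values unchanged
    if method == "bonferroni":
        adj = [min(1.0, n * x) for x in p]
    elif method == "holm":
        # Step-down: walk p-values in increasing order, multiply the i-th
        # smallest by (n - i + 1) and keep a running maximum.
        running = 0.0
        for rank, k in enumerate(sorted(range(n), key=lambda j: p[j])):
            running = max(running, (n - rank) * p[k])
            adj[k] = min(1.0, running)
    elif method in ("hochberg", "BH", "BY"):
        # Step-up: walk p-values in decreasing order and keep a running minimum.
        q = sum(1.0 / j for j in range(1, n + 1)) if method == "BY" else 1.0
        running = math.inf
        for pos, k in enumerate(sorted(range(n), key=lambda j: p[j], reverse=True)):
            i = n - pos  # 1-based rank of p[k] in increasing order
            factor = n - i + 1 if method == "hochberg" else q * n / i
            running = min(running, factor * p[k])
            adj[k] = min(1.0, running)
    out = list(pvalues)
    for i, value in zip(kept, adj):
        out[i] = value
    return out
```

For example, with p = [0.01, 0.04, 0.03, 0.005], 'bonferroni' gives [0.04, 0.16, 0.12, 0.02] and 'holm' gives [0.03, 0.06, 0.06, 0.02] (up to floating-point rounding). Holm's adjusted values are never larger than Bonferroni's, as stated in the quoted help.
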
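(Corrected) p-value significance threshold
    | Value between 0 and 1 (default: 0.05); a variable is considered significant when its (corrected) p-value is below this threshold
    |
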
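The threshold is applied to the adjusted p-values to fill the '_sig' columns described in the output files section. The following is a minimal Python sketch. 'significance' is a hypothetical name. The sketch assumes three things:

- a strict comparison, following the wording "below the threshold";
- inclusive bounds when checking that the threshold lies between 0 and 1;
- NA p-values, represented by None, are kept as NA.

```python
# Minimal sketch (assumption): code adjusted p-values as 1 (significant) or 0.


def significance(adjusted, threshold=0.05):
    """Return 1 where the adjusted p-value is strictly below the threshold, else 0.

    None stands for NA and is kept as None (assumption).
    """
    if not 1 >= threshold >= 0:
        raise ValueError("the threshold must be between 0 and 1")
    return [None if p is None else int(threshold > p) for p in adjusted]


print(significance([0.02, 0.04, 0.06, None]))  # prints [1, 1, 0, None]
```
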

------------
Output files
------------

variableMetadata_out.tabular
    | **variableMetadata** file identical to the input variable metadata file, except that (at least) three columns have been added:
    | 1) [factor]_[test]_[class'a']-[class'b']_dif or [factor]_[test]_cor: difference of the means (ttest) or of the medians (wilcoxon) between the two classes, or Pearson or Spearman correlation coefficient (pearson, spearman)
    | 2) [factor]_[test]_[class'a']-[class'b']_[method] or [factor]_[test]_[method]: p-values adjusted with the selected correction [method]
    | 3) [factor]_[test]_[class'a']-[class'b']_sig or [factor]_[test]_sig: significance (coded as '1' if the adjusted p-value is below the threshold and '0' otherwise)
    | For 'anova' and 'kruskal', columns 2) and 3) come first and give the results of the global ANOVA or Kruskal-Wallis test. When the global test is significant, the results of the pairwise comparisons are reported in additional columns (otherwise these columns contain NA). For ANOVA, the Tukey HSD post-hoc test is used (for each pairwise comparison, the difference between means, the p-value and the significance are provided). For Kruskal-Wallis, the Nemenyi test is performed with the PMCMR package (for each pairwise comparison, the difference between medians, the p-value and the significance are provided).
    |

information.txt
    | File with all messages and warnings generated during the computation
    |

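The column-naming patterns listed for variableMetadata_out can be reproduced as follows. This Python sketch is illustrative only, and the helper names 'added_columns' and 'difference' are hypothetical. The [test] part of the names is assumed to be the value of the 'Test' option (e.g. 'ttest'). The sign of the '_dif' value is assumed to be class a minus class b, as suggested by the [class'a']-[class'b'] pattern.

```python
# Minimal sketch (assumption): names of the columns added to variableMetadata
# and the '_dif' value for two-class tests.
from statistics import mean, median


def added_columns(factor, test, method, class_a=None, class_b=None):
    """Return the names of the (value, adjusted p-value, significance) columns."""
    if class_a is None:  # correlation tests: pearson, spearman
        prefix = f"{factor}_{test}"
        first = prefix + "_cor"
    else:  # two-class comparison: ttest, wilcoxon
        prefix = f"{factor}_{test}_{class_a}-{class_b}"
        first = prefix + "_dif"
    return [first, prefix + "_" + method, prefix + "_sig"]


def difference(values_a, values_b, test):
    """Difference of means (ttest) or medians (wilcoxon) between two classes."""
    if test == "ttest":
        center = mean
    elif test == "wilcoxon":
        center = median
    else:
        raise ValueError("this sketch only covers ttest and wilcoxon")
    return center(values_a) - center(values_b)


print(added_columns("age", "spearman", "fdr"))
# prints ['age_spearman_cor', 'age_spearman_fdr', 'age_spearman_sig']
print(difference([1, 2, 10], [0, 1, 1], "wilcoxon"))  # prints 1
```
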
---------------------------------------------------

---------------
Working example
---------------

Input files
===========

| **To generate the "dataMatrix", "sampleMetadata" and "variableMetadata" files:**
| **1) copy/paste the values below into three distinct .txt files**
| **2) use the "Get Data" / "Upload File" tool in the "Tools" (left) panel of the Galaxy / ABiMS page, choosing:**
|    **a) File Format: 'tabular'**
|    **b) Convert spaces to tabs: 'Yes'**
|

**dataMatrix file**::

    dataMatrix HU_017 HU_021 HU_027 HU_032 HU_041 HU_048 HU_049 HU_050 HU_052 HU_059 HU_060 HU_066 HU_072 HU_077 HU_090 HU_109 HU_110 HU_125 HU_126 HU_131 HU_134 HU_149 HU_150 HU_173 HU_179 HU_180 HU_182 HU_202 HU_204 HU_209
    HMDB01032 2569204.92420381 6222035.77434915 17070707.9912636 1258838.24348419 13039543.0754619 1909391.77026598 3495.09386434063 2293521.90928998 128503.275117713 81872.5276382213 8103557.56578035 149574887.036181 1544036.41049333 7103429.53933206 14138796.50382 4970265.57952158 263054.73056162 1671332.30008058 88433.1944958815 23602331.2894815 18648126.5206986 1554657.98756878 34152.3646391152 209372.71275317 33187733.370626 202438.591636003 13581070.0886437 354170.810678102 9120781.48986975 43419175.4051586
    HMDB03072 3628416.30251025 65626.9834353751 112170.118946651 3261804.34422417 42228.2787747563 343254.201250707 1958217.69317664 11983270.0435677 5932111.41638028 5511385.83359531 9154521.47755199 2632133.21209418 9500411.14556502 6551644.51726592 7204319.80891836 1273412.04795188 3260583.81592376 8932005.5351622 8340827.52597275 9256460.69197759 11217839.169041 5919262.81433556 11790077.0657915 9567977.80797097 73717.5811684739 9991787.29074293 4208098.14739633 623970.649925847 10904221.2642849 2171793.93621067
    HMDB00792 429568.609438384 3887629.50527037 1330692.11658995 1367446.73023821 844197.447472453 2948090.71886592 1614157.90566884 3740009.19379795 3292251.66531919 2310688.79492013 4404239.59008605 3043289.12780863 825736.467181043 2523241.91730649 6030501.02648005 474901.604069803 2885792.42617652 2955990.64049134 1917716.3427982 1767962.67737699 5926203.40397675 1639065.69474684 346810.763557826 1054776.22313737 2390258.27543894 1831346.37315857 1026696.36904362 7079792.50047866 4368341.01359769 3495986.87280275


**sampleMetadata file**::

    sampleMetadata age ageGrp
    HU_017 41 experienced
    HU_021 34 junior
    HU_027 37 experienced
    HU_032 38 experienced
    HU_041 28 junior
    HU_048 39 experienced
    HU_049 50 senior
    HU_050 30 junior
    HU_052 51 senior
    HU_059 81 senior
    HU_060 55 senior
    HU_066 25 junior
    HU_072 47 experienced
    HU_077 27 junior
    HU_090 46 experienced
    HU_109 32 junior
    HU_110 50 senior
    HU_125 58 senior
    HU_126 45 experienced
    HU_131 42 experienced
    HU_134 48 experienced
    HU_149 35 experienced
    HU_150 49 experienced
    HU_173 55 senior
    HU_179 33 junior
    HU_180 53 senior
    HU_182 43 experienced
    HU_202 42 experienced
    HU_204 31 junior
    HU_209 17.5 junior


**variableMetadata file**::

    variableMetadata name
    HMDB01032 Dehydroepiandrosterone sulfate
    HMDB03072 Quinic acid
    HMDB00792 Sebacic acid


Parameters
==========

**Factor of interest:** "ageGrp"

**Test:** "Kruskal-Wallis rank test (qualitative, more than 2 levels)"

**Method for multiple testing correction:** "fdr"

**(Corrected) p-value significance threshold:** 0.05


Output files
============

+------------------------------+------------+
| File                         | Format     |
+==============================+============+
| 1) variableMetadata          | tabular    |
+------------------------------+------------+
| 2) information               | text       |
+------------------------------+------------+


---------------------------------------------------

----
NEWS
----

CHANGES IN VERSION 2.1.1
========================

Internal handling of 'NA' p-values (e.g. when intensities are identical in all samples)

CHANGES IN VERSION 2.0.1
========================

The (corrected) p-value threshold can be set to any value between 0 and 1


  </help>

  <citations/>

</tool>