comparison normalization.xml @ 0:79f00bc83ecc draft default tip

planemo upload commit a2411926bebc2ca3bb31215899a9f18a67e59556
author vmarcon
date Thu, 18 Jan 2018 06:20:30 -0500
-1:000000000000 0:79f00bc83ecc
1 <!--# Copyright (C) 2017 INRA
2 # This program is free software: you can redistribute it and/or modify
3 # it under the terms of the GNU General Public License as published by
4 # the Free Software Foundation, either version 3 of the License, or
5 # (at your option) any later version.
6 #
7 # This program is distributed in the hope that it will be useful,
8 # but WITHOUT ANY WARRANTY; without even the implied warranty of
9 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
10 # GNU General Public License for more details.
11 #
12 # You should have received a copy of the GNU General Public License
13 # along with this program. If not, see http://www.gnu.org/licenses/.
14 #-->
15
16 <tool id="normalization" name="Normalization" version="1.0.0">
17 <description>Normalize your data with well-known methods</description>
18 <requirements>
19 <requirement type="package">R</requirement>
20 <requirement type="package">bioconductor-deseq2</requirement>
21 <requirement type="package">r-batch</requirement>
22 </requirements>
23 <stdio>
24 <!-- Anything other than zero is an error -->
25 <exit_code range="1:" level="fatal"/>
26 <exit_code range=":-1" level="fatal"/>
27 </stdio>
28 <command interpreter="Rscript"><![CDATA[
29 normalization_galaxy.R
30 input_file '${input_file}'
31 transformation_method '${transformation_method}'
32 na_encoding '${na_encoding}'
33 output_file '${output_file}'
34 log_file '${log_file}'
35 variable_in_line '${variable_in_line}'
36 ]]></command>
37 <inputs>
38 <param format="tabular,csv" name="input_file" type="data" label="Input file"/>
39 <param name="transformation_method" type="select" label="Data transformation method" help="See the complete help below for more details">
40 <option value="log">Log (binary logarithm)</option>
41 <option value="DESeq2">DESeq2 for NGS counts</option>
42 <option value="Rlog">RLog (as implemented in DESeq2)</option>
43 <option value="Standard_score">Standard score (mean=0;sd=1) </option>
44 <option value="Pareto">Pareto (mean=0;sd moderate)</option>
45 <option value="TSS">Total sum scaling (TSS)</option>
46 <option value="TSS_CLR">Total sum scaling + log ratio (TSS+CLR)</option>
47 <validator type="empty_field" message="Please choose at least one data transformation method." />
48 </param>
49 <param name="na_encoding" size="30" type="text" value="NA" label="Label used for Missing values"/>
50 <param name="variable_in_line" type="select" multiple="false" display="radio" label="Variable in line or column?">
51 <option value="1">Line</option>
52 <option value="0">Column</option>
53 </param>
54 </inputs>
55 <outputs>
56 <data name="log_file" format="html" label="Normalization_log"/>
57 <data name="output_file" format_source="input_file" label="Transfo-${transformation_method.value}_${input_file.name}"/>
58 </outputs>
59 <tests>
60 <test>
61 <param name="input_file" value="decathlon.tsv"/>
62 <param name="transformation_method" value="log"/>
63 <param name="na_encoding" value="NA"/>
64 <param name="variable_in_line" value="0"/>
65 <output name="log_file" file="log_file"/>
66 <output name="output_file" file="output_file"/>
67 </test>
68 </tests>
69 <help><![CDATA[
70
71 =========
72 Normalize
73 =========
74
75 -----------
76 Description
77 -----------
78
79 - This tool is part of a set of statistical tools made by members of the BIOS4BIOL group ("Normalization", "Summary statistics", "Hierarchical clustering" and "PCAFactoMineR").
80 - Please use this Normalization module before using other modules of the suite.
81
82 What it does:
83 - It normalizes your data with well-known methods
84
85 ------
86
87 -----------
88 Input files
89 -----------
90
91 +---------------------------+------------+
92 | Parameter : num + label   |   Format   |
93 +===========================+============+
94 | 1 : input file            |   tabular  |
95 +---------------------------+------------+
96
97
98 ----------
99 Parameters
100 ----------
101
102 Data transformation method
103 | Possible values: "log", "DESeq2", "Rlog", "Standard_score", "Pareto", "TSS", "TSS_CLR"
104 |
105
106 Label used for Missing values:
107 | String used to encode missing values in the input file (default: NA)
108 |
109
110 Variable in line or column:
111 | Indicate whether variables are in lines or in columns
112 |
113
114
115 ------------
116 Output files
117 ------------
118
119
120 Transfo-<method>_<input file name>
121 | Input file normalized according to the chosen method
122 |
123
124 Normalization_log
125 | Log of the normalization run (HTML format)
126
127 -------
128 Advice
129 -------
130
131 The nature of data may vary
132 | It depends on the subjects of the experiment and/or the technology used to measure a signal on these subjects.
133 | For instance, with RNA-Seq data, expression levels are expressed as counts, whereas with microarray technology they are expressed as fluorescence intensities.
134 |
135
136 Before conducting any analysis on a data table, it is important to:
137 | Identify the nature of the data you are dealing with
138 | Check whether this kind of data is suited to the type of analysis you want to perform
139
140 If your data are not suited to the analysis you plan to perform, you should first transform them to a scale of values that better fits the requirements of that analysis.
141 This transformation process is called “normalization”.
142
143
144 ---------------------
145 Normalization Methods
146 ---------------------
147
148 This Galaxy module offers several normalization methods, along with guidelines to help users choose the appropriate one:
149
150 Log normalization
151 | -Objective: The binary logarithm homogenizes variance even when the range of values is large
152 | -Accepted values: Any positive or zero real numbers
153 | (zero values stay zero after transformation)
154 | -Range of values: Input: [0; 100,000] / Output: [0; 17]
155 | -Adapted for: PCA, HC, SS*
156 |
157
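As an illustration of the log transformation described above, here is a minimal Python sketch. It assumes the transformation is log2(x + 1), which is consistent with zero values staying zero and with the output range given above, but the R script called by this tool may proceed differently. The function name `log2_plus_one` is hypothetical.

```python
import math

def log2_plus_one(values):
    """Binary logarithm with a +1 offset (an assumption), so that 0 maps to 0."""
    return [math.log2(v + 1) for v in values]

# Example values covering the input range quoted in the help text
transformed = log2_plus_one([0, 1, 3, 100000])
print(transformed)
```

The largest input of the quoted range (100,000) maps to a value between 16 and 17, which matches the stated output range.
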
158 DESeq2 normalization
159 | -Objective: Obtain counts that are comparable between samples, regardless of differences in library sequencing depth
160 | -Accepted values: NGS counts (non-negative integers; no missing values)
161 | (zero values stay zero after transformation)
162 | -Range of values: Input: [0; 100,000] / Output: [0; 100,000]
163 | -Adapted for: Differential analysis
164 |
165
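To make the idea of correcting for sequencing depth more concrete, the sketch below reproduces the median-of-ratios principle that DESeq2 uses by default to estimate size factors. It is a simplified pure-Python illustration, not the code run by this tool: the names `size_factors` and `normalize_counts` and the example matrix are hypothetical, and the real DESeq2 implementation offers more options.

```python
import math
from statistics import median

def size_factors(counts):
    """counts[g][s] is the raw count of gene g in sample s."""
    n_samples = len(counts[0])
    # Only genes without any zero count contribute to the reference
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize_counts(counts):
    """Divide each count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]

# Sample 2 was sequenced twice as deeply as sample 1 (made-up data)
counts = [[10, 20], [100, 200], [0, 5], [30, 60]]
factors = size_factors(counts)
normalized = normalize_counts(counts)
print(factors)
print(normalized)
```

In this made-up example, the genes whose counts simply double with sequencing depth end up with identical normalized values in both samples, and the zero count stays zero.
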
166 RLog normalization
167 | -Objective: Similar to a combination of the {DESeq2 + Log} transformations
168 | -Accepted values: NGS counts (non-negative integers; no missing values)
169 | -Range of values: Input: [0; 100,000] / Output: [0; 20]
170 | -Adapted for: PCA, HC, SS
171 |
172
173 Standard score normalization
174 | -Objective: Transform values so that {mean=0 and standard deviation=1} for all variables.
175 | -Accepted values: No specific constraint
176 | -Range of values: No specific constraint
177 | -Adapted for: PCA, HC, SS
178 |
179
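The standard score described above can be sketched for a single variable as follows. This is an illustration only: it assumes the sample standard deviation (n - 1 denominator, as R's `sd()` computes it), and the function name `standard_score` is hypothetical.

```python
from statistics import mean, stdev

def standard_score(values):
    """Center one variable on 0 and scale it to unit standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1), an assumption
    return [(v - m) / s for v in values]

z = standard_score([2.0, 4.0, 6.0, 8.0])
print(z)
```

After the transformation, the variable has mean 0 and standard deviation 1, whatever its original unit.
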
180 Pareto normalization
181 | -Objective: Transform values so that
182 | {mean=0 and variance equal to the original standard deviation, instead of unit variance} for all variables.
183 | -Accepted values: No specific constraint
184 | -Range of values: No specific constraint
185 | -Adapted for: metabolite intensity values before PCA, HC, SS
186 |
187
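Pareto scaling, as described above, divides the centered values by the square root of the standard deviation rather than by the standard deviation itself. The sketch below illustrates this for one variable; it assumes the sample standard deviation, and `pareto_scale` is a hypothetical name, not a function of this tool.

```python
import math
from statistics import mean, stdev, variance

def pareto_scale(values):
    """Center one variable and divide by the square root of its standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1), an assumption
    return [(v - m) / math.sqrt(s) for v in values]

values = [2.0, 4.0, 6.0, 8.0]
scaled = pareto_scale(values)
print(scaled)
print(variance(scaled), stdev(values))
```

The variance of the scaled variable equals the standard deviation of the original one, so variables with large variance are shrunk less aggressively than with the standard score.
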
188 Total sum scaling normalization (TSS)
189 | -Objective: Normalize count data by dividing each variable's read count by the total read count of its sample
190 | -Accepted values: 16S rRNA amplicon sequencing counts
191 | -Range of values: Input: no specific constraint / Output: [0;1[
192 | -Adapted for: PCA, HC, SS
193 |
194
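Total sum scaling, as described above, turns the counts of one sample into proportions. A minimal sketch, with the hypothetical function name `total_sum_scaling` and made-up counts:

```python
def total_sum_scaling(sample_counts):
    """Divide each count of one sample by the sample's total count."""
    total = sum(sample_counts)
    # A sample with a total of 0 would raise ZeroDivisionError here
    return [c / total for c in sample_counts]

proportions = total_sum_scaling([10, 30, 60])
print(proportions)
```

The resulting proportions of a sample always sum to 1, which makes samples with different sequencing depths comparable.
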
195 Total sum scaling+Log ratio normalization (TSS+CLR)
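196 | -Objective: Apply TSS, then a centered log ratio (CLR) transformation: the logarithm of each proportion relative to the geometric mean of the proportions in its sample.
197 | -Accepted values: 16S rRNA amplicon sequencing counts
198 | -Range of values: Input: no specific constraint / Output: no specific constraint (values can be negative)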
199 | -Adapted for: PCA, HC, SS
200
201 (*)PCA: Principal Component Analysis / HC: Hierarchical Clustering / SS: Summary Statistics
202
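For the TSS+CLR method listed above, the centered log ratio can be sketched as the log of each proportion minus the mean of the logs in the same sample. The sketch below is an illustration under assumptions: the `offset` pseudo-count used to avoid taking the log of zero and the function name `tss_clr` are hypothetical, and the tool's R script may handle zeros differently.

```python
import math

def tss_clr(sample_counts, offset=1):
    """TSS followed by a centered log ratio for the counts of one sample."""
    # Hypothetical pseudo-count so that zero counts do not produce log(0)
    shifted = [c + offset for c in sample_counts]
    total = sum(shifted)
    proportions = [c / total for c in shifted]
    log_p = [math.log(p) for p in proportions]
    # Subtracting the mean log equals dividing by the geometric mean
    mean_log = sum(log_p) / len(log_p)
    return [lp - mean_log for lp in log_p]

clr_values = tss_clr([0, 9, 99])
print(clr_values)
```

CLR values sum to zero within a sample, so some of them are necessarily negative.
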
203 ------
204
205 **Authors**: Luc Jouneau (luc.jouneau@inra.fr), Sarah Maman (sarah.maman@inra.fr) and Valentin Marcon (valentin.marcon@inra.fr)
206
207 Contact : support.sigenae@inra.fr
208
209 E-learning available : Not yet.
210
211 .. class:: infomark
212
213 -------------
214 Please cite :
215 -------------
216
217 - (Depending on the help provided you can cite us in acknowledgements, references or both.)
218
219 Acknowledgements
220 | We wish to thank SIGENAE group and the statistical CATI BIOS4Biol group : Luc Jouneau, Sarah Maman
221 | Re-packaging was provided by Valentin Marcon (INRA, Migale platform http://migale.jouy.inra.fr), as part of the IFB project 'Galaxy For Life Science' (http://www.france-bioinformatique.fr/fr)
222 |
223
224 References
225 | SIGENAE [http://www.sigenae.org/]
226 |
227
228 ]]></help>
229 </tool>