comparison normalization.xml @ 0:79f00bc83ecc draft default tip
planemo upload commit a2411926bebc2ca3bb31215899a9f18a67e59556
author | vmarcon |
---|---|
date | Thu, 18 Jan 2018 06:20:30 -0500 |
parents | |
children |
-1:000000000000 | 0:79f00bc83ecc |
---|---|
1 <!--# Copyright (C) 2017 INRA | |
2 # This program is free software: you can redistribute it and/or modify | |
3 # it under the terms of the GNU General Public License as published by | |
4 # the Free Software Foundation, either version 3 of the License, or | |
5 # (at your option) any later version. | |
6 # | |
7 # This program is distributed in the hope that it will be useful, | |
8 # but WITHOUT ANY WARRANTY; without even the implied warranty of | |
9 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
10 # GNU General Public License for more details. | |
11 # | |
12 # You should have received a copy of the GNU General Public License | |
13 # along with this program. If not, see http://www.gnu.org/licenses/. | |
14 #--> | |
15 | |
16 <tool id="normalization" name="Normalization" version="1.0.0"> | |
17 <description>Normalize your data with some well known methods</description> | |
18 <requirements> | |
19 <requirement type="package">R</requirement> | |
20 <requirement type="package">bioconductor-deseq2</requirement> | |
21 <requirement type="package">r-batch</requirement> | |
22 </requirements> | |
23 <stdio> | |
24 <!-- Anything other than zero is an error --> | |
25 <exit_code range="1:" level="fatal"/> | |
26 <exit_code range=":-1" level="fatal"/> | |
27 </stdio> | |
28 <command interpreter="Rscript"><![CDATA[ | |
29 normalization_galaxy.R | |
30 input_file '${input_file}' | |
31 transformation_method '${transformation_method}' | |
32 na_encoding '${na_encoding}' | |
33 output_file '${output_file}' | |
34 log_file '${log_file}' | |
35 variable_in_line '${variable_in_line}' | |
36 ]]></command> | |
37 <inputs> | |
38 <param format="tabular,csv" name="input_file" type="data" label="Input file"/> | |
39 <param name="transformation_method" type="select" label="Data transformation method" help="See the complete help below for more details"> | |
40 <option value="log">Log (binary logarithm)</option> | |
41 <option value="DESeq2">DESeq2 for NGS counts</option> | |
42 <option value="Rlog">RLog (as implemented in DESeq2)</option> | |
43 <option value="Standard_score">Standard score (mean=0;sd=1) </option> | |
44 <option value="Pareto">Pareto (mean=0;sd moderate)</option> | |
45 <option value="TSS">Total sum scaling (TSS)</option> | |
46 <option value="TSS_CLR">Total sum scaling + log ratio (TSS+CLR)</option> | |
47 <validator type="empty_field" message="Please choose, at least, one data transformation method." /> | |
48 </param> | |
49 <param name="na_encoding" size="30" type="text" value="NA" label="Label used for Missing values"/> | |
50 <param name="variable_in_line" type="select" multiple="false" display="radio" label="Variable in line or column?"> | |
51 <option value="1">Line</option> | |
52 <option value="0">Column</option> | |
53 </param> | |
54 </inputs> | |
55 <outputs> | |
56 <data name="log_file" format="html" label="Normalization_log"/> | |
57 <data name="output_file" format_source="input_file" label="Transfo-${transformation_method.value}_${input_file.name}"/> | |
58 </outputs> | |
59 <tests> | |
60 <test> | |
61 <param name="input_file" value="decathlon.tsv"/> | |
62 <param name="transformation_method" value="log"/> | |
63 <param name="na_encoding" value="NA"/> | |
64 <param name="variable_in_line" value="0"/> | |
65 <output name="log_file" file="log_file"/> | |
66 <output name="output_file" file="output_file"/> | |
67 </test> | |
68 </tests> | |
69 <help><![CDATA[ | |
70 | |
71 ========= | |
72 Normalize | |
73 ========= | |
74 | |
75 ----------- | |
76 Description | |
77 ----------- | |
78 | |
79 - This tool is part of a set of statistical tools made by members of the BIOS4BIOL group ("Normalization", "Summary statistics", "Hierarchical clustering" and "PCAFactoMineR"). | |
80 - Please use this Normalization module before using other modules of the suite. | |
81 | |
82 What it does: | |
83 - It normalizes your data using several well-known methods. | |
84 | |
85 ------ | |
86 | |
87 ----------- | |
88 Input files | |
89 ----------- | |
90 | |
91 +---------------------------+------------+ | |
92 | Parameter : num + label | Format | | |
93 +===========================+============+ | |
94 | 1 : input file | tabular | | |
95 +---------------------------+------------+ | |
96 | |
97 | |
98 ---------- | |
99 Parameters | |
100 ---------- | |
101 | |
102 Data transformation method | |
103 | Possible values: "log", "DESeq2", "Rlog", "Standard_score", "Pareto", "TSS", "TSS_CLR" | |
104 | | |
105 | |
106 Label used for Missing values: | |
107 | Missing value coding character | |
108 | | |
109 | |
110 Variable in line or column: | |
111 | Indicate whether variables are in lines (rows) or in columns | |
112 | | |
113 | |
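The two parameters above (missing-value label and variable orientation) can be illustrated with a small Python sketch. It is only an illustration under assumptions: the first row and first column are assumed to hold names, and `read_table` is a hypothetical helper, not the code of `normalization_galaxy.R`.

```python
def read_table(text, na_label="NA", variables_in_lines=False):
    """Parse tab-separated text into {variable_name: [values]}.

    Assumptions (not taken from the tool itself): the first row and the
    first column hold names, and cells equal to na_label are missing.
    """
    rows = [line.split("\t") for line in text.strip("\n").splitlines()]
    header, body = rows[0][1:], rows[1:]

    def cell(raw):
        # Missing values become None, everything else is read as a number
        return None if raw == na_label else float(raw)

    if variables_in_lines:
        # Each line is one variable: its name, then one value per sample
        return {r[0]: [cell(c) for c in r[1:]] for r in body}
    # Each column is one variable: transpose the body before reading
    columns = zip(*[r[1:] for r in body])
    return {name: [cell(c) for c in col] for name, col in zip(header, columns)}


table = "id\ta\tb\ns1\t1\tNA\ns2\t3\t4"
print(read_table(table))
print(read_table(table, variables_in_lines=True))
```

With "Column" selected, the variables here are `a` and `b`; with "Line", they are `s1` and `s2`.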
114 | |
115 ------------ | |
116 Output files | |
117 ------------ | |
118 | |
119 | |
120 Transfo-<method>_<input file name> | |
121 | Input file normalized according to the chosen method | |
122 | | |
123 | |
124 Normalization_log | |
125 | | |
126 | |
127 ------- | |
128 Advice | |
129 ------- | |
130 | |
131 Nature of the data may vary | |
132 | It depends on the subjects of the experiment and/or the technology used to measure a signal on these subjects. | |
133 | For instance, with RNA-Seq data, expression values are expressed as counts, whereas with microarray technology they are expressed as fluorescence intensities. | |
134 | | |
135 | |
136 Before conducting any analysis on a data table, it is important to: | |
137 | Identify the nature of the data you are dealing with | |
138 | Check whether this kind of data is suited to the type of analysis you want to perform | |
139 | |
140 If the nature of your data is not suited to the analysis you plan to perform, you should first transform your data to a scale of values that better fits the requirements of that analysis. | |
141 This transformation process is called “normalization”. | |
142 | |
143 | |
144 --------------------- | |
145 Normalization Methods | |
146 --------------------- | |
147 | |
148 In this Galaxy module, we offer several normalization methods and provide some guidelines to help users choose the appropriate one: | |
149 | |
150 Log normalization | |
151 | -Objective: The binary logarithm provides homogeneity of variance even if the range of values is very large | |
152 | -Accepted values: Any positive or zero real numbers | |
153 | (zero values stay zero after transformation) | |
154 | -Range of values: Input: [0;100,000] / Output: [0;17] | |
155 | -Adapted for: PCA, HC, SS* | |
156 | | |
157 | |
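As an illustration of the log transformation described above, here is a minimal Python sketch. It assumes that zeros are simply passed through unchanged, as the help text states; the actual R script may obtain this differently (for example with log2(x + 1)), and `log2_transform` is a hypothetical name.

```python
import math


def log2_transform(values, na_label="NA"):
    """Binary-log transform a list of values.

    Assumption: zeros and missing values are passed through unchanged
    instead of being mapped to -inf or raising an error.
    """
    out = []
    for v in values:
        if v == na_label:
            out.append(na_label)   # missing values are kept as they are
        elif v == 0:
            out.append(0.0)        # zero stays zero
        elif v < 0:
            raise ValueError("log transform needs non-negative values")
        else:
            out.append(math.log2(v))
    return out


print(log2_transform([0, 1, 8, 100000, "NA"]))
```

An input of 100,000 maps to a value between 16 and 17, which matches the output range given above.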
158 DESeq2 normalization | |
159 | -Objective: Obtain comparable counts between samples, regardless of differences in library sequencing depth | |
160 | -Accepted values: NGS counts (non-negative integers; no missing values) | |
161 | (zero values stay zero after transformation) | |
162 | -Range of values: Input: [0;100,000] / Output: [0;100,000] | |
163 | -Adapted for: Differential analysis | |
164 | | |
165 | |
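To show how counts can be made comparable across sequencing depths, the following Python sketch implements the median-of-ratios idea that underlies DESeq2 size factors. It is a simplified illustration, not the DESeq2 code: the function names are hypothetical, and `counts` is assumed to be a list of rows (one per gene) with one count per sample.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios idea."""
    n_samples = len(counts[0])
    # Keep only genes with strictly positive counts in every sample,
    # so that their geometric mean is defined and non-zero.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in every sample")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each count to the gene's geometric mean across samples
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize_counts(counts):
    """Divide every count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


# Second sample sequenced twice as deeply as the first one
counts = [[10, 20], [100, 200], [0, 0], [5, 10]]
print(size_factors(counts))
print(normalize_counts(counts))
```

In this toy example the second sample gets a size factor twice as large as the first, so both columns become identical after normalization and the zero counts stay zero.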
166 RLog normalization | |
167 | -Objective: Similar to a combination of {DESeq2 + Log} transformation | |
168 | -Accepted values: NGS counts (non-negative integers; no missing values) | |
169 | -Range of values: Input: [0;100,000] / Output: [0;20] | |
170 | -Adapted for: PCA, HC, SS | |
171 | | |
172 | |
173 Standard score normalization | |
174 | -Objective: Transform values such that {mean=0 and standard deviation=1} for all variables. | |
175 | -Accepted values: No specific constraint | |
176 | -Range of values: No specific constraint | |
177 | -Adapted for: PCA, HC, SS | |
178 | | |
179 | |
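The standard score can be sketched in a few lines of Python. This is an illustrative version for a single variable, assuming the sample standard deviation (n - 1 denominator, as R's `sd()` uses); `standard_score` is a hypothetical name.

```python
from statistics import mean, stdev


def standard_score(values):
    """Center a variable on 0 and scale it to unit standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("constant variable: standard deviation is zero")
    return [(v - m) / s for v in values]


z = standard_score([2, 4, 4, 4, 5, 5, 7, 9])
print(z)
```

After the transformation, the variable has mean 0 and standard deviation 1, whatever its original unit.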
180 Pareto normalization | |
181 | -Objective: Transform values such that | |
182 | {mean=0 and variance equal to the original standard deviation, instead of unit variance} for all variables. | |
183 | -Accepted values: No specific constraint | |
184 | -Range of values: No specific constraint | |
185 | -Adapted for: metabolite intensity values before PCA, HC, SS | |
186 | | |
187 | |
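Pareto scaling differs from the standard score only in the divisor: the centered values are divided by the square root of the standard deviation. A minimal Python sketch for a single variable, with the hypothetical name `pareto_scale` and the sample standard deviation assumed:

```python
import math
from statistics import mean, stdev


def pareto_scale(values):
    """Center a variable on 0 and divide by the square root of its standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("constant variable: standard deviation is zero")
    return [(v - m) / math.sqrt(s) for v in values]


x = [1.0, 3.0, 5.0, 11.0]
print(pareto_scale(x))
```

Because the divisor is sqrt(sd), the variance of the result equals the standard deviation of the original variable, so large-variance variables are damped without being fully equalized.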
188 Total sum scaling normalization (TSS) | |
189 | -Objective: Normalize count data by dividing each variable's read count by the total read count of the corresponding sample | |
190 | -Accepted values: 16S rRNA amplicon sequencing counts | |
191 | -Range of values: Input: no specific constraint / Output: [0;1[ | |
192 | -Adapted for: PCA, HC, SS | |
193 | | |
194 | |
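Total sum scaling can be sketched as follows for one sample. The function name is hypothetical, and `sample_counts` is assumed to hold the read counts of all variables for that sample.

```python
def total_sum_scaling(sample_counts):
    """Convert the read counts of one sample into proportions of its total."""
    total = sum(sample_counts)
    if total == 0:
        raise ValueError("sample has no reads: total count is zero")
    return [c / total for c in sample_counts]


print(total_sum_scaling([10, 30, 60]))
```

The resulting proportions sum to 1 within each sample, which removes differences in sequencing depth between samples.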
195 Total sum scaling+Log ratio normalization (TSS+CLR) | |
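196 | -Objective: Apply total sum scaling, then a centered log ratio (CLR) transformation, i.e. the logarithm of each proportion relative to the geometric mean of its sample. | |
197 | -Accepted values: 16S rRNA amplicon sequencing counts | |
198 | -Range of values: Input: no specific constraint / Output: real numbers centered on 0 within each sample | |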
199 | -Adapted for: PCA, HC, SS | |
200 | |
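The log ratio step of TSS+CLR is commonly understood as the centered log ratio: the log of each proportion minus the mean log proportion of the sample. The Python sketch below follows that definition. The `offset` pseudocount used to avoid log(0) is an assumption of this sketch, and the tool may handle zeros differently.

```python
import math


def tss_clr(sample_counts, offset=1.0):
    """Total sum scaling followed by a centered log ratio, for one sample.

    Assumption: a pseudocount (offset) is added to every count so that
    zero counts do not lead to log(0).
    """
    shifted = [c + offset for c in sample_counts]
    total = sum(shifted)
    proportions = [c / total for c in shifted]           # TSS step
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)                     # log of the geometric mean
    return [v - mean_log for v in logs]                  # CLR step


print(tss_clr([0, 9, 99]))
```

The transformed values of a sample sum to zero, so they can be negative and are not restricted to the interval of proportions.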
201 (*)PCA: Principal Component Analysis / HC: Hierarchical Clustering / SS: Summary Statistics | |
202 | |
203 ------ | |
204 | |
205 **Authors**: Luc Jouneau (luc.jouneau@inra.fr), Sarah Maman (sarah.maman@inra.fr) and Valentin Marcon (valentin.marcon@inra.fr) | |
206 | |
207 Contact : support.sigenae@inra.fr | |
208 | |
209 E-learning available : Not yet. | |
210 | |
211 .. class:: infomark | |
212 | |
213 ------------- | |
214 Please cite : | |
215 ------------- | |
216 | |
217 - (Depending on the help provided you can cite us in acknowledgements, references or both.) | |
218 | |
219 Acknowledgements | |
220 | We wish to thank the SIGENAE group and the statistical CATI BIOS4Biol group: Luc Jouneau, Sarah Maman | |
221 | Re-packaging was provided by Valentin Marcon (INRA, Migale platform http://migale.jouy.inra.fr), as part of the IFB project 'Galaxy For Life Science' (http://www.france-bioinformatique.fr/fr) | |
222 | | |
223 | |
224 References | |
225 | SIGENAE [http://www.sigenae.org/] | |
226 | | |
227 | |
228 ]]></help> | |
229 </tool> |