view normalization.xml @ 0:79f00bc83ecc draft default tip

planemo upload commit a2411926bebc2ca3bb31215899a9f18a67e59556
author vmarcon
date Thu, 18 Jan 2018 06:20:30 -0500
parents
children
line wrap: on
line source

<!--# Copyright (C) 2017 INRA
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see http://www.gnu.org/licenses/.
#-->

<tool id="normalization" name="Normalization" version="1.0.0">
    <description>Normalize your data with some well known methods</description>
    <requirements>
        <requirement type="package">R</requirement>
        <requirement type="package">bioconductor-deseq2</requirement>
        <requirement type="package">r-batch</requirement>
    </requirements>
    <stdio>
        <!-- Anything other than zero is an error -->
        <exit_code range="1:" level="fatal"/>
        <exit_code range=":-1" level="fatal"/>
    </stdio>
    <command interpreter="Rscript"><![CDATA[
        normalization_galaxy.R
        input_file '${input_file}'
        transformation_method '${transformation_method}'
        na_encoding '${na_encoding}'
        output_file '${output_file}'
        log_file '${log_file}'
        variable_in_line '${variable_in_line}'
    ]]></command>
    <inputs>                  
        <param format="tabular,csv" name="input_file" type="data" label="Input file"/>
        <param name="transformation_method" type="select" label="Data transformation method" help="See the complete help below for more details"> 
            <option value="log">Log (binary logarithm)</option>
            <option value="DESeq2">DESeq2 for NGS counts</option>
            <option value="Rlog">RLog (as implemented in DESeq2)</option>
            <option value="Standard_score">Standard score (mean=0;sd=1) </option>
            <option value="Pareto">Pareto (mean=0;sd moderate)</option>
            <option value="TSS">Total sum scaling (TSS)</option>
            <option value="TSS_CLR">Total sum scaling + log ratio (TSS+CLR)</option>
            <validator type="empty_field" message="Please choose, at least, one data transformation method." />
        </param>                     
        <param name="na_encoding" size="30" type="text" value="NA" label="Label used for Missing values"/> 
        <param name="variable_in_line" type="select" multiple="false" display="radio" label="Variable in line or column?">
            <option value="1">Line</option>
            <option value="0">Column</option>
        </param>
    </inputs>
    <outputs>
        <data name="log_file" format="html" label="Normalization_log"/>
        <data name="output_file" format_source="input_file" label="Transfo-${transformation_method.value}_${input_file.name}"/>
    </outputs>
    <tests>
        <test>
            <param name="input_file" value="decathlon.tsv"/>
            <param name="transformation_method" value="log"/>
            <param name="na_encoding" value="NA"/>
            <param name="variable_in_line" value="0"/>
            <output name="log_file" file="log_file"/>
            <output name="output_file" file="output_file"/>
        </test>
    </tests>
    <help><![CDATA[

=========
Normalize
=========

-----------
Description
-----------

 - This tool is part of a set of statistical tools made by members of the BIOS4BIOL group ("Normalization", "Summary statistics", "Hierarchical clustering" and "PCAFactoMineR").
 - Please use this Normalization module before using other modules of the suite.

What it does: 
 - It normalize your data with some well known methods

------

-----------
Input files
-----------

+---------------------------+------------+
| Parameter : num + label   |   Format   |
+===========================+============+
| 1 : input file            |   tabular  |
+---------------------------+------------+


----------
Parameters
----------

Data transformation method
        | Possible values: "log", "DESeq2", "Rlog", "Standard_score", "TSS", "TSS_CLR"
        | 

Label used for Missing values: 
        | Missing value coding character 
        | 

Variable in line or column: 
        | Indicate if variables are in lin or in columns
        | 


------------
Output files
------------


Transfo-<method>_<input file name>
        | input file normalized according to the choosen method
        | 

Normalization_log
        |

-------
Advices
-------

Nature of data may change
        | Depending on the subjects of the experimentation and/or the technology used to measure a signal on these subjects.
        | By instance, when dealing with RNA-Seq data, expression intensity values are expressed as counts, while with microarray technology, it is expressed as fluorescence intensity.
        |  

Before to conduct any analysis on a table of data, it is important to:
        | Identify the nature of data you are dealing with
        | Check if this nature of data is adapted to the type of analysis you want to do

If your nature of data is not adapted to the analysis you plan to do, you should first transform your data in a scale of values which fits better requirement of your analysis.
This transformation process is named “normalization”.


---------------------
Normalization Methods
---------------------

In this Galaxy module, we propose several normalization methods, and we provide some guidelines to help user choose the accurate normalization method:

Log normalization
        |  -Objective: Binary logarithm provide homogeneity of variance even if the range of values is pretty large
        |  -Accepted: values   Any positive or null real numbers
        |  (null values, will stay null after transformation)
        |  -Range of values:   Input: [0;100.000] / Output: [0;17]
        |  -Adapted for:       PCA, HC, SS*
        | 

DESeq2 normalization
        |  -Objective: Obtain comparable counts between samples, whatever the difference of their libraries sequencing depth
        |  -Accepted values:   NGS counts (positive integers ; no missing values)
        |  (null values, will stay null after transformation)
        |  -Range of values:   Input: [0;100.000] / Output: [0; 100.000]
        |  -Adapted for:       Differential analysis
        | 

RLog normalization
        |  -Objective: Similar to a combination of {DESeq2 + Log} transformation
        |  -Accepted values:   NGS counts (positive integers ; no missing values)
        |  -Range of values:   Input: [0;100.000] / Output: [0; 20]
        |  -Adapted for:       PCA, HC, SS
        | 

Standard score normalization
        |  -Objective: Transform values such as {mean=0 and standard deviation=1} for all variables.
        |  -Accepted values:   No specific constraint
        |  -Range of values:   No specific constraint
        |  -Adapted for:       PCA, HC, SS
        | 

Pareto normalization
        |  -Objective: Transform values such as
        |  {mean=0 and variance equal to its standard deviation instead of unit variance} for all variables.
        |  -Accepted values: No specific constraint
        |  -Range of values: No specific constraint
        |  -Adapted for: metabolite intensity values before PCA, HC, SS
        | 

Total sum scaling normalization (TSS)
        |  -Objective: Normalizes count data by dividing variable read count by the total number of read counts in each individual sample
        |  -Accepted values:   16S rRNA amplicon sequencing
        |  -Range of values:   Input: no specific constraint / Output: [0;1[
        |  -Adapted for:       PCA, HC, SS
        | 

Total sum scaling+Log ratio normalization (TSS+CLR)
        |  -Objective: Transform values such as {mean=0 and standard deviation=1} for all variables.
        |  -Accepted values:   16S rRNA amplicon sequencing
        |  -Range of values:   Input: no specific constraint / Output: [0;1[
        |  -Adapted for:       PCA, HC, SS

(*)PCA: Principal Component Analysis / HC: Hierarchical Clustering / SS: Summary Statistics

------

**Authors**: Luc Jouneau (luc.jouneau@inra.fr), Sarah Maman (sarah.maman@inra.fr) and Valentin Marcon (valentin.marcon@inra.fr) 

Contact : support.sigenae@inra.fr

E-learning available : Not yet.

.. class:: infomark

-------------
Please cite :
-------------

- (Depending on the help provided you can cite us in acknowledgements, references or both.)
    
Acknowledgements
        | We wish to thank SIGENAE group and the statistical CATI BIOS4Biol group : Luc Jouneau, Sarah Maman
        | Re-packaging was provided by Valentin Marcon (INRA, Migale platform http://migale.jouy.inra.fr), as part of the IFB project 'Galaxy For Life Science' (http://www.france-bioinformatique.fr/fr)
        |  
   
References
        | SIGENAE [http://www.sigenae.org/]
        |

    ]]></help>
</tool>