view maaslin-4450aa4ecc84/maaslin.xml @ 1:a87d5a5f2776

Uploaded the version running on the prod server
author george-weingart
date Sun, 08 Feb 2015 23:08:38 -0500
parents
children
line wrap: on
line source

<tool id="maaslin_run" name="MaAsLin" version="1.0.1">
<code file="maaslin_format_input_selector.py"/> 
<description></description>
<command interpreter="python">maaslin_wrapper.py 
--lastmeta $cls_x 
--input $inp_data
--output $out_file1
--alpha $alpha
--min_abd $min_abd
--min_samp $min_samp
--zip_file $zip_file
--tool_option1 $tool_option1
</command>

  <inputs>
	<param format="maaslin" name="inp_data" type="data" label="pcl file of metadata and microbial community measurements: Upload using Get Data-Upload file - Use File-Format = maaslin - Sample file below"/>
	<param name="cls_x" type="select" label="Last metadata row (Select 'Weight' for demo data set)"  multiple="False" size ="70"  dynamic_options="get_cols(inp_data,'0')"/>
	<param name="alpha" type="float" size="8" value="0.05" label="Maximum false discovery rate (significance threshold)"/>
	<param name="min_abd" type="float" size="8" value="0.0001" label="Minimum for feature relative abundance filtering"/>
	<param name="min_samp" type="float" size="8" value="0.01" label="Minimum for feature prevalence filtering"/>

	<param name="tool_option1" type="select" label="Type of output">
           <option value="1">Single File: Summary</option>
           <option value="2">Two Files: Complete zipped results + Summary</option>
	</param>
        </inputs>
    <outputs>
        <data format="tabular" name="out_file1"  />
        <data  name="zip_file"  format="zip">
            <filter>tool_option1 == "2"</filter>
        </data>
    </outputs>
 <requirements>
    <requirement type="set_environment">maaslin_SCRIPT_PATH</requirement>
  </requirements>
   <tests>
       <test>
             <param name="inp_data" value="maaslin_input"  ftype="maaslin"  />
             <param name="cls_x" value="9" />
             <param name="alpha" value="0.05"  />
             <param name="min_abd" value="0.0001"  />
             <param name="min_samp" value="0.01"  />
             <param name="tool_option1" value="1"  />
             <output name="out_file1" file="maaslin_output"  />
                <assert_contents>
                   <has_text text="Variable     Feature Value   Coefficient     N       N.not.0 P.value Q.value" />
                </assert_contents>
       </test>
   </tests>
  <help>

Feedback?  Not working?  Please contact us at Maaslin_google_group_ .


MaAsLin: Multivariate Analysis by Linear Models
-----------------------------------------------

MaAsLin is a multivariate statistical framework that finds associations between clinical metadata and microbial community abundance or function. The clinical metadata can be of any type continuous (for example age and weight), boolean (sex, stool/biopsy), or discrete/factor (cohort groupings and phenotypes). MaAsLin is best used in the case when you are associating many metadata with microbial measurements. When this is the case each metadatum can be a diffrent type. For example, you could include age, weight, sex, cohort and phenotype in the same input file to be analyzed in the same MaAsLin run. The microbial measurements are expected to be normalized before using MaAsLin and so are proportional data ranging from 0 to 1.0.

The results of a MaAsLin run are the association of a specific microbial community member with metadata. These associations are without the influence of the other metadata in the study. There are certain factors known that can influence the microbiome (for example diet, age, geography, fecal or biopsy sample origin). MaAsLin allows one to detect the effect of a metadata, possibly a phenotype, deconfounding the effects of diet, age, sample origin or any other metadata captured in the study!

.. image:: https://bytebucket.org/biobakery/galaxy_maaslin/wiki/Figure1-Overview.png         
    :height: 500   
    :width: 600 


*Maaslin Analysis Overview* MaAsLin performs boosted, additive general linear models between one group of data (metadata/the predictors) and another group (in our case microbial abundance/the response). Given that metagenomic data is sparse, the boosting is used to select metadata that show some potential to be associated with microbial abundances. Boosting of metadata and selection of a model occurs per otu. The metadata data that is selected for use by boosting is then used in a general linear model using metadata as predictors and otu arcsin-square root transformed abundance as the response.



For more information on the technical aspects to this algorithm please see the methodological evaluation of MaAsLin that compared it to multiviariate and univariate analyses. Please check back for paper citing.

Process:
--------
The first step consists of uploading your data using Galaxy's **Get Data - Upload File**

A sample file is located at: https://bytebucket.org/biobakery/maaslin/wiki/maaslin_demo_pcl.txt


**Important** 

Please make sure to choose   **File Format:  maaslin**

Required inputs
---------------

MaAsLin requires an input pcl file of metadata and microbial community measurements. MaAsLin expects a PCL file as an input file. A PCL file is a text delimited file similar to an excel spread sheet with the following characteristics.

1. **Rows** represent metadata and features (bugs), **columns** represent samples
2. The **first row** by default should be the sample ids.
3. Metadata rows should be next.
4. Lastly, rows containing features (bugs) measurements (like abundance) should be after metadata rows.
5. The **first column** should contain the ID describing the column. For metadata this may be, for example, ''Age'' for a row containing the age of the patients donating the samples. For measurements, this should be the feature name (bug name).
6. The file is expected to be TAB delimited.






Description of parameters
-------------------------
**Input file** Select a loaded data file to use in analysis.

**Last metadata row** Metadata and microbial measurements should be rows of the pcl file. Metadata should all come before microbial measurements. This row is the last metadata row which is only followed by rows which are microbial measurements.

**Maximum false discovery rate (Significance threshold)** Associations are found significant if thier q-value is equal to or less than this threshold.

**Minimum for feature relative abundance filtering** The minimum relative abundance allowed in the data. Values below this are removed and imputed as the median of the sample data.

**Minimum for feature prevalence filtering** The minimum percentage of samples a feature can have abudance in before being removed.

**Type of Output** Select one of the two options for output (summary or detailed results).

Outputs
-------

The Run MaAsLin module will create either A) a summary text file of plotted significant associations or B) a compressed directory of associations (significant and not significant).

A. Any association that had a q-value less than or equal to the significance threshold will be included in a tab-delimited file.

B. The following files will be generated per MaAsLin run. In the following listing the term projectname refers to what you named your pcl file without the extension.

**Analysis** (These files are useful for analysis):

**projectname-metadata.txt** Each metadata will have a file of associations. Any associations indicated to be performed after initial boosting is recorded here. Included are the information from the final general linear model (performed after the boosting) and the FDR corrected p-value (q-value). Can be opened as a text file or spreadsheet.

**projectname-metadata.pdf** Any association that had a q-value less than or equal to the significance threshold will be plotted here. If this file does not exist, the projectname-metadata.txt should not have an entry that is less than or equal to the threshold. Factor and boolean data is plotted as knotched box plots; continuous data is plotted as a scatter plot with a line of best fit.

.. image:: https://bytebucket.org/biobakery/galaxy_maaslin/wiki/Maaslin_Output.png      
    :height: 500        
    :width: 600   



*Example of the projectname-metadata.pdf file* Significant associations are combined in files of associations per metadata. Factor and boolean data is plotted as knotched box plots; continuous data is plotted as a scatter plot with a line of best fit. Plots show raw data, header data show information from the reduced 

**projectname_Summary.txt** Any entry in the projectname-metadata.pdf are collected together here. Can be opened as a text file or spreadsheet.

**Troubleshooting** (These files are typically not used for analysis but are there for documenting the process and troubleshooting):

**projectname.txt** Contains the detail for the statistical engine. Is useful for detailed troubleshooting.

**data.tsv** The data matrix that was read in (transposed). Useful for making sure the correct data was read in.

**data.read.config** Can be used to read in the data.tsv .

**metadata.tsv** The metadata that was read in (transposed). Useful for making sure the correct metadata was read in.

**metadata.read.config** Can be used to read in the data.tsv .

**read_merged.tsv** The data and metadata merged (transposed). Useful for making sure the merging occurred correctly.

**read_merged.read.config** Can be used to read in the read_merged.tsv .

**read_cleaned.tsv** The data read in, merged, and then cleaned. After this process the data is written to this file for reference if needed.

**read_cleaned.read.config** Can be used to read in read_cleaned.tsv .

**ProcessQC.txt** Contains quality control for the MaAsLin analysis. This includes information on the magnitude of outlier removal.

Contacts
--------

Please feel free to contact us at ttickle@hsph.harvard.edu  for any questions or comments!

.. _Maaslin_google_group: https://groups.google.com/d/forum/maaslin-users

 </help>
</tool>