comparison maaslin.xml @ 8:e9677425c6c3 default tip

Updated the structure of the libraries
author george.weingart@gmail.com
date Mon, 09 Feb 2015 12:17:40 -0500
7:c72e14eabb08 8:e9677425c6c3
1 <tool id="maaslin_run" name="MaAsLin" version="1.0.1">
2 <code file="maaslin_format_input_selector.py"/>
3 <description></description>
4 <command interpreter="python">maaslin_wrapper.py
5 --lastmeta $cls_x
6 --input $inp_data
7 --output $out_file1
8 --alpha $alpha
9 --min_abd $min_abd
10 --min_samp $min_samp
11 --zip_file $zip_file
12 --tool_option1 $tool_option1
13 </command>
14
15 <inputs>
16 <param format="maaslin" name="inp_data" type="data" label="pcl file of metadata and microbial community measurements: Upload using Get Data-Upload file - Use File-Format = maaslin - Sample file below"/>
17 <param name="cls_x" type="select" label="Last metadata row (Select 'Weight' for demo data set)" multiple="False" size ="70" dynamic_options="get_cols(inp_data,'0')"/>
18 <param name="alpha" type="float" size="8" value="0.05" label="Maximum false discovery rate (significance threshold)"/>
19 <param name="min_abd" type="float" size="8" value="0.0001" label="Minimum for feature relative abundance filtering"/>
20 <param name="min_samp" type="float" size="8" value="0.01" label="Minimum for feature prevalence filtering"/>
21
22 <param name="tool_option1" type="select" label="Type of output">
23 <option value="1">Single File: Summary</option>
24 <option value="2">Two Files: Complete zipped results + Summary</option>
25 </param>
26 </inputs>
27 <outputs>
28 <data format="tabular" name="out_file1" />
29 <data name="zip_file" format="zip">
30 <filter>tool_option1 == "2"</filter>
31 </data>
32 </outputs>
33 <requirements>
34 <requirement type="set_environment">maaslin_SCRIPT_PATH</requirement>
35 </requirements>
36 <tests>
37 <test>
38 <param name="inp_data" value="maaslin_input" ftype="maaslin" />
39 <param name="cls_x" value="9" />
40 <param name="alpha" value="0.05" />
41 <param name="min_abd" value="0.0001" />
42 <param name="min_samp" value="0.01" />
43 <param name="tool_option1" value="1" />
44 <output name="out_file1" file="maaslin_output" />
45 <assert_contents>
46 <has_text text="Variable Feature Value Coefficient N N.not.0 P.value Q.value" />
47 </assert_contents>
48 </test>
49 </tests>
50 <help>
51
52 Feedback? Not working? Please contact us at Maaslin_google_group_ .
53
54
55 MaAsLin: Multivariate Analysis by Linear Models
56 -----------------------------------------------
57
58 MaAsLin is a multivariate statistical framework that finds associations between clinical metadata and microbial community abundance or function. The clinical metadata can be of any type: continuous (for example, age and weight), boolean (sex, stool/biopsy), or discrete/factor (cohort groupings and phenotypes). MaAsLin is best used when you are associating many metadata with microbial measurements. In that case, each metadatum can be a different type. For example, you could include age, weight, sex, cohort and phenotype in the same input file to be analyzed in the same MaAsLin run. The microbial measurements are expected to be normalized before using MaAsLin, so they should be proportional data ranging from 0 to 1.0.
59
60 The results of a MaAsLin run are the associations of specific microbial community members with metadata. These associations are reported without the influence of the other metadata in the study. Certain factors are known to influence the microbiome (for example diet, age, geography, and fecal or biopsy sample origin). MaAsLin allows one to detect the effect of a single metadatum, possibly a phenotype, while deconfounding the effects of diet, age, sample origin or any other metadata captured in the study.
61
62 .. image:: https://bytebucket.org/biobakery/galaxy_maaslin/wiki/Figure1-Overview.png
63 :height: 500
64 :width: 600
65
66
67 *MaAsLin Analysis Overview* MaAsLin performs boosted, additive general linear models between one group of data (the metadata, or predictors) and another group (in our case microbial abundance, the response). Because metagenomic data is sparse, boosting is used to select metadata that show some potential to be associated with microbial abundances. Boosting of metadata and selection of a model occur per OTU. The metadata selected by boosting are then used in a general linear model with metadata as predictors and the arcsin-square root transformed OTU abundance as the response.
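
As a rough illustration of the arcsin-square root transform mentioned above, the following minimal Python sketch applies it to a few relative abundances. The function name ``arcsin_sqrt`` and the example values are assumptions made for illustration; this is not MaAsLin's own code.

```python
import math


def arcsin_sqrt(proportion):
    """Return asin(sqrt(p)) for a relative abundance p between 0 and 1."""
    # Reject values outside the proportional range the help text expects.
    if proportion > 1.0 or 0.0 > proportion:
        raise ValueError("expected a proportion between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Hypothetical relative abundances for one feature across three samples.
abundances = [0.0, 0.25, 1.0]
transformed = [arcsin_sqrt(p) for p in abundances]
```

The transform maps the range 0 to 1.0 onto 0 to pi/2, which is why the input measurements need to be normalized proportions first.
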
68
69
70
71 For more information on the technical aspects of this algorithm, please see the methodological evaluation of MaAsLin that compared it to multivariate and univariate analyses. Please check back for the paper citation.
72
73 Process:
74 --------
75 The first step consists of uploading your data using Galaxy's **Get Data - Upload File**.
76
77 A sample file is located at: https://bytebucket.org/biobakery/maaslin/wiki/maaslin_demo_pcl.txt
78
79
80 **Important**
81
82 Please make sure to choose **File Format: maaslin**
83
84 Required inputs
85 ---------------
86
87 MaAsLin requires an input PCL file of metadata and microbial community measurements. A PCL file is a tab-delimited text file, similar to an Excel spreadsheet, with the following characteristics.
88
89 1. **Rows** represent metadata and features (bugs), **columns** represent samples
90 2. The **first row** by default should be the sample IDs.
91 3. Metadata rows should be next.
92 4. Lastly, rows containing feature (bug) measurements (such as abundance) should come after the metadata rows.
93 5. The **first column** should contain the ID describing the row. For metadata this may be, for example, "Age" for a row containing the age of the patients donating the samples. For measurements, this should be the feature name (bug name).
94 6. The file is expected to be TAB delimited.
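
The layout described above can be checked with a short script before uploading. The sketch below is only an illustration: the sample IDs, row names and values in ``PCL_TEXT`` are made up, and ``read_pcl`` is a hypothetical helper, not part of MaAsLin or Galaxy.

```python
import csv
import io

# Hypothetical miniature PCL content: rows are metadata or features,
# columns are samples, and fields are separated by tabs.
PCL_TEXT = (
    "sample\tS1\tS2\tS3\n"
    "Age\t34\t51\t28\n"
    "Weight\t70\t82\t65\n"
    "Bacteroides\t0.60\t0.25\t0.40\n"
    "Prevotella\t0.40\t0.75\t0.60\n"
)


def read_pcl(text):
    """Parse tab-delimited PCL text into sample IDs and a dict of row ID to values."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # The first row holds the sample IDs; its first cell labels the ID column.
    sample_ids = rows[0][1:]
    table = {}
    for row in rows[1:]:
        # Every row needs its ID plus exactly one value per sample.
        if len(row) != len(sample_ids) + 1:
            raise ValueError("row %r does not have one value per sample" % row[0])
        table[row[0]] = row[1:]
    return sample_ids, table


sample_ids, table = read_pcl(PCL_TEXT)
```

Values are kept as strings here because metadata rows may be continuous, boolean or factor data; converting them is left to the analysis step.
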
95
96
97
98
99
100
101 Description of parameters
102 -------------------------
103 **Input file** Select a loaded data file to use in analysis.
104
105 **Last metadata row** Metadata and microbial measurements should be rows of the PCL file, and all metadata rows should come before the microbial measurements. This parameter names the last metadata row; every row after it is treated as a microbial measurement.
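
To make the role of this parameter concrete, here is a minimal Python sketch that splits an ordered list of row IDs at the last metadata row. The helper ``split_rows`` and the row names are assumptions for illustration; the tool itself may identify the row differently (for example by position).

```python
def split_rows(row_ids, last_metadata_row):
    """Split ordered row IDs into (metadata, features) at the named last metadata row."""
    if last_metadata_row not in row_ids:
        raise ValueError("unknown row: %s" % last_metadata_row)
    # Everything up to and including the named row is metadata.
    cut = row_ids.index(last_metadata_row) + 1
    return row_ids[:cut], row_ids[cut:]


# Hypothetical row order of a PCL file, sample ID row excluded.
row_ids = ["Age", "Sex", "Weight", "Bacteroides", "Prevotella"]
metadata, features = split_rows(row_ids, "Weight")
```
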
106
107 **Maximum false discovery rate (Significance threshold)** Associations are considered significant if their q-value is less than or equal to this threshold.
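
The thresholding rule can be sketched as follows. This help text does not state which FDR correction MaAsLin applies, so the Benjamini-Hochberg procedure used here is an assumption, as are the function name and the example p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


alpha = 0.05  # same default as the tool's significance threshold
p_values = [0.001, 0.02, 0.03, 0.5]  # hypothetical p-values
q_values = benjamini_hochberg(p_values)
# An association is kept when its q-value is at most alpha.
significant = [alpha >= q for q in q_values]
```
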
108
109 **Minimum for feature relative abundance filtering** The minimum relative abundance allowed in the data. Values below this are removed and imputed as the median of the sample data.
110
111 **Minimum for feature prevalence filtering** The minimum percentage of samples in which a feature must have abundance; features present in fewer samples are removed.
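
One possible reading of the prevalence filter is sketched below. It assumes that "having abundance" means a non-zero value and that the threshold is a fraction between 0 and 1; both are assumptions, and the helper names and data are hypothetical rather than taken from MaAsLin.

```python
def prevalence(abundances):
    """Fraction of samples in which the feature has non-zero abundance."""
    present = sum(1 for value in abundances if value > 0.0)
    return present / float(len(abundances))


def filter_by_prevalence(features, min_samp):
    """Keep features whose prevalence is at least min_samp (treated as a fraction)."""
    return {name: values for name, values in features.items()
            if prevalence(values) >= min_samp}


# Hypothetical relative abundances for three features across four samples.
features = {
    "Bacteroides": [0.6, 0.25, 0.4, 0.1],
    "RareBug": [0.0, 0.0, 0.0, 0.0],
    "Prevotella": [0.4, 0.0, 0.6, 0.0],
}
kept = filter_by_prevalence(features, min_samp=0.5)
```
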
112
113 **Type of Output** Select one of the two options for output (summary or detailed results).
114
115 Outputs
116 -------
117
118 The Run MaAsLin module will create either A) a summary text file of plotted significant associations or B) a compressed directory of associations (significant and not significant).
119
120 A. Any association that had a q-value less than or equal to the significance threshold will be included in a tab-delimited file.
121
122 B. The following files will be generated per MaAsLin run. In the following listing the term projectname refers to what you named your pcl file without the extension.
123
124 **Analysis** (These files are useful for analysis):
125
126 **projectname-metadata.txt** Each metadatum will have a file of associations. Any association that the initial boosting selected for testing is recorded here. Included are the results of the final general linear model (performed after the boosting) and the FDR-corrected p-value (q-value). Can be opened as a text file or spreadsheet.
127
128 **projectname-metadata.pdf** Any association that had a q-value less than or equal to the significance threshold will be plotted here. If this file does not exist, the projectname-metadata.txt should not have an entry with a q-value less than or equal to the threshold. Factor and boolean data are plotted as notched box plots; continuous data are plotted as a scatter plot with a line of best fit.
129
130 .. image:: https://bytebucket.org/biobakery/galaxy_maaslin/wiki/Maaslin_Output.png
131 :height: 500
132 :width: 600
133
134
135
136 *Example of the projectname-metadata.pdf file* Significant associations are combined in files of associations per metadatum. Factor and boolean data are plotted as notched box plots; continuous data are plotted as a scatter plot with a line of best fit. Plots show raw data, header data show information from the reduced
137
138 **projectname_Summary.txt** All entries in the projectname-metadata.pdf files are collected together here. Can be opened as a text file or spreadsheet.
139
140 **Troubleshooting** (These files are typically not used for analysis but are there for documenting the process and troubleshooting):
141
142 **projectname.txt** Contains the details from the statistical engine. Useful for detailed troubleshooting.
143
144 **data.tsv** The data matrix that was read in (transposed). Useful for making sure the correct data was read in.
145
146 **data.read.config** Can be used to read in the data.tsv .
147
148 **metadata.tsv** The metadata that was read in (transposed). Useful for making sure the correct metadata was read in.
149
150 **metadata.read.config** Can be used to read in the metadata.tsv .
151
152 **read_merged.tsv** The data and metadata merged (transposed). Useful for making sure the merging occurred correctly.
153
154 **read_merged.read.config** Can be used to read in the read_merged.tsv .
155
156 **read_cleaned.tsv** The data read in, merged, and then cleaned. After this process the data is written to this file for reference if needed.
157
158 **read_cleaned.read.config** Can be used to read in read_cleaned.tsv .
159
160 **ProcessQC.txt** Contains quality control for the MaAsLin analysis. This includes information on the magnitude of outlier removal.
161
162 Contacts
163 --------
164
165 Please feel free to contact us at subraman@broadinstitute.org for any questions or comments!
166
167 .. _Maaslin_google_group: https://groups.google.com/d/forum/maaslin-users
168
169 </help>
170 </tool>