annotate maaslin-4450aa4ecc84/doc/Merge_Metadata_Read_Me.txt @ 1:a87d5a5f2776

Uploaded the version running on the prod server
author george-weingart
date Sun, 08 Feb 2015 23:08:38 -0500
I. Quick start.

The merge_metadata.py script is included in the MaAsLin package to help add metadata to OTU tables (or any tab-delimited file in which the columns are the samples). This script was used to make the maaslin_demo.pcl file found in this project.

The generic command to run merge_metadata.py is:
python merge_metadata.py input_metadata_file < input_measurements_file > output_pcl_file

Examples of the expected files are found in this project in the directory maaslin/input/for_merge_metadata.
To run the command on the example files, use the following (from the maaslin folder in a terminal):
python src/merge_metadata.py input/for_merge_metadata/maaslin_demo_metadata.metadata < input/for_merge_metadata/maaslin_demo_measurements.pcl > input/maaslin_demo.pcl

II. Script overview
merge_metadata.py takes a tab-delimited metadata file and adds it to an OTU table. Both files have expected formats, given below. Additionally, if a pipe-delimited consensus lineage is given in the IDs of the OTUs (for instance, for the genus Bifidobacterium, Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae|Bifidobacterium), the higher-level clades in the consensus lineage are added together with those of other OTUs at the same clade level, generating all higher-level clade information captured in the OTU data*. This hierarchy is then normalized using the same hierarchical structure. As a result, after running the script a sample will sum to more than 1, typically somewhere around 6, depending on whether your data is originally at genus, species, or another level of resolution. All terminal OTUs (the original OTUs) in a sample should sum to 1.

*To help limit multiple comparisons, additional clades are only added if they add information to the data set. This means that if you have an OTU Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae|Bifidobacterium and no other related OTUs until Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales, then Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae will not be added to the data set, because it would be no different from the already existing and more specific Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae|Bifidobacterium OTU. Clades at and above Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales will be included depending on whether there are other OTUs to add them to at those clade levels.
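
The roll-up and pruning described above can be sketched in Python for a single sample. This is an illustration only and is not the code in merge_metadata.py. The function names (roll_up_clades, normalize_by_terminal_otus) are hypothetical, and the rule "keep an added clade only if it has at least two distinct child clades" is an assumption chosen to reproduce the behavior described in the footnote. The sketch also assumes that every input OTU is terminal (no OTU ID is a prefix of another).

```python
from collections import defaultdict


def roll_up_clades(otu_abundances):
    """Sum one sample's OTU abundances into their higher-level clades.

    otu_abundances maps pipe-delimited lineages to abundances.
    Assumed pruning rule: an added clade is kept only if it has two or
    more distinct child clades; with a single child it would carry
    exactly the same values as that child and add no information.
    """
    totals = defaultdict(float)
    children = defaultdict(set)  # clade -> its distinct direct child clades
    for lineage, value in otu_abundances.items():
        levels = lineage.split("|")
        for depth in range(1, len(levels) + 1):
            clade = "|".join(levels[:depth])
            totals[clade] += value
            if depth > 1:
                parent = "|".join(levels[:depth - 1])
                children[parent].add(clade)
    return {
        clade: total
        for clade, total in totals.items()
        if clade in otu_abundances or len(children.get(clade, ())) > 1
    }


def normalize_by_terminal_otus(clade_totals, otu_abundances):
    """Scale all clades so that the original (terminal) OTUs sum to 1."""
    terminal_sum = sum(otu_abundances.values())
    return {clade: total / terminal_sum for clade, total in clade_totals.items()}


if __name__ == "__main__":
    # Made-up counts for one sample, for illustration only.
    sample = {
        "Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae|Bifidobacterium": 20,
        "Bacteria|Actinobacteria|Actinobacteria|Actinomycetales|Actinomycetaceae|Actinomyces": 30,
        "Bacteria|Firmicutes|Clostridia|Clostridiales|Lachnospiraceae|Roseburia": 50,
    }
    rolled = roll_up_clades(sample)
    for clade, value in sorted(normalize_by_terminal_otus(rolled, sample).items()):
        print(clade, value)
```

With these made-up counts, Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales is not added because Bifidobacterium is the only OTU beneath it, while Bacteria|Actinobacteria|Actinobacteria is added because two different orders fall under it. The normalized sample then sums to more than 1 even though the three original OTUs sum to 1.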

III. Description of input files

Metadata file:
Please make the file as follows:
1. Tab delimited.
2. Rows are samples, columns are metadata.
3. Sample IDs in the metadata file should match the sample IDs in the OTU table.
4. Use NA for values which are not recorded.
5. An example file is found at input/for_merge_metadata/maaslin_demo_metadata.metadata

OTU table:
Please make the file as follows:
1. Tab delimited.
2. Rows are OTUs, columns are samples (note that this is transposed relative to the metadata file).
3. If a consensus lineage is included in the OTU name, use pipes as the delimiter.
4. An example file is found at input/for_merge_metadata/maaslin_demo_measurements.pcl
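
Before running the merge, it can help to confirm that the sample IDs in the two files agree. The following Python sketch is not part of MaAsLin, and its function names are hypothetical. It assumes that the first row of the metadata file is a header of metadata names with the sample ID in the first column of each later row, and that the first row of the OTU table holds the sample IDs after a leading ID column. Check these assumptions against the example files before relying on it.

```python
import csv


def metadata_sample_ids(lines):
    """Rows are samples: take the first field of each row after the header."""
    rows = csv.reader(lines, delimiter="\t")
    next(rows, None)  # assumed header row of metadata names
    return [row[0] for row in rows if row]


def table_sample_ids(lines):
    """Columns are samples: take the header fields after the first column."""
    header = next(csv.reader(lines, delimiter="\t"), [])
    return header[1:]


def mismatched_sample_ids(metadata_lines, table_lines):
    """Return (IDs only in the metadata file, IDs only in the OTU table)."""
    meta = set(metadata_sample_ids(metadata_lines))
    table = set(table_sample_ids(table_lines))
    return sorted(meta - table), sorted(table - meta)
```

Each function accepts any iterable of lines, so the two input files can be opened with open(path, newline="") and the file handles passed in directly. Two empty lists from mismatched_sample_ids mean that every sample ID appears in both files.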