annotate doc/Merge_Metadata_Read_Me.txt @ 0:e0b5980139d9

maaslin
author george-weingart
date Tue, 13 May 2014 22:00:40 -0400
I. Quick start.

The merge_metadata.py script is included in the MaAsLin package to help add metadata to OTU tables (or to any tab-delimited file in which the columns are samples). This script was used to make the maaslin_demo.pcl file found in this project.

The generic command to run merge_metadata.py is:
python merge_metadata.py input_metadata_file < input_measurements_file > output_pcl_file

Example input files are found in this project in the directory maaslin/input/for_merge_metadata.
To run the command on the example files, open a terminal in the maaslin folder and run:
python src/merge_metadata.py input/for_merge_metadata/maaslin_demo_metadata.metadata < input/for_merge_metadata/maaslin_demo_measurements.pcl > input/maaslin_demo.pcl

II. Script overview
merge_metadata.py takes a tab-delimited metadata file and adds it to an OTU table. The expected formats of both files are given below. Additionally, if the OTU IDs contain a pipe-delimited consensus lineage (for instance, for the genus Bifidobacterium: Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae|Bifidobacterium), the script adds rows for the higher-level clades in that lineage. Each clade row is the sum of all OTUs that share that clade, so the output captures all of the higher-level clade information contained in the OTU data*. The resulting hierarchy is then normalized using the same hierarchical structure. As a result, after running the script the values in a sample will sum to more than 1 (typically around 6, depending on whether your data were originally at the genus, species, or another level of resolution), while the terminal (original) OTUs in a sample should still sum to 1.
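The summing and normalization described above can be illustrated with a minimal Python sketch. It is not the code in merge_metadata.py: the function name add_clades_and_normalize is hypothetical, the OTU table is assumed to be already loaded into a dict of per-sample values, and the redundant-clade filtering from the footnote is left out. It normalizes the terminal OTUs first and then sums them into clades, which gives the same clade values as summing first and dividing every row by the sample's terminal total.

```python
from collections import defaultdict


def add_clades_and_normalize(otus):
    """Hypothetical sketch of clade summing plus normalization.

    otus maps a pipe-delimited lineage (e.g. "k|p|g") to a list of
    per-sample abundances. Assumes no OTU ID is a prefix of another OTU ID.
    Redundant clades are NOT filtered out here.
    """
    n_samples = len(next(iter(otus.values())))

    # Per-sample totals over the terminal (original) OTUs.
    totals = [sum(values[i] for values in otus.values()) for i in range(n_samples)]

    # Scale each sample so its terminal OTUs sum to 1
    # (a sample whose total is 0 stays all zeros).
    normalized = {
        otu_id: [v / t if t else 0.0 for v, t in zip(values, totals)]
        for otu_id, values in otus.items()
    }

    # Every proper prefix of a lineage is a higher-level clade; its value is
    # the sum of the normalized values of all OTUs beneath it.
    clades = defaultdict(lambda: [0.0] * n_samples)
    for otu_id, values in normalized.items():
        levels = otu_id.split("|")
        for depth in range(1, len(levels)):
            clade = "|".join(levels[:depth])
            clades[clade] = [c + v for c, v in zip(clades[clade], values)]

    merged = dict(normalized)
    merged.update(clades)
    return merged


if __name__ == "__main__":
    # Hypothetical toy table: three genera in two phyla, two samples.
    table = {"k|p1|g1": [2, 1], "k|p1|g2": [2, 3], "k|p2|g3": [4, 0]}
    result = add_clades_and_normalize(table)
    for otu_id in sorted(result):
        print(otu_id, result[otu_id])
```

For the toy table, every sample's rows add up to 3 (the genera, the phyla and the kingdom each contribute 1), which is the "sums to more than 1" effect described above; the terminal OTUs alone still sum to 1.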

*To limit the number of multiple comparisons, a higher-level clade is added only if it adds information to the data set. For example, if you have the OTU Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae|Bifidobacterium and no other related OTUs until Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales, the clade Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae will not be added to the data set, because it would be identical to the existing, more specific OTU Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae|Bifidobacterium. Clades at and above Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales are included only if other OTUs are summed into them at those clade levels.
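One way to express this rule in code, as a hedged sketch rather than the script's actual logic: a clade is treated as redundant when it covers only one OTU (it would equal that OTU), or when a deeper clade covers exactly the same set of OTUs. The function name informative_clades and the FamilyX|GenusX lineage in the example are hypothetical.

```python
def informative_clades(otu_ids):
    """Hypothetical sketch: return the higher-level clades that are not
    identical to an existing, more specific row.

    Assumes no OTU ID is a prefix of another OTU ID.
    """
    # Map each candidate clade (proper prefix of a lineage) to the OTUs under it.
    members = {}
    for otu_id in otu_ids:
        levels = otu_id.split("|")
        for depth in range(1, len(levels)):
            clade = "|".join(levels[:depth])
            members.setdefault(clade, set()).add(otu_id)

    kept = []
    seen = set()
    # Visit deeper (more specific) clades first, so that when several nested
    # clades cover the same OTUs, only the most specific one is kept.
    for clade in sorted(members, key=lambda c: c.count("|"), reverse=True):
        covered = frozenset(members[clade])
        if len(covered) < 2 or covered in seen:
            continue  # equal to a single OTU or to a deeper clade
        seen.add(covered)
        kept.append(clade)
    return sorted(kept)


if __name__ == "__main__":
    bifido = "Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|Bifidobacteriaceae|Bifidobacterium"
    other = "Bacteria|Actinobacteria|Actinobacteria|Bifidobacteriales|FamilyX|GenusX"  # hypothetical OTU
    print(informative_clades([bifido, other]))
```

With only these two OTUs, Bifidobacteriaceae is skipped because it equals the Bifidobacterium OTU, and Bacteria down to Bacteria|Actinobacteria|Actinobacteria are skipped because they cover exactly the same two OTUs as Bifidobacteriales, which is the only clade returned.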

III. Description of input files

Metadata file:
Please make the file as follows:
1. Tab delimited.
2. Rows are samples; columns are metadata.
3. Sample IDs in the metadata file should match the sample IDs in the OTU table.
4. Use NA for values that were not recorded.
5. An example file is found at input/for_merge_metadata/maaslin_demo_metadata.metadata
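A file in this format can be read with Python's standard csv module. This is a minimal sketch, not the script's own parser: it assumes the first row holds the metadata names and the first column holds the sample IDs, and it maps NA to None. The function name read_metadata and the Age/Treatment columns in the example are hypothetical.

```python
import csv
import io


def read_metadata(handle):
    """Hypothetical sketch: parse a tab-delimited metadata file in which
    rows are samples and columns are metadata.

    Assumes a header row of metadata names and sample IDs in column 1.
    Returns (field_names, {sample_id: {field: value or None}}).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    fields = header[1:]
    metadata = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample_id, values = row[0], row[1:]
        # "NA" marks a value that was not recorded.
        metadata[sample_id] = {
            field: (None if value == "NA" else value)
            for field, value in zip(fields, values)
        }
    return fields, metadata


if __name__ == "__main__":
    example = "sample\tAge\tTreatment\nS1\t34\tA\nS2\tNA\tB\n"
    fields, metadata = read_metadata(io.StringIO(example))
    print(fields)          # ['Age', 'Treatment']
    print(metadata["S2"])  # {'Age': None, 'Treatment': 'B'}
```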

OTU table:
Please make the file as follows:
1. Tab delimited.
2. Rows are OTUs, columns are samples (note that this is transposed relative to the metadata file).
3. If a consensus lineage is included in the OTU name, use pipes (|) as the delimiter.
4. An example file is found at input/for_merge_metadata/maaslin_demo_measurements.pcl
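Because the OTU table is transposed relative to the metadata file, merging the two means turning each metadata column into a row whose cells follow the OTU table's sample order. The sketch below assumes the merged output lists the OTU table's header row, then one row per metadata field, then the OTU rows; treat that ordering, the hypothetical function name merge_tables, and the choice to fill samples missing from the metadata with NA as assumptions rather than the script's documented behavior.

```python
import sys


def merge_tables(fields, metadata, otu_rows):
    """Hypothetical sketch: place transposed metadata rows above an OTU table.

    fields:   metadata names, in order
    metadata: dict sample_id -> {field: value or None}
    otu_rows: list of rows (lists of strings); row 0 is the header, whose
              first cell labels the ID column and whose other cells are
              the sample IDs in column order.
    """
    header = otu_rows[0]
    sample_ids = header[1:]
    merged = [header]
    # Transpose: one output row per metadata field, one column per sample.
    for field in fields:
        row = [field]
        for sample_id in sample_ids:
            value = metadata.get(sample_id, {}).get(field)
            row.append("NA" if value is None else value)
        merged.append(row)
    # The OTU rows follow unchanged.
    merged.extend(otu_rows[1:])
    return merged


if __name__ == "__main__":
    fields = ["Age", "Treatment"]
    metadata = {"S1": {"Age": "34", "Treatment": "A"},
                "S2": {"Age": None, "Treatment": "B"}}
    otu_rows = [["ID", "S2", "S1"], ["Bacteria|Firmicutes", "5", "7"]]
    for row in merge_tables(fields, metadata, otu_rows):
        sys.stdout.write("\t".join(row) + "\n")
```

Note that the metadata values are looked up by sample ID, so the column order of the OTU table decides the column order of the output; this is why the sample IDs in the two files must match.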