microPITA is a computational tool enabling sample selection in tiered studies. Tiered-study designs can allocate resources more efficiently, reducing study costs and maximizing the use of samples. Starting from a survey study, samples can be selected to target various microbial communities including:
Additionally, methods can leverage clinical metadata by stratifying samples into groups in which samples are subsequently selected. This enables the use of microPITA in cohort studies.
MicroPITA unsupervised method selection in the HMP 16S Gut Microbiome. Selection of 10 samples using targeted feature selection for Bacteroides (blue), maximum diversity (orange), representative dissimilarity (purple), and most dissimilar (pink), with Principal Coordinates Analysis (PCoA) used for ordination. Targeted feature selects samples dominated by Bacteroides (upper left), while maximum diversity selects more diverse samples away from Bacteroides-dominant samples. Representative selection selects samples covering the range of samples in the PCoA plot, focusing on the higher-density central region, while maximum dissimilarity selects samples at the periphery of the plot.
Before running microPITA, you must upload your data using Galaxy's Get Data - Upload File. Please make sure that you choose File Format Micropita. An example can be found at https://bytebucket.org/biobakery/micropita/wiki/micropita_sample_PCL.txt
microPITA requires an input PCL file of metadata and microbial community measurements. Although some defaults can be changed, microPITA expects a PCL file as input. A PCL file is a delimited text file, similar to an Excel spreadsheet, with the following characteristics.
Note: Mac users, please save the file as Windows-formatted text.
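To make the PCL layout concrete, here is a minimal Python sketch that splits a PCL-style table into metadata rows and measurement rows. It is not part of microPITA. It assumes a tab-delimited file whose first row holds sample IDs and whose first column holds row names; the function name read_pcl and the example values are hypothetical.

```python
import csv
import io


def read_pcl(handle, last_metadata_row, delimiter="\t"):
    """Split a PCL-style table into sample IDs, metadata, and measurements.

    Assumptions (not confirmed by the microPITA documentation):
    the first row holds sample IDs, the first column holds row names,
    and last_metadata_row is the name of the final metadata row.
    """
    rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    sample_ids = header[1:]

    metadata, measurements = {}, {}
    # If the ID row itself is the last metadata row, everything else is data.
    in_metadata = header[0] != last_metadata_row
    for row in body:
        name, values = row[0], row[1:]
        if in_metadata:
            metadata[name] = values
            if name == last_metadata_row:
                in_metadata = False
        else:
            # Rows after the last metadata row are microbial measurements.
            measurements[name] = [float(v) for v in values]

    if in_metadata:
        raise ValueError("last metadata row not found: " + last_metadata_row)
    return sample_ids, metadata, measurements


# Made-up example table: one metadata row followed by two features.
example = (
    "ID\ts1\ts2\ts3\n"
    "Cohort\tA\tA\tB\n"
    "Bacteroides\t0.6\t0.2\t0.1\n"
    "Prevotella\t0.1\t0.5\t0.3\n"
)
sample_ids, metadata, measurements = read_pcl(io.StringIO(example), "Cohort")
```

Here "Cohort" plays the role of the last metadata row, so Bacteroides and Prevotella are read as measurements.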
The Run MicroPITA module will create one output text file. The output will consist of one line starting with a keyword for the selection method, followed by the selected samples delimited by tabs. An example of 6 samples selected by the representative method:
representative sample_1 sample_2 sample_3 sample_4 sample_5 sample_6
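The output format described above can be read back with a few lines of Python. This is a sketch only: the function name parse_selection is hypothetical, and accepting more than one line (one per method) is an assumption rather than documented behavior.

```python
def parse_selection(text):
    """Map each selection-method keyword to its list of selected samples.

    Each non-empty line is expected to be: keyword<TAB>sample<TAB>sample...
    """
    selections = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if not fields[0]:
            continue  # skip blank lines
        # First field is the method keyword; the rest are sample names.
        selections[fields[0]] = [f for f in fields[1:] if f]
    return selections


example_output = (
    "representative\tsample_1\tsample_2\tsample_3"
    "\tsample_4\tsample_5\tsample_6\n"
)
selected = parse_selection(example_output)
```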
A brief description of the Run micropita module.
Input file: This should be populated by the Load microPITA module.
Last metadata row: The row of the input PCL file that holds the last metadata entry. All microbial measurements should follow this row.
Select method: Select which method to use for sample selection. Selection methods include:
Targeted feature(s): (visible with Features method selection only) Select 1 or more features to target in sample selection.
Selection type: (visible with Features method selection only) Rank or Abundance.
Label: (visible with supervised method selection only) The row which contains the label used to classify the samples in supervised methods.
Stratify by (optional): The row which contains the groupings the samples will first be placed in before running the selection method on each group. If no grouping is selected, selection methods will be performed on the data set as a whole.
Number of samples to select: The number of samples to select. If samples are stratified, this is per stratification (or group). If supervised methods are used, this is the number of samples selected per classification group (as defined by the label).
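The interaction between "Stratify by" and "Number of samples to select" can be illustrated with a short Python sketch. It is not microPITA's implementation. Ranking samples by the abundance of a single feature is used here only as a simple stand-in for a selection method, and the function name select_per_stratum and all values are hypothetical.

```python
from collections import defaultdict


def select_per_stratum(sample_ids, strata, scores, n):
    """Pick the n highest-scoring samples within each stratum.

    strata and scores are aligned with sample_ids. The score is a
    stand-in for a selection criterion (here, a feature's abundance).
    """
    groups = defaultdict(list)
    for sample, stratum in zip(sample_ids, strata):
        groups[stratum].append(sample)

    score_of = dict(zip(sample_ids, scores))
    return {
        stratum: sorted(members, key=lambda s: score_of[s], reverse=True)[:n]
        for stratum, members in groups.items()
    }


# Made-up data: five samples in two strata, scored by one feature.
ids = ["s1", "s2", "s3", "s4", "s5"]
cohort = ["A", "A", "A", "B", "B"]
abundance = [0.6, 0.2, 0.4, 0.1, 0.3]

picked = select_per_stratum(ids, cohort, abundance, n=2)
total_selected = sum(len(v) for v in picked.values())
```

With two strata and n=2, the selection is run once per group, so four samples are returned in total rather than two.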
For more information please visit http://huttenhower.sph.harvard.edu/micropita
Special thanks to Eric Franzosa for developing the above PCL figure!
When using MicroPITA, please cite: Tickle T, Segata N, Waldron L, Weingart G, Huttenhower C. Two-stage microbial community experimental design. (Under review)
Please feel free to contact us at ttickle@hsph.harvard.edu for any questions or comments!