HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data.
Read more about the tool: http://huttenhower.sph.harvard.edu/humann
This tool corresponds to the main tool in the HUMAnN pipeline:
Taxonomic prescreen
Reads are mapped (with MetaPhlAn) to clade-specific marker genes to rapidly identify community species
Pangenome search (nucleotide search)
Reads are mapped (with Bowtie2) to pangenomes of identified species
Translated search
Unclassified reads are aligned to a comprehensive and non-redundant protein database
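The tiered search described above can be pictured with the minimal sketch below. It only illustrates the order of the tiers: the function name `tiered_search` and the two lookup dictionaries are hypothetical stand-ins for the Bowtie2 and translated search results, not part of HUMAnN itself.

```python
def tiered_search(reads, pangenome_hits, protein_hits):
    """Assign each read to the first tier that can place it.

    pangenome_hits and protein_hits are hypothetical dictionaries that map
    a read ID to the reference sequence it aligned to in that tier.
    """
    assignments = {}
    for read in reads:
        if read in pangenome_hits:
            # Tier 1: nucleotide search against pangenomes of detected species.
            assignments[read] = ("nucleotide", pangenome_hits[read])
        elif read in protein_hits:
            # Tier 2: translated search, only for reads left unclassified.
            assignments[read] = ("translated", protein_hits[read])
        else:
            assignments[read] = ("unaligned", None)
    return assignments


reads = ["r1", "r2", "r3"]
pangenome_hits = {"r1": "speciesA|geneX"}
protein_hits = {"r1": "protein9", "r2": "protein7"}
result = tiered_search(reads, pangenome_hits, protein_hits)
print(result)
```

Note that `r1` is kept in the nucleotide tier even though it also has a protein hit, since only reads that the first tier cannot place move on to the translated search.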
Gene family and pathway quantification
Gene abundance estimation
Mapping results are processed to estimate per-species and community total gene family abundance, weighting by
- alignment quality
- gene length
- gene coverage
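As a rough illustration of the weighting idea above, the sketch below sums quality-weighted hits per gene and normalizes by gene length. The `Alignment` record, the `min_coverage` threshold, and the formula itself are assumptions made for this example, not HUMAnN's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one read-to-gene alignment."""
    gene: str
    quality: float      # alignment weight in [0, 1] (assumed scale)
    gene_length: int    # gene length in nucleotides
    coverage: float     # fraction of the gene covered by reads, in [0, 1]


def gene_abundance(alignments, min_coverage=0.5):
    """Sum quality-weighted hits per gene, expressed per kilobase of gene."""
    totals = defaultdict(float)
    for aln in alignments:
        if aln.coverage < min_coverage:
            continue  # ignore genes with too little of their length covered
        totals[aln.gene] += aln.quality / (aln.gene_length / 1000.0)
    return dict(totals)


hits = [
    Alignment("geneA", 1.0, 1000, 0.9),
    Alignment("geneA", 0.5, 1000, 0.9),
    Alignment("geneB", 1.0, 2000, 0.2),  # dropped: coverage below threshold
]
abundance = gene_abundance(hits)
print(abundance)
```

Per-species totals would follow the same pattern with the species added to the dictionary key.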
Per-species and community-level metabolic network reconstruction
Genes are mapped to metabolic reactions to identify a parsimonious set of pathways that explains each species' observed reactions
Pathway abundance and coverage are quantified by:
- optimizing over alternative subpathways
- imputing abundance for conspicuously depleted reactions
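The two quantification steps above can be sketched as follows. This is a simplified illustration under stated assumptions: a pathway is given as a list of alternative reaction routes, each route is scored by its limiting reaction, and the single lowest reaction is raised to the second lowest as a stand-in for imputation. The function names and the scoring rule are hypothetical and may differ from what HUMAnN does internally.

```python
def fill_gap(abundances):
    """Raise the single lowest value to the second lowest (illustrative imputation)."""
    if len(abundances) < 2:
        return list(abundances)
    ordered = sorted(abundances)
    filled = list(abundances)
    filled[filled.index(ordered[0])] = ordered[1]
    return filled


def pathway_abundance(alternatives, reaction_abundance):
    """Score each alternative route by its limiting reaction and keep the best."""
    best = 0.0
    for reactions in alternatives:
        values = [reaction_abundance.get(r, 0.0) for r in reactions]
        if not values:
            continue  # an empty route carries no evidence
        best = max(best, min(fill_gap(values)))
    return best


reaction_abundance = {"R1": 10.0, "R2": 0.0, "R3": 8.0, "R4": 2.0, "R5": 3.0}
# Two alternative routes through the same (hypothetical) pathway.
routes = [["R1", "R2", "R3"], ["R4", "R5", "R1"]]
score = pathway_abundance(routes, reaction_abundance)
print(score)
```

In the first route the missing reaction `R2` is treated as a gap and filled, so the route is limited by `R3`; the second route stays limited by its low-abundance reactions, and the first route is kept.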
HUMAnN can start from a few different types of input data, each available in a few different formats:
Quality-controlled shotgun sequencing reads
This is the most common starting point: a metagenome (DNA reads) or a metatranscriptome (RNA reads)
Pre-computed mappings of reads to database sequences
Pre-computed (typically gene) abundance tables
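To make the three input categories above concrete, the sketch below guesses the category from a file name. The helper `classify_input` and the extension table are assumptions for illustration; the formats actually accepted depend on the HUMAnN version and on how this tool is configured.

```python
from pathlib import Path

# Assumed mapping from file extension to the three input categories above.
INPUT_TYPES = {
    ".fastq": "reads",
    ".fq": "reads",
    ".fasta": "reads",
    ".fa": "reads",
    ".sam": "mappings",
    ".bam": "mappings",
    ".m8": "mappings",
    ".tsv": "abundance table",
    ".biom": "abundance table",
}


def classify_input(filename):
    """Return the assumed input category for a file name, or 'unknown'."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]  # look past gzip compression
    if not suffixes:
        return "unknown"
    return INPUT_TYPES.get(suffixes[-1], "unknown")


for name in ["sample.fastq.gz", "sample.sam", "genefamilies.tsv", "notes.txt"]:
    print(name, "->", classify_input(name))
```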
HUMAnN uses 3 reference databases. Locally cached databases have to be downloaded (with the dedicated tool) before they can be used. Custom databases can also be used after upload.
HUMAnN creates three output files:
Ten intermediate temporary output files can also be retrieved.