Regroup (version 3.8+galaxy0)

What it does

HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data.

Read more about the tool: http://huttenhower.sph.harvard.edu/humann

This tool takes a table of feature values and a mapping of groups to component features to produce a new table with group values in place of feature values.

HUMAnN gene family output can contain a very large number of features depending on the complexity of your underlying sample. One way to explore this information in a simplified manner is via HUMAnN's own pathway coverage and abundance, which summarize the values of their member genes. However, this approach does not apply to gene families that are not associated with metabolic pathways.

To further simplify the exploration of gene family abundance data, users can regroup gene families into other functional categories using the current tool. This tool takes as arguments a gene family abundance table and a mapping (groups) file that indicates which gene families belong to which groups.

Out of the box, HUMAnN can regroup gene families to MetaCyc reactions (a step which is also used internally as part of MetaCyc pathway quantification). Additional mapping files let users regroup both UniRef90 and UniRef50 gene families to the following systems:

  • MetaCyc Reactions
  • KEGG Orthogroups (KOs)
  • Pfam domains
  • Level-4 enzyme commission (EC) categories
  • EggNOG (including COGs)
  • Gene Ontology (GO)
  • Informative GO

In most cases, mappings are directly inferred from the annotation of the corresponding UniRef centroid sequence in UniProt.

The exception is the "informative GO" (infogo1000) maps. These are informative subsets of GO computed specifically for HUMAnN from UniProt's annotations and the structure of the GO hierarchy: each informative GO term has >1,000 UniRef centroids annotated to it, but none of its progeny terms have >1,000 centroids so annotated.
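The selection rule for informative GO terms can be illustrated with a minimal sketch. This is not the code used to build the infogo1000 maps. The term identifiers, the `counts` and `children` structures, and the function names are all hypothetical, and the sketch assumes that each count already includes centroids annotated to the term's progeny.

```python
def descendants(term, children):
    """Return every progeny term reachable from `term` in the hierarchy."""
    seen = set()
    stack = list(children.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def informative_terms(counts, children, threshold=1000):
    """Keep terms with more than `threshold` centroids whose progeny
    terms all have `threshold` centroids or fewer."""
    selected = set()
    for term, n_centroids in counts.items():
        if n_centroids <= threshold:
            continue
        progeny = descendants(term, children)
        if all(counts.get(p, 0) <= threshold for p in progeny):
            selected.add(term)
    return selected


# Hypothetical toy hierarchy: GO:A is the parent of GO:B and GO:C,
# and GO:B is the parent of GO:D.
children = {"GO:A": ["GO:B", "GO:C"], "GO:B": ["GO:D"]}
counts = {"GO:A": 5000, "GO:B": 2500, "GO:C": 800, "GO:D": 900}

informative = informative_terms(counts, children)
```

In this toy hierarchy only GO:B is selected: GO:A is excluded because its progeny term GO:B exceeds the threshold, while GO:C and GO:D fall below it.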

If the "UNMAPPED" gene abundance feature is included in a user's input, it will automatically be carried forward to the final output. In addition, genes that do not group with a non-trivial feature are combined as an "UNGROUPED" group. By default, UNGROUPED reflects the total abundance of genes that did not belong to another group (similar in spirit to the "UNINTEGRATED" value reported in the pathway abundance file).
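The handling of UNMAPPED and UNGROUPED described above can be sketched as follows. This is an illustration, not HUMAnN's implementation: the function name, the feature and group names, and the in-memory dictionaries are hypothetical, and the sketch assumes that a feature belonging to several groups contributes its full abundance to each of them.

```python
from collections import defaultdict


def regroup(abundances, groups):
    """Sum feature abundances into group abundances.

    abundances: dict mapping feature name -> abundance
    groups:     dict mapping group name -> list of member features
    """
    # Invert the mapping so each feature lists the groups it belongs to.
    feature_to_groups = defaultdict(list)
    for group, members in groups.items():
        for feature in members:
            feature_to_groups[feature].append(group)

    result = defaultdict(float)
    for feature, value in abundances.items():
        if feature == "UNMAPPED":
            # Carried forward unchanged to the output.
            result["UNMAPPED"] += value
        elif feature in feature_to_groups:
            for group in feature_to_groups[feature]:
                result[group] += value
        else:
            # Features with no group are pooled together.
            result["UNGROUPED"] += value
    return dict(result)


# Hypothetical input table and mapping.
table = {"UNMAPPED": 10.0, "UniRef90_A": 4.0, "UniRef90_B": 6.0, "UniRef90_C": 5.0}
mapping = {"RXN-1": ["UniRef90_A", "UniRef90_B"], "RXN-2": ["UniRef90_B"]}

regrouped = regroup(table, mapping)
```

Here UniRef90_C belongs to no group, so its abundance ends up in UNGROUPED, while UNMAPPED passes through unchanged.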

Some groups are not associated by default with human-readable names. To attach names to a regrouped table, use the HUMAnN rename tool. (The "GO" name map can be used for both raw GO and informative GO.)

Inputs

Users are free to create and use additional mapping files and pass them to this tool. The format of a mapping file is:

`` group1 uniref1 uniref2 uniref3 ... ``

`` group2 uniref1 uniref5 ... ``

In the lines above, spaces between items denote TABS: each line starts with a group name, followed by the features that belong to that group. By default, feature abundances (such as gene families) are summed to produce group abundances.
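As a rough sketch of how such a file could be read, the snippet below parses tab-separated mapping lines into a dictionary and then sums feature abundances per group. The helper name `read_mapping`, the example identifiers, and the abundance values are hypothetical; this illustrates the file format only and is not the tool's own parser.

```python
import csv
import io


def read_mapping(handle):
    """Parse a mapping file: one group per line, group name first,
    then its member features, all separated by tabs."""
    groups = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        group, members = row[0], row[1:]
        groups.setdefault(group, []).extend(members)
    return groups


# Hypothetical mapping content; a real file would be opened with open(path).
example = "group1\tuniref1\tuniref2\tuniref3\ngroup2\tuniref1\tuniref5\n"
groups = read_mapping(io.StringIO(example))

# Hypothetical feature abundances; features absent from the table count as 0.
table = {"uniref1": 1.0, "uniref2": 2.0, "uniref5": 5.0}
totals = {
    group: sum(table.get(feature, 0.0) for feature in members)
    for group, members in groups.items()
}
```

Note that uniref1 appears in both groups, so under this sketch its abundance is counted once in each group total.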