eggnog-mapper

Overview

eggnog-mapper is a tool for fast functional annotation of novel sequences (genes or proteins) using precomputed eggNOG-based orthology assignments. Typical use cases include the annotation of novel genomes, transcriptomes, or metagenomic gene catalogs. Orthology predictions are considered more precise for functional annotation than traditional homology searches, because they avoid transferring annotations from paralogs (duplicate genes with a higher chance of being involved in functional divergence).

EggNOG-mapper is also available as a public online resource: http://beta-eggnogdb.embl.de/#/app/emapper.

Outputs

seed orthologs

Each line in the file gives the best match for a query within the best Orthologous Group (OG) reported in the [project].hmm_hits file, obtained by running PHMMER against all sequences within that OG. The seed ortholog is used to fetch fine-grained orthology relationships from eggNOG. In the diamond search mode, seed orthologs are instead taken directly from the best matching sequences found by running DIAMOND against the whole eggNOG protein space.

Recommendation for large input data

EggNOG-mapper consists of two phases:

  1. finding seed orthologous sequences (compute intensive)
  2. expanding annotations (IO intensive)

By default (i.e. if Method to search seed orthologs is not Skip search stage... and Annotate seed orthologs is Yes), both phases are executed within one tool run.

For large input FASTA datasets it can be favourable to split this into two separate tool runs as follows:

  1. Split the FASTA (e.g. 1M seqs per data set)
  2. Run the search phase only (set Annotate seed orthologs to No) on the separate FASTA files.
  3. Run the annotation phase (set Method to search seed orthologs to Skip search stage...)
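
The first step, splitting the FASTA, can also be done outside Galaxy before upload. The following is a minimal sketch in Python, not part of eggnog-mapper or of this tool: the function name `split_fasta`, the `prefix` parameter, and the chunk file naming scheme are all hypothetical choices made for illustration. It streams the input line by line and starts a new output file every `seqs_per_file` records, so memory use stays small even for very large inputs.

```python
from pathlib import Path


def split_fasta(path, out_dir, seqs_per_file=1_000_000, prefix="chunk"):
    """Split a FASTA file into chunks of at most `seqs_per_file` records.

    Hypothetical helper for illustration only. Returns the list of
    chunk paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []   # paths of the chunk files created so far
    count = 0      # number of FASTA records seen so far
    out = None     # currently open chunk file, if any

    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # Start a new chunk whenever the current one is full.
                    if count % seqs_per_file == 0:
                        if out is not None:
                            out.close()
                        out_path = out_dir / f"{prefix}_{len(written) + 1:04d}.fasta"
                        out = open(out_path, "w")
                        written.append(out_path)
                    count += 1
                # Lines before the first header are skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()

    return written
```

Calling, for example, `split_fasta("proteins.fasta", "chunks")` (both names are placeholders) would write files such as `chunks/chunk_0001.fasta`, each of which can then be used as input for a search-only run.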

See also the section on setting up large annotation jobs in the [eggNOG-mapper wiki](https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.8#Setting_up_large_annotation_jobs).

Another alternative is to use cached annotations (produced in a run with --md5 enabled).