eggnog-mapper

Overview

eggnog-mapper is a tool for fast functional annotation of novel sequences (genes or proteins) using precomputed eggNOG-based orthology assignments. Typical use cases include the annotation of novel genomes, transcriptomes or metagenomic gene catalogs. Functional annotation based on orthology predictions is considered more precise than traditional homology searches, because it avoids transferring annotations from paralogs (duplicated genes that are more likely to have diverged in function).

EggNOG-mapper is also available as a public online resource: http://beta-eggnogdb.embl.de/#/app/emapper.

Outputs

annotations

This file provides final annotations of each query. Tab-delimited columns in the file are:

query_name: query sequence name

seed_eggNOG_ortholog: best protein match in eggNOG

seed_ortholog_evalue: best protein match (e-value)

seed_ortholog_score: best protein match (bit-score)

predicted_taxonomic_group

predicted_protein_name: Predicted protein name for query sequences

GO_terms: Comma delimited list of predicted Gene Ontology terms

EC_number

KEGG_KO

KEGG_Pathway: Comma delimited list of predicted KEGG pathways

KEGG_Module

KEGG_Reaction

KEGG_rclass

BRITE

KEGG_TC

CAZy

BiGG_Reactions

Annotation_tax_scope: The taxonomic scope used to annotate this query sequence

Matching_OGs: Comma delimited list of matching eggNOG Orthologous Groups

best_OG|evalue|score: Best matching Orthologous Groups (deprecated, use smallest from eggnog OGs)

COG_functional_categories: COG functional category inferred from best matching OG

eggNOG_free_text_description
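
The column layout above can be read with a few lines of Python. This is a minimal sketch, not part of eggnog-mapper itself: it assumes the columns appear in the order listed, that lines starting with "#" are comments or header lines, and that empty fields are empty strings. The function name read_annotations is made up for illustration.

```python
import csv

# Column names as listed above; assumed to appear in this order.
COLUMNS = [
    "query_name", "seed_eggNOG_ortholog", "seed_ortholog_evalue",
    "seed_ortholog_score", "predicted_taxonomic_group",
    "predicted_protein_name", "GO_terms", "EC_number", "KEGG_KO",
    "KEGG_Pathway", "KEGG_Module", "KEGG_Reaction", "KEGG_rclass",
    "BRITE", "KEGG_TC", "CAZy", "BiGG_Reactions", "Annotation_tax_scope",
    "Matching_OGs", "best_OG|evalue|score", "COG_functional_categories",
    "eggNOG_free_text_description",
]

# Columns that the description above documents as comma-delimited lists.
LIST_COLUMNS = ("GO_terms", "KEGG_Pathway", "Matching_OGs")


def read_annotations(handle):
    """Yield one dict per query from a tab-delimited annotations stream."""
    # Assumption: lines starting with '#' are comments or header lines.
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    for fields in csv.reader(rows, delimiter="\t", quoting=csv.QUOTE_NONE):
        record = dict(zip(COLUMNS, fields))
        for name in LIST_COLUMNS:
            # Turn "a,b,c" into ["a", "b", "c"]; empty fields become [].
            record[name] = [v for v in record.get(name, "").split(",") if v]
        yield record
```

The function accepts any iterable of lines, so it can be called on an open file, e.g. `with open(path) as fh: records = list(read_annotations(fh))`.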

orthologs

This output is only created if the option --report_orthologs is enabled. It lists the orthologs used for the annotation. It is a tab-delimited file with the following columns:

query

orth_type: Type of orthologs in this row. See --target_orthologs.

species

orthologs: Comma-separated list of orthologs. An ortholog marked with "*" was used to transfer its annotations to the query.
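
A row of this file can be split into its columns as follows. This is only a sketch under assumptions: the four columns are taken to appear in the order listed, and the "*" marker is assumed to be attached directly to the ortholog name. The function name and the returned keys are hypothetical.

```python
def parse_orthologs_line(line):
    """Split one row of the orthologs file into its four columns."""
    query, orth_type, species, orthologs = line.rstrip("\n").split("\t")[:4]
    names = [o.strip() for o in orthologs.split(",") if o.strip()]
    return {
        "query": query,
        "orth_type": orth_type,
        "species": species,
        # Ortholog names with the "*" marker removed.
        "orthologs": [o.replace("*", "") for o in names],
        # Assumption: "*" is attached to the name of each ortholog
        # that was used to transfer annotations to the query.
        "used_for_annotation": [o.replace("*", "") for o in names if "*" in o],
    }
```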

sequences without annotation

This output is created if cached annotations are used as input. It is a FASTA file containing all sequences that were not found in the cached annotations. These sequences can then be used as input for another eggnog-mapper run that computes seed orthologs (e.g. with DIAMOND).

Recommendations for large input data

EggNOG-mapper consists of two phases:

finding seed orthologous sequences (compute intensive)
expanding annotations (IO intensive)

By default (i.e. if "Method to search seed orthologs" is not "Skip search stage..." and "Annotate seed orthologs" is "Yes"), both phases are executed within one tool run.

For large input FASTA datasets it can be favourable to split this into two separate tool runs as follows:

Split the FASTA file (e.g. 1M sequences per dataset).
Run the search phase only (set "Annotate seed orthologs" to "No") on the separate FASTA files.
Run the annotation phase (set "Method to search seed orthologs" to "Skip search stage...").
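
The first step, splitting the FASTA file, could be done with a small script such as the one below. It is a sketch only: the function name split_fasta, the output file naming and the default chunk size are illustrative choices, and any FASTA splitting tool can be used instead.

```python
from pathlib import Path


def split_fasta(path, out_dir, seqs_per_file=1_000_000, prefix="chunk"):
    """Write consecutive chunks of at most seqs_per_file sequences each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []  # paths of the chunk files, in order
    out = None
    count = 0  # number of sequence headers seen so far
    with open(path) as src:
        for line in src:
            if line.startswith(">"):
                # Start a new chunk file every seqs_per_file sequences.
                if count % seqs_per_file == 0:
                    if out is not None:
                        out.close()
                    target = out_dir / f"{prefix}_{len(written) + 1:04d}.fasta"
                    out = open(target, "w")
                    written.append(target)
                count += 1
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return written
```

Each returned file can then be used as input for a separate search-only run.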

See also [Setting up large annotation jobs](https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.8#Setting_up_large_annotation_jobs) in the eggNOG-mapper wiki.

Another alternative is to use cached annotations (produced in a run with --md5 enabled).