What it does
This tool retrieves annotation for a generic set of IDs using the Bioconductor annotation data packages. Supported organisms are human, mouse, rat, fruit fly and zebrafish. The org.db packages used here are based primarily on mappings to Entrez Gene identifiers. More information on the annotation packages is available on the Bioconductor website; for example, see the page for the human annotation package (org.Hs.eg.db).
Examples of what this tool can be used for are:
Inputs
A tab-delimited file with identifiers in the first column. If the file contains a header row, select the "file has a header" option in the tool form above.
Example:
GeneID             Additional Columns...
ENSG00000091831
ENSG00000082175
ENSG00000141736
ENSG00000012048
ENSG00000139618
ENSG00000129514
ENSG00000171862
ENSG00000141510

ID types supported for input are:
- ENSEMBL: Ensembl gene IDs
- ENSEMBLPROT: Ensembl protein IDs
- ENSEMBLTRANS: Ensembl transcript IDs
- ENTREZID: Entrez gene Identifiers
- FLYBASE: FlyBase accession numbers
- GO: GO Identifiers
- MGI: Jackson Laboratory MGI gene accession numbers
- PATH: KEGG Pathway Identifiers
- REFSEQ: RefSeq Identifiers
- SYMBOL: The official gene symbol
- ZFIN: ZFIN accession numbers
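As a rough illustration of the input format described above, the following Python sketch reads the identifiers from the first column of a tab-delimited file and optionally skips a header row. It is not the tool's own code: `read_ids` is a hypothetical helper, and the second column (`logFC`) and its values are invented purely to stand in for "additional columns".

```python
import csv
import io


def read_ids(handle, has_header=False):
    """Return the identifiers found in the first column of a tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t")
    # Keep only rows whose first column is non-empty.
    rows = [row for row in reader if row and row[0].strip()]
    if has_header and rows:
        rows = rows[1:]  # drop the header row
    return [row[0].strip() for row in rows]


# In-memory stand-in for an uploaded file. The IDs come from the example
# above; the logFC column is made up for illustration.
example = io.StringIO(
    "GeneID\tlogFC\n"
    "ENSG00000091831\t1.2\n"
    "ENSG00000082175\t-0.4\n"
)
ids = read_ids(example, has_header=True)
print(ids)
```

Any columns after the first are ignored for the purpose of collecting IDs, which mirrors the requirement that identifiers sit in the first column.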
This tool uses the select function from the Bioconductor AnnotationDbi package. Note that if you request a column that has multiple matches for an ID, select returns one row in the output for each possible match. As a result, if you request multiple columns and several of them have a many-to-one relationship to the IDs (that is, several values per ID), the number of rows multiplies accordingly. It is therefore not a good idea to request a large number of columns unless you know that what you are asking for has a one-to-one relationship with the initial set of IDs. In general, if you need a column such as GO or KEGG that has a many-to-one relationship to the original IDs, it is most useful to extract that column separately.
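The row multiplication described above can be illustrated with a small toy model in Python. This is only a sketch of the effect, not AnnotationDbi code: `select_like` is a hypothetical function, the gene IDs and terms are placeholders rather than real annotation, and filling unmatched IDs with `None` is an assumption of the sketch.

```python
from itertools import product

# Toy annotation tables; IDs and terms are placeholders, not real annotation.
go_terms = {"gene1": ["GO:A", "GO:B"], "gene2": ["GO:C"]}
pathways = {"gene1": ["path1", "path2", "path3"], "gene2": ["path4"]}


def select_like(ids, *tables):
    """Mimic the row expansion: one output row per combination of matches."""
    rows = []
    for gene_id in ids:
        # Assumption: an ID with no match in a table still yields one row (None).
        matches = [table.get(gene_id) or [None] for table in tables]
        for combo in product(*matches):
            rows.append((gene_id, *combo))
    return rows


both = select_like(["gene1", "gene2"], go_terms, pathways)
go_only = select_like(["gene1", "gene2"], go_terms)
print(len(both), len(go_only))  # 7 3
```

In this toy data, gene1 has two GO terms and three pathways, so requesting both columns together produces six rows for that one ID, whereas requesting the GO column alone produces two. This is why extracting many-valued columns separately keeps the output manageable.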
Outputs
If the input IDs are Ensembl gene IDs, the default output will be similar to the example below, containing four columns. Other columns, such as GO and KEGG terms, can be selected in the tool form above to be added as additional columns.
Example:
ENSEMBL          ENTREZID  SYMBOL  GENENAME
ENSG00000091831  2099      ESR1    estrogen receptor 1
ENSG00000082175  5241      PGR     progesterone receptor
ENSG00000141736  2064      ERBB2   erb-b2 receptor tyrosine kinase 2
ENSG00000012048  672       BRCA1   breast cancer 1
ENSG00000139618  675       BRCA2   breast cancer 2
ENSG00000129514  3169      FOXA1   forkhead box A1
ENSG00000171862  5728      PTEN    phosphatase and tensin homolog
ENSG00000141510  7157      TP53    tumor protein p53

Columns available for output include many of the ID columns already described under Inputs above and also:
- ALIAS: Commonly used gene symbols
- EVIDENCE: Evidence codes for GO associations with a gene of interest
- GENENAME: The full gene name
- ONTOLOGY: For GO Identifiers, the Gene Ontology they belong to (BP, CC, or MF)