
annotateMyIDs (version 3.18.0+galaxy0)
A tabular file with the first column containing one of the supported types of identifier (see Help below).
If this option is set to Yes, the tool will assume that the input file has a column header in the first row and that the identifiers start on the second line. Default: No
Select the organism the identifiers are from
Select the type of IDs in your input file
Choose the columns you want in the output table. Note that selecting some columns such as GO or KEGG could make the table very large as some genes may be associated with many terms. Default: ENSEMBL, ENTREZID, SYMBOL, GENENAME
If this option is set to Yes, only the first occurrence of each input Gene ID will be kept. Default: No
If this option is set to Yes, the R script used to annotate the IDs will be provided as a text file in the output. Default: No
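
The "keep only the first occurrence" option described above can be pictured with a small Python sketch. This is an illustration of the idea only, not the tool's own code (the tool runs an R script); the function name keep_first and the sample rows are hypothetical.

```python
def keep_first(rows):
    """Keep only the first row seen for each input ID (the first field)."""
    seen = set()
    kept = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept


# Hypothetical annotated rows: ID1 appears twice because it matched two values.
rows = [
    ("ID1", "valueA"),
    ("ID1", "valueB"),
    ("ID2", "valueC"),
]
print(keep_first(rows))
```

Note that dropping later rows for an ID also drops any additional matches that ID had, so this option trades completeness for a one-row-per-ID table.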

What it does

This tool annotates a generic set of IDs using the Bioconductor annotation data packages. Supported organisms are human, mouse, rat, fruit fly and zebrafish. The org.db packages used here are primarily based on mapping using Entrez Gene identifiers. More information on the annotation packages is available on the Bioconductor website; for example, see the page for the human annotation package (org.Hs.eg.db).

Examples of what this tool can be used for are:


Inputs

A tab-delimited file with identifiers in the first column. If the file contains a header row, set the file-has-header option in the tool form above to Yes.

Example:

GeneID Additional Columns...
ENSG00000091831  
ENSG00000082175  
ENSG00000141736  
ENSG00000012048  
ENSG00000139618  
ENSG00000129514  
ENSG00000171862  
ENSG00000141510  
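
As a rough illustration of the expected input layout, the following Python sketch reads the first column of a tab-delimited file and optionally skips a header row. It is not part of the tool; the function name read_ids and the extra logFC column are assumptions made for the example.

```python
import csv
import io


def read_ids(handle, has_header=False):
    """Return the first-column identifiers from a tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    # Ignore blank lines and strip stray whitespace around each ID.
    return [row[0].strip() for row in reader if row and row[0].strip()]


# Small in-memory stand-in for an uploaded file (logFC is a made-up extra column).
example = io.StringIO(
    "GeneID\tlogFC\n"
    "ENSG00000091831\t2.1\n"
    "ENSG00000082175\t-0.4\n"
)
ids = read_ids(example, has_header=True)
print(ids)
```

Only the first column is read as identifiers here; any additional columns are ignored.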

ID types supported for input are:

  • ENSEMBL: Ensembl gene IDs
  • ENSEMBLPROT: Ensembl protein IDs
  • ENSEMBLTRANS: Ensembl transcript IDs
  • ENTREZID: Entrez Gene identifiers
  • FLYBASE: FlyBase accession numbers
  • GO: GO identifiers
  • MGI: Jackson Laboratory MGI gene accession numbers
  • PATH: KEGG pathway identifiers
  • REFSEQ: RefSeq identifiers
  • SYMBOL: The official gene symbol
  • ZFIN: ZFIN accession numbers

This tool uses the select function from the Bioconductor AnnotationDbi package. If you request columns that have multiple matches for your IDs, select returns one row in the output for each possible match. If you request several columns in which a single ID can map to many values, the number of rows multiplies accordingly. It is therefore best not to request a large number of columns unless you know that they have a one-to-one relationship with the initial set of IDs. In general, if you need a column such as GO or KEGG, where one ID can map to many values, it is most useful to extract that column separately.
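
The row multiplication described above can be modelled with a toy Python sketch. It assumes that the output contains one row per combination of matching values across the requested columns; it does not call AnnotationDbi, and the identifiers, values and the expand function are all hypothetical.

```python
from itertools import product

# Hypothetical toy annotations for one gene; real values come from the org.db packages.
ANNOTATIONS = {
    "GENE1": {
        "SYMBOL": ["SYM1"],
        "GO": ["GO:0000001", "GO:0000002", "GO:0000003"],
        "PATH": ["00010", "00020"],
    }
}


def expand(gene_id, columns):
    """Return one row per combination of values across the requested columns."""
    values = [ANNOTATIONS[gene_id][col] for col in columns]
    return [(gene_id, *combo) for combo in product(*values)]


print(len(expand("GENE1", ["SYMBOL"])))                # 1 row
print(len(expand("GENE1", ["SYMBOL", "GO"])))          # 3 rows
print(len(expand("GENE1", ["SYMBOL", "GO", "PATH"])))  # 6 rows
```

In this model, requesting GO and PATH in separate runs would give 3 and 2 rows for the gene rather than 6, which is why extracting such columns separately keeps the tables smaller.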


Outputs

If the input IDs are Ensembl gene IDs, the default output will be similar to the example below, with four columns. Other columns, such as GO and KEGG terms, can be selected in the tool form above and are added as additional columns.

Example:

ENSEMBL          ENTREZID  SYMBOL  GENENAME
ENSG00000091831  2099      ESR1    estrogen receptor 1
ENSG00000082175  5241      PGR     progesterone receptor
ENSG00000141736  2064      ERBB2   erb-b2 receptor tyrosine kinase 2
ENSG00000012048  672       BRCA1   breast cancer 1
ENSG00000139618  675       BRCA2   breast cancer 2
ENSG00000129514  3169      FOXA1   forkhead box A1
ENSG00000171862  5728      PTEN    phosphatase and tensin homolog
ENSG00000141510  7157      TP53    tumor protein p53

Columns available for output include many of the ID columns already described under Inputs above and also:

  • ALIAS: Commonly used gene symbols
  • EVIDENCE: Evidence codes for GO associations with a gene of interest
  • GENENAME: The full gene name
  • ONTOLOGY: For GO identifiers, the Gene Ontology they belong to (BP, CC, or MF)