
annotateMyIDs (version 3.18.0+galaxy0)

File with IDs:

A tabular file with the first column containing one of the supported types of identifier, see Help below.

File has header?:

If this option is set to Yes, the tool will assume that the input file has a column header in the first row and that the identifiers start on the second line. Default: No

Organism:

Select the organism the identifiers are from

ID Type:

Select the type of IDs in your input file

Output columns:

Choose the columns you want in the output table. Note that selecting some columns such as GO or KEGG could make the table very large as some genes may be associated with many terms. Default: ENSEMBL, ENTREZID, SYMBOL, GENENAME

Remove duplicates?:

If this option is set to Yes, only the first occurrence of each input Gene ID will be kept. Default: No

Output Rscript?:

If this option is set to Yes, the Rscript used to annotate the IDs will be provided as a text file in the output. Default: No

What it does

This tool retrieves annotation for a set of IDs using the Bioconductor annotation data packages. Supported organisms are human, mouse, rat, fruit fly and zebrafish. The org.db packages used here are primarily based on mapping using Entrez Gene identifiers. More information on the annotation packages can be found at the Bioconductor website; for example, see the page for the human annotation package (org.Hs.eg.db).

Examples of what this tool can be used for are:

adding gene names to IDs
mapping between IDs e.g. Entrez, Ensembl, Symbols
adding GO and KEGG identifiers

Inputs

A tab-delimited file with identifiers in the first column. If the file contains a header row, set the "File has header?" option to Yes in the tool form above.

Example:

GeneID	Additional Columns...
ENSG00000091831
ENSG00000082175
ENSG00000141736
ENSG00000012048
ENSG00000139618
ENSG00000129514
ENSG00000171862
ENSG00000141510
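
The input convention described above (identifiers in the first column, optional header row) can be sketched in Python as follows. This is only an illustration of the file layout, not the tool's own code, which runs in R; the function name `read_ids` and the extra column in the toy input are assumptions made for the example.

```python
import csv
import io


def read_ids(handle, has_header=False):
    """Return the first-column values of a tab-delimited file, in input order."""
    reader = csv.reader(handle, delimiter="\t")
    rows = [row for row in reader if row]  # skip completely empty lines
    if has_header:
        rows = rows[1:]  # with a header, identifiers start on the second line
    return [row[0].strip() for row in rows]


# Toy input: the IDs come from the example above; the second column is a
# hypothetical placeholder standing in for "Additional Columns...".
example = io.StringIO(
    "GeneID\tExtra\n"
    "ENSG00000091831\tx\n"
    "ENSG00000082175\ty\n"
)
ids = read_ids(example, has_header=True)
print(ids)
```

Only the first column is read; any additional columns are ignored by this sketch.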

ID types supported for input are:

ENSEMBL: Ensembl gene IDs

ENSEMBLPROT: Ensembl protein IDs

ENSEMBLTRANS: Ensembl transcript IDs

ENTREZID: Entrez Gene identifiers

FLYBASE: FlyBase accession numbers

GO: GO Identifiers

MGI: Jackson Laboratory MGI gene accession numbers

PATH: KEGG Pathway Identifiers

REFSEQ: RefSeq identifiers

SYMBOL: The official gene symbol

ZFIN: ZFIN accession numbers

This tool uses the select function from the Bioconductor AnnotationDbi package. Note that if you request columns that have multiple matches for your IDs, select will return one row in the output for each possible match. If you request several columns and more than one of them has many matches per ID, the number of rows multiplies accordingly. It is therefore not a good idea to request a large number of columns unless you know that they have a one-to-one relationship with the initial set of IDs. In general, if you need a column such as GO or KEGG, which can have many matches per ID, it is most useful to extract that column separately.
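
To make the row multiplication concrete, here is a minimal Python sketch of the arithmetic. It does not call or reproduce select; the function `expand_rows`, the gene ID, and all the values are placeholders invented for the illustration, not real annotations.

```python
from itertools import product


def expand_rows(gene_id, matches_by_column):
    """Return one row per combination of matches across the requested columns."""
    columns = list(matches_by_column)
    # A column with no match still contributes one (empty) value.
    value_lists = [matches_by_column[c] or [None] for c in columns]
    return [(gene_id, *combo) for combo in product(*value_lists)]


# Placeholder values only: one symbol, three GO terms, two pathways.
matches = {
    "SYMBOL": ["SYM1"],
    "GO": ["GO:A", "GO:B", "GO:C"],
    "PATH": ["path1", "path2"],
}
rows = expand_rows("GENE1", matches)
print(len(rows))  # 1 * 3 * 2 = 6 rows for a single input ID
```

Requesting GO and PATH in separate runs would give 3 and 2 rows for this ID instead of 6, which is why extracting such columns separately keeps the table manageable.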

Outputs

If the input IDs are Ensembl gene IDs, the default output will be similar to the example below, containing four columns. Other columns, such as GO and KEGG terms, can be selected in the tool form above to be added as additional columns.

Example:

ENSEMBL	ENTREZID	SYMBOL	GENENAME
ENSG00000091831	2099	ESR1	estrogen receptor 1
ENSG00000082175	5241	PGR	progesterone receptor
ENSG00000141736	2064	ERBB2	erb-b2 receptor tyrosine kinase 2
ENSG00000012048	672	BRCA1	breast cancer 1
ENSG00000139618	675	BRCA2	breast cancer 2
ENSG00000129514	3169	FOXA1	forkhead box A1
ENSG00000171862	5728	PTEN	phosphatase and tensin homolog
ENSG00000141510	7157	TP53	tumor protein p53

Columns available for output include many of the ID columns already described under Inputs above and also:

ALIAS: Commonly used gene symbols

EVIDENCE: Evidence codes for GO associations with a gene of interest

GENENAME: The full gene name

ONTOLOGY: For GO Identifiers, which Gene Ontology (BP, CC, or MF)
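
When the "Remove duplicates?" option is set to Yes, only the first occurrence of each input ID is kept. The effect can be sketched in Python as below; this is an assumption-laden illustration of "keep the first row per ID", not the tool's R implementation, and the function name `keep_first_occurrence` is hypothetical.

```python
def keep_first_occurrence(rows, key_index=0):
    """Keep only the first row seen for each key, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# IDs and symbols are taken from the example output above; the second row is a
# hypothetical duplicate added only to show the filtering.
rows = [
    ("ENSG00000091831", "ESR1"),
    ("ENSG00000091831", "HYPOTHETICAL_DUPLICATE"),
    ("ENSG00000141510", "TP53"),
]
deduplicated = keep_first_occurrence(rows)
print(deduplicated)
```

Because later rows for the same ID are dropped, any additional matches (for example further GO terms) are lost when this option is enabled.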
