Mercurial > repos > iuc > annotatemyids
view annotateMyIDs.xml @ 15:cd2480f35935 draft
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/annotatemyids commit ce8871b2708d391a99cd7656d84eade6f16a1337
author | iuc |
---|---|
date | Tue, 29 Aug 2023 22:44:27 +0000 |
parents | 3e1f6f6d557e |
children | a79ee60b6926 |
line wrap: on
line source
<tool id="annotatemyids" name="annotateMyIDs" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@"> <description>annotate a generic set of identifiers</description> <macros> <token name="@TOOL_VERSION@">3.17.0</token> <token name="@VERSION_SUFFIX@">1</token> </macros> <xrefs> <xref type="bio.tools">annotatemyids</xref> </xrefs> <requirements> <requirement type="package" version="@TOOL_VERSION@">bioconductor-org.hs.eg.db</requirement> <requirement type="package" version="@TOOL_VERSION@">bioconductor-org.mm.eg.db</requirement> <requirement type="package" version="@TOOL_VERSION@">bioconductor-org.dm.eg.db</requirement> <requirement type="package" version="@TOOL_VERSION@">bioconductor-org.dr.eg.db</requirement> <requirement type="package" version="@TOOL_VERSION@">bioconductor-org.rn.eg.db</requirement> <requirement type="package" version="@TOOL_VERSION@">bioconductor-org.at.tair.db</requirement> <requirement type="package" version="@TOOL_VERSION@">bioconductor-org.gg.eg.db</requirement> <requirement type="package" version="@TOOL_VERSION@">bioconductor-org.bt.eg.db</requirement> </requirements> <version_command><![CDATA[ echo $(R --version | grep version | grep -v GNU)", org.Hs.eg.db version" $(R --vanilla --slave -e "library(org.Hs.eg.db); cat(sessionInfo()\$otherPkgs\$org.Hs.eg.db\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", org.Dr.eg.db version" $(R --vanilla --slave -e "library(org.Dr.eg.db); cat(sessionInfo()\$otherPkgs\$org.Dr.eg.db\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", org.Dm.eg.db version" $(R --vanilla --slave -e "library(org.Dm.eg.db); cat(sessionInfo()\$otherPkgs\$org.Dm.eg.db\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", org.Mm.eg.db version" $(R --vanilla --slave -e "library(org.Mm.eg.db); cat(sessionInfo()\$otherPkgs\$org.Mm.eg.db\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", org.Rn.eg.db version" $(R --vanilla --slave -e "library(org.Rn.eg.db); cat(sessionInfo()\$otherPkgs\$org.Rn.eg.db\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", org.Bt.eg.db version" $(R --vanilla --slave -e "library(org.Bt.eg.db); cat(sessionInfo()\$otherPkgs\$org.Bt.eg.db\$Version)" 2> /dev/null | grep -v -i "WARNING: ") ]]></version_command> <command detect_errors="exit_code"><![CDATA[ #if $rscriptOpt: cp '${annotatemyids_script}' '${out_rscript}' && #end if Rscript '${annotatemyids_script}' ]]> </command> <configfiles> <configfile name="annotatemyids_script"><![CDATA[ options( show.error.messages=F, error = function () { cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) } ) # we need that to not crash galaxy with an UTF8 error on German LC settings. loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") id_type <- "${id_type}" organism <- "${organism}" output_cols <- "${output_cols}" file_has_header <- ${file_has_header} remove_dups <- ${remove_dups} input <- read.table('$id_file', header=file_has_header, sep="\t", quote="") ids <- as.character(input[, 1]) if(organism == "Hs"){ suppressPackageStartupMessages(library(org.Hs.eg.db)) db <- org.Hs.eg.db } else if (organism == "Mm"){ suppressPackageStartupMessages(library(org.Mm.eg.db)) db <- org.Mm.eg.db } else if (organism == "Dm"){ suppressPackageStartupMessages(library(org.Dm.eg.db)) db <- org.Dm.eg.db } else if (organism == "Dr"){ suppressPackageStartupMessages(library(org.Dr.eg.db)) db <- org.Dr.eg.db } else if (organism == "Rn"){ suppressPackageStartupMessages(library(org.Rn.eg.db)) db <- org.Rn.eg.db } else if (organism == "At"){ suppressPackageStartupMessages(library(org.At.tair.db)) db <- org.At.tair.db } else if (organism == "Gg"){ suppressPackageStartupMessages(library(org.Gg.eg.db)) db <- org.Gg.eg.db } else if (organism == "Bt"){ suppressPackageStartupMessages(library(org.Bt.eg.db)) db <- org.Bt.eg.db } else { cat(paste("Organism type not supported", organism)) } cols <- unlist(strsplit(output_cols, ",")) result <- select(db, keys=ids, keytype=id_type, columns=cols) if(remove_dups) { result <- result[!duplicated(result$${id_type}),] } write.table(result, file='$out_tab', sep="\t", row.names=FALSE, quote=FALSE) ]]></configfile> </configfiles> <inputs> <param name="id_file" type="data" format="tabular" label="File with IDs" help="A tabular file with the first column containing one of the supported types of identifier, see Help below." /> <param name="file_has_header" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="False" label="File has header?" help="If this option is set to Yes, the tool will assume that the input file has a column header in the first row and the identifers commence on the second line. Default: No" /> <param name="organism" type="select" label="Organism" help="Select the organism the identifiers are from"> <option value="Hs" selected="true">Human</option> <option value="Mm">Mouse</option> <option value="Rn">Rat</option> <option value="Dm">Fruit fly</option> <option value="Dr">Zebrafish</option> <option value="At">Arabidopsis thaliana</option> <option value="Gg">Gallus gallus</option> <option value="Bt">Bos taurus</option> </param> <param name="id_type" type="select" label="ID Type" help="Select the type of IDs in your input file"> <option value="ENSEMBL" selected="true">Ensembl Gene</option> <option value="ENSEMBLPROT">Ensembl Protein</option> <option value="ENSEMBLTRANS">Ensembl Transcript</option> <option value="ENTREZID">Entrez</option> <option value="FLYBASE">FlyBase</option> <option value="GO">GO</option> <option value="PATH">KEGG</option> <option value="MGI">MGI</option> <option value="REFSEQ">RefSeq</option> <option value="SYMBOL">Gene Symbol</option> <option value="ZFIN">Zfin</option> </param> <param name="output_cols" type="select" multiple="True" display="checkboxes" label="Output columns" help="Choose the columns you want in the output table. Note that selecting some columns such as GO or KEGG could make the table very large as some genes may be associated with many terms. Default: ENSEMBL, ENTREZID, SYMBOL, GENENAME"> <option value="ALIAS">ALIAS</option> <option value="ENSEMBL" selected="True">ENSEMBL</option> <option value="ENTREZID" selected="True">ENTREZID</option> <option value="EVIDENCE">EVIDENCE</option> <option value="SYMBOL" selected="True">SYMBOL</option> <option value="GENENAME" selected="True">GENENAME</option> <option value="REFSEQ">REFSEQ</option> <option value="GO">GO</option> <option value="ONTOLOGY">ONTOLOGY</option> <option value="PATH">KEGG</option> </param> <param name="remove_dups" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="False" label="Remove duplicates?" help="If this option is set to Yes, only the first occurrence of each input Gene ID will be kept. Default: No" /> <param name="rscriptOpt" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="False" label="Output Rscript?" help="If this option is set to Yes, the Rscript used to annotate the IDs will be provided as a text file in the output. Default: No" /> </inputs> <outputs> <data name="out_tab" format="tabular" label="${tool.name} on ${on_string}: Annotated IDs" /> <data name="out_rscript" format="txt" label="${tool.name} on ${on_string}: Rscript"> <filter>rscriptOpt is True</filter> </data> </outputs> <tests> <!-- Ensure output table works --> <test expect_num_outputs="1"> <param name="id_file" value="genelist.txt" ftype="tabular"/> <param name="id_type" value="SYMBOL"/> <param name="organism" value="Hs"/> <output name="out_tab" file="out.tab" /> </test> <!-- Ensure Ensembl IDs input and Rscript output work --> <test expect_num_outputs="2"> <param name="id_file" value="ensembl_ids.tab" ftype="tabular"/> <param name="id_type" value="ENSEMBL"/> <param name="organism" value="Hs"/> <param name="rscriptOpt" value="True" /> <output name="out_tab" file="out_ensembl.tab" /> <output name="out_rscript" file="out_rscript.txt" compare="sim_size" /> </test> <!-- Ensure GO and KEGG output work --> <test expect_num_outputs="1"> <param name="id_file" value="ensembl_ids.tab" ftype="tabular"/> <param name="id_type" value="ENSEMBL"/> <param name="organism" value="Hs"/> <param name="output_cols" value="ENSEMBL,GO,ONTOLOGY,EVIDENCE" /> <output name="out_tab"> <assert_contents> <has_n_columns n="4"/> <has_n_lines min="700"/> <has_text_matching expression="ENSG00000012048" min="80"/> </assert_contents> </output> </test> <!-- Ensure duplicate Gene ID removal works --> <test expect_num_outputs="1"> <param name="id_file" value="ensembl_ids.tab" ftype="tabular"/> <param name="id_type" value="ENSEMBL"/> <param name="organism" value="Hs"/> <param name="output_cols" value="ENSEMBL,GO,ONTOLOGY,EVIDENCE" /> <param name="remove_dups" value="True" /> <output name="out_tab"> <assert_contents> <has_n_columns n="4"/> <has_n_lines n="9"/> <has_text_matching expression="ENSG00000012048" n="1"/> </assert_contents> </output> </test> <!-- Arabidopsis database --> <test expect_num_outputs="1"> <param name="id_file" value="genelist_arabidopsis.txt" ftype="tabular"/> <param name="id_type" value="SYMBOL"/> <param name="organism" value="At"/> <param name="output_cols" value="GO,ENTREZID" /> <output name="out_tab"> <assert_contents> <has_n_columns n="5"/> <has_n_lines min="20"/> <has_text_matching expression="CLE13" min="5"/> </assert_contents> </output> </test> <!-- Gallus gallus database --> <test expect_num_outputs="1"> <param name="id_file" value="genelist_gallus.txt" ftype="tabular"/> <param name="id_type" value="SYMBOL"/> <param name="organism" value="Gg"/> <param name="output_cols" value="GO,ENTREZID" /> <output name="out_tab"> <assert_contents> <has_n_columns n="5"/> <has_n_lines min="40"/> <has_text_matching expression="TENM2" min="20"/> </assert_contents> </output> </test> <!-- Bos taurus database --> <test expect_num_outputs="1"> <param name="id_file" value="genelist.txt" ftype="tabular"/> <param name="id_type" value="SYMBOL"/> <param name="organism" value="Bt"/> <param name="output_cols" value="ENSEMBL" /> <output name="out_tab"> <assert_contents> <has_n_columns n="2"/> <has_n_lines min="9"/> <has_text_matching expression="ENSBTAG00000007159"/> <has_text_matching expression="ENSBTAG00000009498"/> </assert_contents> </output> </test> </tests> <help><![CDATA[ .. class:: infomark **What it does** This tool can get annotation for a generic set of IDs, using the Bioconductor_ annotation data packages. Supported organisms are human, mouse, rat, fruit fly and zebrafish. The org.db packages that are used here are primarily based on mapping using Entrez Gene identifiers. More information on the annotation packages can be found at the Bioconductor website, for example, information on the human annotation package (org.Hs.eg.db) can be found here_. Examples of what this tool can be used for are: * adding gene names to IDs * mapping between IDs e.g. Entrez, Ensembl, Symbols * adding GO and KEGG identifiers .. _Bioconductor: https://www.bioconductor.org/ .. _here: http://bioconductor.org/packages/release/data/annotation/manuals/org.Hs.eg.db/man/org.Hs.eg.db.pdf ----- **Inputs** A tab-delimited file with identifiers in the first column. If the file contains a header row, select the file has a header option in the tool form above. Example: =============== ======================= **GeneID** *Additional Columns...* --------------- ----------------------- ENSG00000091831 ENSG00000082175 ENSG00000141736 ENSG00000012048 ENSG00000139618 ENSG00000129514 ENSG00000171862 ENSG00000141510 =============== ======================= ID types supported for input are: * **ENSEMBL**: Ensembl gene IDs * **ENSEMBLPROT**: Ensembl protein IDs * **ENSEMBLTRANS**: Ensembl transcript IDs * **ENTREZID**: Entrez gene Identifiers * **FLYBASE**: FlyBase accession numbers * **GO**: GO Identifiers * **MGI**: Jackson Laboratory MGI gene accession numbers * **PATH**: KEGG Pathway Identifiers * **REFSEQ**: Refseq Identifiers * **SYMBOL**: The official gene symbol * **ZFIN**: Zfin accession numbers .. class:: warningmark This tool uses the ``select`` function from the Bioconductor AnnotationDBi_ package. Note that if you request columns that have multiple matches for your IDs, select will return *one row in the output for each possible match*. This has the effect that if you request multiple columns and some of them have a many-to-one relationship to the IDs, things will continue to multiply accordingly. So it's not a good idea to request a large number of columns unless you know what you are asking for should have a one-to-one relationship with the initial set of IDs. In general, if you need to retrieve a column like **GO** or **KEGG**, that has a many-to-one relationship to the original IDs, it is most useful to extract that separately. .. _AnnotationDBi: https://www.bioconductor.org/packages/devel/bioc/manuals/AnnotationDbi/man/AnnotationDbi.pdf ----- **Outputs** If the input IDs are Ensembl, the default output will be similar to below, containing four columns. Other columns, such as GO and KEGG terms, can be selected above to be added as additional columns. Example: =============== ============ ========== ================================= **ENSEMBL** **ENTREZID** **SYMBOL** **GENENAME** --------------- ------------ ---------- --------------------------------- ENSG00000091831 2099 ESR1 estrogen receptor 1 ENSG00000082175 5241 PGR progesterone receptor ENSG00000141736 2064 ERBB2 erb-b2 receptor tyrosine kinase 2 ENSG00000012048 672 BRCA1 breast cancer 1 ENSG00000139618 675 BRCA2 breast cancer 2 ENSG00000129514 3169 FOXA1 forkhead box A1 ENSG00000171862 5728 PTEN phosphatase and tensin homolog ENSG00000141510 7157 TP53 tumor protein p53 =============== ============ ========== ================================= Columns available for output include many of the ID columns already described under Inputs above and also: * **ALIAS**: Commonly used gene symbols * **EVIDENCE**: Evidence codes for GO associations with a gene of interest * **GENENAME**: The full gene name * **ONTOLOGY**: For GO Identifiers, which Gene Ontology (BP, CC, or MF) ]]></help> <citations> <citation type="bibtex"> @unpublished{None, author = {Mark Dunning}, title = {annotateMyIDs}, year = {2017}, eprint = {None}, url = {} }</citation> </citations> </tool>