Mercurial > repos > galaxyp > reactome_pathwaymatcher
view pathwaymatcher.xml @ 6:622dd6d51032 draft default tip
"planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/pathwaymatcher commit 421ada25856b841ce17949e80eb3a47fa276a967"
author | galaxyp |
---|---|
date | Mon, 06 Apr 2020 19:18:34 -0400 |
parents | a2ab72e994dc |
children |
line wrap: on
line source
<tool id="reactome_pathwaymatcher" name="Pathway Matcher" version="@PATHWAYMATCHER_VERSION@.@TOOL_SUBVERSION@"> <description> PathwayMatcher is a software tool to search for pathways related to a list of proteins in Reactome. </description> <macros> <token name="@PATHWAYMATCHER_VERSION@">1.9.1</token> <token name="@TOOL_SUBVERSION@">4</token> <xml name="input_fasta"> <param format="fasta" name="input_database" type="data" label="Protein Database" help="Select FASTA database from history"/> </xml> </macros> <requirements> <requirement type="package" version="@PATHWAYMATCHER_VERSION@">pathwaymatcher</requirement> <requirement type="package" version="3.0">zip</requirement> <requirement type="package" version="8.31">coreutils</requirement> </requirements> <stdio> <exit_code range="1:" level="fatal" description="Job Failed" /> <regex match="java.*Exception" level="fatal" description="Java Exception"/> <regex match="Could not create the Java virtual machine" level="fatal" description="JVM Error"/> <regex match="filename not matched: reports/proteins_proteoforms.txt" level="fatal" description="PeptideShaker archive does not contain the proteoforms file. It may have been created by a 1.x PeptideShaker version."/> </stdio> <command> <![CDATA[ #from datetime import datetime #import json #import os #set $exp_str = "Galaxy_Experiment_%s" % datetime.now().strftime("%Y%m%d%H%M%s") #set $samp_str = "Sample_%s" % datetime.now().strftime("%Y%m%d%H%M%s") #set $temp_stderr = "pathwaym_stderr" #set $bin_dir = "bin" mkdir output; cwd=`pwd`; export HOME=\$cwd; ## If we use peptideshaker 2.x files as inputs, firstly we need to uncompress their proteoforms files. #for $i, $s in enumerate($match_types) #if $s.match_type.match_type_selector == "peptideshakerzip_proteoforms" unzip -j '${$s.match_type.input_peptideshakerzip_proteoforms}' 'reports/proteins_proteoforms.txt' -d './' && mv proteins_proteoforms.txt ps_proteoforms_'${$i}'.txt && #end if #end for ##################### ## Pathway Matcher ## ##################### pathwaymatcher #for $i, $s in enumerate($match_types) ## PROTEOFORMS #if $s.match_type.match_type_selector == "proteoforms" #if $s.match_type.proteoform_match_criteria: match-proteoforms -m '${s.match_type.proteoform_match_criteria}' -i '${s.match_type.input_proteoforms}' -r '${s.match_type.proteoform_range}' #else: match-proteoforms -i '${s.match_type.input_proteoforms}' -r '${s.match_type.proteoform_range}' #end if #end if ## PROTEOFORMS FROM PEPTIDESHAKER FILE #if $s.match_type.match_type_selector == "peptideshakerzip_proteoforms" #if $s.match_type.proteoform_peptideshakerzip_match_criteria: match-proteoforms -m '${s.match_type.proteoform_peptideshakerzip_match_criteria}' -i ps_proteoforms_'${$i}'.txt -r '${s.match_type.proteoform_peptideshakerzip_range}' #else: match-proteoforms -i ps_proteoforms_'${$i}'.txt -r '${s.match_type.proteoform_peptideshakerzip_range}' #end if #end if ## GENES #if $s.match_type.match_type_selector == "gene" match-genes -i '${s.match_type.input_gene}' #end if ## PROTEINS #if $s.match_type.match_type_selector == "uniprot" match-uniprot -i '${s.match_type.input_uniprot}' #end if #if $s.match_type.match_type_selector == "ensembl" match-ensembl -i '${s.match_type.input_ensembl}' #end if ## GENETIC VARIANTS #if $s.match_type.match_type_selector == "vcf" match-vcf -i '${s.match_type.input_vcf}' #end if #if $s.match_type.match_type_selector == "chrbp" match-chrbp -i '${s.match_type.input_chrbp}' #end if #if $s.match_type.match_type_selector == "rsid" match-rsids -i '${s.match_type.input_rsid}' #end if ## PEPTIDES #if $s.match_type.match_type_selector == "peptide" match-peptides -i '${s.match_type.input_peptide}' -f '${s.match_type.input_database}' #end if #if $s.match_type.match_type_selector == "modifiedpeptide" match-modified-peptides -i '${s.match_type.input_modifiedpeptide}' -f '${s.match_type.input_database}' -m '${s.match_type.modifiedpeptide_match_criteria}' -r '${s.match_type.modifiedpeptide_ptm_range}' #end if #end for ## OUTPUT OPTIONS #if $output_options.search_top_level_info: -T #end if #set $output_graphs_list = str($output_options.output_graphs).split(',') #if 'gg' in $output_graphs_list: -gg #end if #if 'gu' in $output_graphs_list: -gu #end if #if 'gp' in $output_graphs_list: -gp #end if ## We create a folder to contain graphs files. #if $output_options.output_graphs: && mkdir "graphs" #end if #if 'gg' in $output_graphs_list: && mv -t "graphs" "geneExternalEdges.tsv" "geneInternalEdges.tsv" "geneVertices.tsv" #end if #if 'gu' in $output_graphs_list: && mv -t "graphs" "proteinExternalEdges.tsv" "proteinInternalEdges.tsv" "proteinVertices.tsv" #end if #if 'gp' in $output_graphs_list: && mv -t "graphs" "proteoformExternalEdges.tsv" "proteoformInternalEdges.tsv" "proteoformVertices.tsv" #end if ]]> </command> <inputs> <repeat name="match_types" title="Match" min="1"> <conditional name="match_type"> <param name="match_type_selector" type="select" label="Match type" help=""> <option value="proteoforms">Proteoforms</option> <option value="peptideshakerzip_proteoforms">Proteoforms from Peptideshaker 2.x Archive</option> <option value="gene">Genes</option> <option value="uniprot">Proteins - UniProt Accession list</option> <option value="ensembl">Proteins - Ensembl identifier list</option> <option value="vcf">Genetic variants - Variant Call Format Specification</option> <option value="chrbp">Genetic variants - Chromosomes and base pairs</option> <option value="rsid">Genetic variants - SNP rsId list</option> <option value="peptide">Peptides - Simple list</option> <option value="modifiedpeptide">Peptides - Peptide List with PTM types and sites</option> </param> <!-- Proteoforms --> <when value="proteoforms"> <param format="txt" name="input_proteoforms" type="data" label="Proteoforms" help="A proteoform defines a specific state of a protein. It is composed by the protein UniProt accession, isoform and set of post translational modifications. The input file contains one line for each proteoform. Each PTM is specified using a modification identifier and a site, separated by ':'(semicolon). For example: '00046:133'. The identifier is a 5 digit id from the PSI-MOD Protein Modification Onthology [6]."/> <param name="proteoform_match_criteria" type="select" label="Proteoform match criteria"> <option value="STRICT">STRICT</option> <option value="SUPERSET">SUPERSET</option> <option value="SUPERSET_NO_TYPES">SUPERSET NO TYPES</option> <option value="SUBSET" selected="True">SUBSET</option> <option value="SUBSET_NO_TYPES">SUBSET NO TYPES</option> <option value="ONE">ONE</option> <option value="ONE_NO_TYPES">ONE_NO_TYPES</option> <option value="ACCESSION">ACCESSION</option> </param> <param name="proteoform_range" type="integer" value="0" label="Integer range of error for PTM sites" optional="true" help="Plus minus positions for the same PTM site"/> </when> <when value="peptideshakerzip_proteoforms"> <param format="zip" name="input_peptideshakerzip_proteoforms" type="data" label="Proteoforms from Peptideshaker 2.x Archive" help="A proteoform defines a specific state of a protein. It is composed by the protein UniProt accession, isoform and set of post translational modifications. The input file contains one line for each proteoform. Each PTM is specified using a modification identifier and a site, separated by ':'(semicolon). For example: '00046:133'. The identifier is a 5 digit id from the PSI-MOD Protein Modification Onthology [6]."/> <param name="proteoform_peptideshakerzip_match_criteria" type="select" label="Proteoform match criteria"> <option value="STRICT">STRICT</option> <option value="SUPERSET">SUPERSET</option> <option value="SUPERSET_NO_TYPES">SUPERSET NO TYPES</option> <option value="SUBSET" selected="True">SUBSET</option> <option value="SUBSET_NO_TYPES">SUBSET NO TYPES</option> <option value="ONE">ONE</option> <option value="ONE_NO_TYPES">ONE_NO_TYPES</option> <option value="ACCESSION">ACCESSION</option> </param> <param name="proteoform_peptideshakerzip_range" type="integer" value="0" label="Integer range of error for PTM sites" optional="true" help="Plus minus positions for the same PTM site"/> </when> <!-- Genes --> <when value="gene"> <param format="txt" name="input_gene" type="data" label="Genes" help="File with a one gene name in each line. Genes follow the HUGO gene nomenclature[3]."/> </when> <!-- Proteins --> <when value="uniprot"> <param format="txt" name="input_uniprot" type="data" label="UniProt Accession list" help="File with a one Uniprot Accession [4] in each line."/> </when> <when value="ensembl"> <param format="txt" name="input_ensembl" type="data" label="Ensembl identifier list" help="File with a one Ensembl identifier [5] in each line."/> </when> <!-- Genetic variants --> <when value="vcf"> <param format="vcf" name="input_vcf" type="data" label="Variant Call Format Specification" help="The input follows the Variant Call Format Specification[2] v4.3. It also allows the possibility to specify only the first 4 columns in the data section of the file: CHROM, POS, ID, REF. "/> </when> <when value="chrbp"> <param format="txt" name="input_chrbp" type="data" label="Chromosomes and base pairs" help="Genetic variants can also be represented using the chromosome and the base pair numbers. The input should be sorted by chromosome number and then by base pair. "/> </when> <when value="rsid"> <param format="txt" name="input_rsid" type="data" label="SNP rsId list" help="The file contains one rsid identifier as defined in dbSNP[1] on each row. The list must be ordered by chromosome and base pair (bp). The list must not have duplicates. All rsids must appear in the human assembly GRCh37.p13. "/> </when> <!-- Peptides --> <when value="peptide"> <param format="txt" name="input_peptide" type="data" label="Simple list" help="File with a one peptide sequence in each line."/> <expand macro="input_fasta" /> </when> <when value="modifiedpeptide"> <param format="txt" name="input_modifiedpeptide" type="data" label="Peptide List with PTM types and sites" help="Each line of the file corresponds to a single peptide with post-translational modifications."/> <expand macro="input_fasta" /> <param name="modifiedpeptide_match_criteria" type="select" label="Proteoform match criteria. Only modified peptides."> <option value="STRICT">STRICT</option> <option value="SUPERSET">SUPERSET</option> <option value="SUPERSET_NO_TYPES">SUPERSET NO TYPES</option> <option value="SUBSET" selected="True">SUBSET</option> <option value="SUBSET_NO_TYPES">SUBSET NO TYPES</option> <option value="ONE">ONE</option> <option value="ONE_NO_TYPES">ONE_NO_TYPES</option> <option value="ACCESSION">ACCESSION</option> </param> <param name="modifiedpeptide_ptm_range" type="integer" value="0" label="PTM position range" optional="true" help="Integer number margin error for sites of PTMs. Only for modified peptides."/> </when> </conditional> </repeat> <section name="output_options" expanded="true" title="Output options"> <param name="search_top_level_info" type="select" label="Add Top Level Pathways in the search result."> <option value="0" selected="True">False</option> <option value="1">True</option> </param> <param name="output_graphs" type="select" display="checkboxes" multiple="True" label="Connection graphs" help="Generates a zipped file with connection graphs as an additional output when executing the pathway search and analysis. The graph can use genes, proteins or proteoforms as vertices."> <option value="gg">Genes</option> <option value="gu">Proteins</option> <option value="gp">Proteoforms</option> </param> </section> </inputs> <outputs> <data name="search" format="tsv" from_work_dir="search.tsv" label="${tool.name} - search on ${on_string}" /> <data name="analysis" format="tsv" from_work_dir="analysis.tsv" label="${tool.name} - analysis on ${on_string}" /> <collection name="graphs_files" type="list" label="${tool.name} - graphs on ${on_string}" > <filter>output_options['output_graphs'] != None</filter> <discover_datasets pattern="__name_and_ext__" directory="graphs" ext="tsv"/> </collection> </outputs> <tests> <!-- Test that genes search works --> <test> <repeat name="match_types"> <conditional name="match_type"> <param name="match_type_selector" value="gene"/> <param name="input_gene" value="genes.txt" ftype="txt" /> </conditional> </repeat> <output name="search" ftype="tsv" > <assert_contents> <has_line_matching expression="CFTR\tP13569\tR-HSA-383190\tHCO3- transport through ion channel\tR-HSA-382556\tABC-family proteins mediated transport\tR-HSA-382551\tTransport of small molecules"/> <has_line_matching expression="TGFB1\tP01137\tR-HSA-170850\tPhosphorylated SMAD2/3 dissociates from TGFBR\tR-HSA-170834\tSignaling by TGF-beta Receptor Complex\tR-HSA-162582\tSignal Transduction"/> <has_line_matching expression="SCNN1B\tP51168\tR-HSA-2682349\tRAF1:SGK:TSC22D3:WPP ubiquitinates SCNN channels\tR-HSA-382551\tTransport of small molecules\tR-HSA-382551\tTransport of small molecules"/> <has_line_matching expression="TNFRSF1A\tP19438\tR-HSA-5626988\tTNF-alpha:TNFR1 binds NSMAF\tR-HSA-75893\tTNF signaling\tR-HSA-162582\tSignal Transduction"/> <has_n_columns n="8" /> </assert_contents> </output> </test> <!-- Test graphs from proteoforms --> <test> <repeat name="match_types"> <conditional name="match_type"> <param name="match_type_selector" value="proteoforms"/> <param name="input_proteoforms" value="proteoforms.txt" ftype="txt" /> <param name="proteoform_match_criteria" value="SUBSET"/> </conditional> </repeat> <param name="output_graphs" value="gg,gu,gp" /> <output_collection name="graphs_files" type="list"> <element name="geneExternalEdges" ftype="tsv"> <assert_contents> <has_line_matching expression="ERBB4\tSRC\tReaction\tR-HSA-1963586\tcatalyst\tcatalyst"/> <has_line_matching expression="RNF85\tSRC1\tReaction\tR-HSA-2316434\tcatalyst\tcatalyst"/> <has_line_matching expression="CBL2\tUNQ2500/PRO5800\tReaction\tR-HSA-5654677\tinput\tinput"/> <has_line_matching expression="FN1\tPRKM3\tReaction\tR-HSA-5672973\tcatalyst\toutput"/> <has_line_matching expression="MAPK11\tMAPK12\tSet\tR-HSA-448855\tmember/candidate\tmember/candidate"/> <has_line_matching expression="SAPK2B\tSAPK3\tSet\tR-HSA-448855\tmember/candidate\tmember/candidate"/> <has_n_columns n="6" /> </assert_contents> </element> <element name="geneInternalEdges" ftype="tsv" file="proteoforms_graphs/geneInternalEdges.tsv" compare="sim_size" delta="10"/> <element name="geneVertices" ftype="tsv"> <assert_contents> <has_line_matching expression="ERK1\tMitogen-activated protein kinase 3 shortName:MAP kinase 3 shortName:MAPK 3 ecNumber2.7.11.24/ecNumber"/> <has_line_matching expression="CREB1\tCyclic AMP-responsive element-binding protein 1 shortName:CREB-1 shortName:cAMP-responsive element-binding protein 1 ]"/> <has_line_matching expression="SAPK2\tMitogen-activated protein kinase 11 shortName:MAP kinase 11 shortName:MAPK 11 ecNumber2.7.11.24/ecNumber"/> <has_line_matching expression="STAT5\tSignal transducer and activator of transcription 5A ]"/> <has_n_columns n="2" /> </assert_contents> </element> <element name="proteinExternalEdges" ftype="tsv"> <assert_contents> <has_line_matching expression="P12931\tP56975\tReaction\tR-HSA-1963586\tcatalyst\toutput"/> <has_line_matching expression="P12931\tP51617\tReaction\tR-HSA-2316434\tcatalyst\tregulator"/> <has_line_matching expression="P12931\tQ8N302\tReaction\tR-HSA-6802933\tcatalyst\tinput"/> <has_line_matching expression="O95352\tP12931\tComplex\tR-HSA-6802695\tcomponent\tcomponent"/> <has_line_matching expression="Q15759\tQ16539\tSet\tR-HSA-198703\tmember/candidate\tmember/candidate"/> <has_n_columns n="6" /> </assert_contents> </element> <element name="proteinInternalEdges" ftype="tsv"> <assert_contents> <has_line_matching expression="P00519\tP12931\tReaction\tR-HSA-8942607\tcatalyst\tcatalyst"/> <has_line_matching expression="P12931\tQ9UQC2\tReaction\tR-HSA-205234\toutput\toutput"/> <has_line_matching expression="P12931\tP27361\tReaction\tR-HSA-6802933\tinput\toutput"/> <has_line_matching expression="P12931\tP27361\tReaction\tR-HSA-6802910\tinput\tinput"/> <has_line_matching expression="P00519\tP12931\tSet\tR-HSA-8942611\tmember/candidate\tmember/candidate"/> <has_line_matching expression="P12931\tP42229\tComplex\tR-HSA-1469999\tcomponent\tcomponent"/> <has_n_columns n="6" /> </assert_contents> </element> <element name="proteinVertices" ftype="tsv"> <assert_contents> <has_line_matching expression="P16220\tCyclic AMP-responsive element-binding protein 1 shortName:CREB-1 shortName:cAMP-responsive element-binding protein 1 ]"/> <has_line_matching expression="P27361\tMitogen-activated protein kinase 3 shortName:MAP kinase 3 shortName:MAPK 3 ecNumber2.7.11.24/ecNumber"/> <has_line_matching expression="P42229\tSignal transducer and activator of transcription 5A ]"/> <has_n_columns n="2" /> </assert_contents> </element> <element name="proteoformExternalEdges" ftype="tsv"> <assert_contents> <has_line_matching expression="O43597;00048:55,00048:227\tP22681;\tComplex\tR-HSA-934576\tcomponent\tcomponent"/> <has_line_matching expression="P00533;00048:992,00048:1045,00048:1068,00048:1086,00048:1148,00048:1173\tP22681;\tComplex\tR-HSA-182935\tcomponent\tcomponent"/> <has_line_matching expression="P27361;00047:202,00048:204\tP28482;00047:185,00048:187\tSet\tR-HSA-450307\tmember/candidate\tmember/candidate"/> <has_line_matching expression="P12931;00048:419\tQ01196;\tComplex\tR-HSA-8937687\tcomponent\tcomponent"/> <has_line_matching expression="P15509;\tP42229;00048:694\tSet\tR-HSA-913465\tmember/candidate\tmember/candidate"/> <has_n_columns n="6" /> </assert_contents> </element> <element name="proteoformInternalEdges" ftype="tsv"> <assert_contents> <has_line_matching expression="P27361;00047:202,00048:204\tQ15759;00047:180,00048:182\tSet\tR-HSA-450307\tmember/candidate\tmember/candidate"/> <has_line_matching expression="P27361;00047:202,00048:204\tQ15759;00047:180,00048:182\tSet\tR-HSA-450307\tmember/candidate\tmember/candidate"/> <has_n_columns n="6" /> </assert_contents> </element> <element name="proteoformVertices" ftype="tsv"> <assert_contents> <has_line_matching expression="P16220;00046:133\tCyclic AMP-responsive element-binding protein 1 shortName:CREB-1 shortName:cAMP-responsive element-binding protein 1 ]"/> <has_line_matching expression="Q15759;00047:180,00048:182\tMitogen-activated protein kinase 11 shortName:MAP kinase 11 shortName:MAPK 11 ecNumber2.7.11.24/ecNumber"/> <has_line_matching expression="P42229;00048:694\tSignal transducer and activator of transcription 5A ]"/> <has_n_columns n="2" /> </assert_contents> </element> </output_collection> </test> </tests> <help> .. class:: infomark **Introduction** Biological pathways are an excellent resource to analyze the causes and consequences of certain phenotypes. Most of the components of the pathways are proteins. When searching for relevant pathways to perform analysis of a patient sample proteins, it is very common to lose information due to lack of precision in the search. This leads to result sets with many extra selected pathways that are not really related to the input sample. .. class:: infomark **What it does** We present more fine grained approach to search, not only with the gene names, but also with post translational modifications of the proteins, such as phosphorylation. Ultimately, any omics dataset with its mutations and modifications will be mapped directly to the functional knowledgebases allowing the functional interpretation by researchers and clinicians. The reference database used is Reactome, a free, open source, curated and peer reviewed database of biological reactions, that contains the quality data needed for this type of fine grained search. database of biological reactions. It can be readily queried with omics datasets, and we are improving its features by extending the matching the clinical data to the biological pathways. Not only will the gene names be used, but also mutations or post translational modifications such as phosphorylation. .. class:: infomark **Inputs and outputs** PathwayMatcher can search for reactions and pathways with various input types, and generates mapping files to the database. The input can be: - Genetic variants - Genes - Peptides - Protein - Proteoforms The output of PathwayMatcher is composed of two files, the Reaction and Pathway mapping and the statistical analysis of the relevant pathways. .. class:: infomark **Try it now** You can easily test PathwayMatcher functionality creating text files with the example data we provide with proteoforms and proteins information of Cystic Fibrosis: **Proteoforms**: :: P01137; P10145; P12318; P12318;00048:288,00048:304 P13569; P13569;01148:null P19438; P37088; P37088;01148:null P51168; P51170; P51170;01148:null Q14CN2; Q16623; Q9UJW0; **Proteins**: :: P13569 P01137 P12318 Q9UJW0 P51168 P51170 P37088 P19438 Q14CN2 Q16623 P10145 After copying and pasting this data into new text files, you can upload them to Galaxy by directly using the Galaxy upload dialog (the button with the arrow pointing up in the top-left area, and then choosing *Choose local file*). Once they appear in green in your history, they have been uploaded and you can use them as inputs in PathwayMatcher. Now, select *Proteoforms* as input type and, in the next drop-down list, the entry of your history corresponding to the proteoforms file you uploaded. Then, click on *insert input* to add a new PathwayMatcher input for the proteins. Select *Proteins - Uniprot accession list* as input type and, in the next drop-down list, the entry in your history corresponding to this proteins file. **Everything is ready**. By clicking on *Execute* PathwayMatcher will run with your chosen inputs and, after a short time, it will show in your history two more files with the search results and its analysis. .. class:: infomark Information included with this tool is a brief summary of the main one included in PathwayMatcher_. Specific information about PathwayMatcher's Input_ and Output_ may also be found there. .. class:: infomark **References** [1] dbSNP_ [2] VCF v4.3: http://samtools.github.io/hts-specs/VCFv4.3.pdf [3] genenames.org: the HGNC resources in 2015. Nucleic Acids Res. 2015 Jan;43(Database issue):D1079-85. doi: 10.1093/nar/gku1071. : https://www.ncbi.nlm.nih.gov/pubmed/25361968 [4] UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45: D158-D169 (2017): http://dx.doi.org/doi:10.1093/nar/gkw1099 [5] Ensembl: https://www.ensembl.org/info/genome/stable_ids/index.html [6] The PSI-MOD community standard for representation of protein modification data. Nature Biotechnology 26, 864 - 866 (2008): http://www.nature.com/nbt/journal/v26/n8/full/nbt0808-864.html .. _dbSNP: https://www.ncbi.nlm.nih.gov/projects/SNP/ .. _PathwayMatcher: https://github.com/PathwayAnalysisPlatform/PathwayMatcher .. _Input: https://github.com/PathwayAnalysisPlatform/PathwayMatcher/wiki/Input .. _Output: https://github.com/PathwayAnalysisPlatform/PathwayMatcher/wiki/Output </help> <citations> <citation type="doi">doi:10.1093/gigascience/giz088</citation> </citations> </tool>