# HG changeset patch
# User peterjc
# Date 1379499418 14400
# Node ID 7de64c8b258d5bb25b03a0ae60841bc051b0efdc
# Parent 6abd809cefddd3190e0238d04252045759a87a7e
Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub

diff -r 6abd809cefdd -r 7de64c8b258d test-data/empty_rxlr.Bhattacharjee2006.tabular
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/empty_rxlr.Bhattacharjee2006.tabular Wed Sep 18 06:16:58 2013 -0400
@@ -0,0 +1,1 @@
+#ID Bhattacharjee2006
diff -r 6abd809cefdd -r 7de64c8b258d test-data/empty_rxlr.Whisson2007.tabular
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/empty_rxlr.Whisson2007.tabular Wed Sep 18 06:16:58 2013 -0400
@@ -0,0 +1,1 @@
+#ID Whisson2007
diff -r 6abd809cefdd -r 7de64c8b258d test-data/empty_rxlr.Win2007.tabular
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/empty_rxlr.Win2007.tabular Wed Sep 18 06:16:58 2013 -0400
@@ -0,0 +1,1 @@
+#ID Win2007
diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/LICENSE
--- a/tools/protein_analysis/LICENSE Thu Apr 25 12:25:52 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,26 +0,0 @@
-These wrappers are copyright 2010-2013 by Peter Cock, James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-Contributions/revisions copyright 2011 Konrad Paszkiewicz. All rights reserved.
-
-License for TMHMM 2.0, SignalP 3.0, WoLF PSORT and PSORTb wrappers for
-Galaxy (note that tools themselves are copyright and licensed separately)
-and the RXLR motif tool for Galaxy.
-
-Permission to use, copy, modify, and distribute this software and its
-documentation with or without modifications and for any purpose and
-without fee is hereby granted, provided that any copyright notices
-appear in all copies and that both those copyright notices and this
-permission notice appear in supporting documentation, and that the
-names of the contributors or copyright holders not be used in
-advertising or publicity pertaining to distribution of the software
-without specific prior permission.
-
-THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
-WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
-CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
-OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
-OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
-OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
-OR PERFORMANCE OF THIS SOFTWARE.
-
diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/LICENSE.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/protein_analysis/LICENSE.txt Wed Sep 18 06:16:58 2013 -0400
@@ -0,0 +1,28 @@
+These wrappers are copyright 2010-2013 by Peter Cock, James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+Contributions/revisions copyright 2011 Konrad Paszkiewicz. All rights reserved.
+
+License for TMHMM 2.0, SignalP 3.0, WoLF PSORT and PSORTb wrappers for
+Galaxy (note that tools themselves are copyright and licensed separately)
+and the RXLR motif tool for Galaxy.
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/README
--- a/tools/protein_analysis/README Thu Apr 25 12:25:52 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,163 +0,0 @@
-This package contains Galaxy wrappers for a selection of standalone command
-line protein analysis tools:
-
-* SignalP 3.0, THMHMM 2.0, Promoter 2.0 from the Center for Biological
-  Sequence Analysis at the Technical University of Denmark,
-  http://www.cbs.dtu.dk/cbs/
-
-* WoLF PSORT v0.2 from http://wolfpsort.org/
-
-* PSORTb v3 from http://www.psort.org/downloads/index.html
-
-Also, the RXLR motif tool uses SignalP 3.0 and HMMER 2.3.2 internally.
-
-To use these Galaxy wrappers you must first install the command line tools.
-At the time of writing they are all free for academic use, or open source.
-
-These wrappers are copyright 2010-2013 by Peter Cock, James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-Contributions/revisions copyright 2011 Konrad Paszkiewicz. All rights reserved.
-See the included LICENCE file for details (an MIT style open source licence).
-
-
-Requirements
-============
-
-First install those command line tools you wish to use the wrappers for:
-
-1. Install the command line version of SignalP 3.0 and ensure "signalp" is
-   on the PATH, see: http://www.cbs.dtu.dk/services/SignalP/
-
-2. Install the command line version of TMHMM 2.0 and ensure "tmhmm" is on
-   the PATH, see: http://www.cbs.dtu.dk/services/TMHMM/
-
-3. Install the command line version of Promoter 2.0 and ensure "promoter" is
-   on the PATH, see: http://www.cbs.dtu.dk/services/Promoter
-
-4. Install the WoLF PSORT v0.2 package, and ensure "runWolfPsortSummary"
-   is on the PATH (we use an extra wrapper script to change to the WoLF PSORT
-   directory, run runWolfPsortSummary, and then change back to the original
-   directory), see: http://wolfpsort.org/WoLFPSORT_package/version0.2/
-
-5. Install hmmsearch from HMMER 2.3.2 (the last stable release of HMMER 2)
-   but put it on the path under the name hmmsearch2 (allowing it to co-exist
-   with HMMER 3), or edit rlxr_motif.py accordingly.
-
-Verify each of the tools is installed and working from the command line
-(when logged in as the Galaxy user if appropriate).
-
-
-Manual Installation
-===================
-
-1. Create a folder tools/protein_analysis under your Galaxy installation.
-   This folder name is not critical, and can be changed if desired - you
-   must update the paths used in tool_conf.xml to match.
-
-2.
Copy/move the following files (from this archive) there: - -tmhmm2.xml (Galaxy tool definition) -tmhmm2.py (Python wrapper script) - -signalp3.xml (Galaxy tool definition) -signalp3.py (Python wrapper script) - -promoter2.xml (Galaxy tool definition) -promoter2.py (Python wrapper script) - -psortb.xml (Galaxy tool definition) -psortb.py (Python wrapper script) - -wolf_psort.xml (Galaxy tool definition) -wolf_psort.py (Python wrapper script) - -rxlr_motifs.xml (Galaxy tool definition) -rxlr_motifs.py (Python script) - -seq_analysis_utils.py (shared Python code) -LICENCE -README (this file) - -3. Edit your Galaxy conjuration file tool_conf.xml (to use the tools) AND - also tool_conf.xml.sample (to run the tests) to include the new tools - by adding: - -
- - - - - -
-
- -
- - Leave out the lines for any tools you do not wish to use in Galaxy. - -4. Copy/move the test-data files (from this archive) to Galaxy's - subfolder test-data. - -5. Run the Galaxy functional tests for these new wrappers with: - -./run_functional_tests.sh -id tmhmm2 -./run_functional_tests.sh -id signalp3 -./run_functional_tests.sh -id Psortb -./run_functional_tests.sh -id rxlr_motifs - -Alternatively, this should work (assuming you left the name and id as shown in -the XML file tool_conf.xml.sample): - -./run_functional_tests.sh -sid Protein_sequence_analysis-protein_analysis - -To check the section ID expected, use ./run_functional_tests.sh -list - -6. Restart Galaxy and check the new tools are shown and work. - - -History -======= - -v0.0.1 - Initial release -v0.0.2 - Corrected some typos in the help text - - Renamed test output file to use Galaxy convention of *.tabular -v0.0.3 - Check for tmhmm2 silent failures (no output) - - Additional unit tests -v0.0.4 - Ignore comment lines in tmhmm2 output. -v0.0.5 - Explicitly request tmhmm short output (may not be the default) -v0.0.6 - Improvement to how sub-jobs are run (should be faster) -v0.0.7 - Change SignalP default truncation from 60 to 70 to match the - SignalP webservice. -v0.0.8 - Added WoLF PSORT wrapper to the suite. -v0.0.9 - Added our RXLR motifs tool to the suite. -v0.1.0 - Added Promoter 2.0 wrapper (similar to SignalP & TMHMM wrappers) - - Support Galaxy's tag for SignalP, TMHMM & Promoter -v0.1.1 - Fixed an error in the header of the tabular output from Promoter -v0.1.2 - Use the new settings in the XML wrappers to catch errors - - Use SGE style $NSLOTS for thread count (otherwise default to 4) -v0.1.3 - Added missing file whisson_et_al_rxlr_eer_cropped.hmm to Tool Shed -v0.2.0 - Added PSORTb wrapper to the suite, based on earlier work - contributed by Konrad Paszkiewicz. 
-v0.2.1 - Use a script to create the Tool Shed tar-ball (removed some stray
-         files accidentally included previously via a wildcard).
-v0.2.2 - Include missing test files.
-v0.2.3 - Added unit tests for WoLF PSORT.
-v0.2.4 - Added unit tests for Promoter 2
-
-
-Developers
-==========
-
-This script and other tools are being developed on the following hg branch:
-http://bitbucket.org/peterjc/galaxy-central/src/tools
-
-This incorporates the previously used hg branch:
-http://bitbucket.org/peterjc/galaxy-central/src/seq_analysis
-
-For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball use
-the following command from the Galaxy root folder:
-
-$ ./tools/protein_analysis/make_tmhmm_and_signalp.sh
-
-This simplifies ensuring a consistent set of files is bundled each time,
-including all the relevant test files.
diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/protein_analysis/README.rst Wed Sep 18 06:16:58 2013 -0400
@@ -0,0 +1,192 @@
+This package contains Galaxy wrappers for a selection of standalone command
+line protein analysis tools:
+
+* SignalP 3.0, THMHMM 2.0, Promoter 2.0 from the Center for Biological
+  Sequence Analysis at the Technical University of Denmark,
+  http://www.cbs.dtu.dk/cbs/
+
+* WoLF PSORT v0.2 from http://wolfpsort.org/
+
+* PSORTb v3 from http://www.psort.org/downloads/index.html
+
+Also, the RXLR motif tool uses SignalP 3.0 and HMMER 2.3.2 internally.
+
+To use these Galaxy wrappers you must first install the command line tools.
+At the time of writing they are all free for academic use, or open source.
+
+These wrappers are copyright 2010-2013 by Peter Cock, James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+Contributions/revisions copyright 2011 Konrad Paszkiewicz. All rights reserved.
+See the included LICENCE file for details (MIT open source licence).
+
+The wrappers are available from the Galaxy Tool Shed
+http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
+
+Citation
+========
+
+If you use any of these Galaxy tools in work leading to a scientific
+publication, in addition to citing the invididual underlying tools, please cite:
+
+Peter Cock, Bjoern Gruening, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+Full reference information is included in the help text for each tool.
+
+
+Requirements
+============
+
+First install those command line tools you wish to use the wrappers for:
+
+1. Install the command line version of SignalP 3.0 and ensure "signalp" is
+   on the PATH, see: http://www.cbs.dtu.dk/services/SignalP/
+
+2. Install the command line version of TMHMM 2.0 and ensure "tmhmm" is on
+   the PATH, see: http://www.cbs.dtu.dk/services/TMHMM/
+
+3. Install the command line version of Promoter 2.0 and ensure "promoter" is
+   on the PATH, see: http://www.cbs.dtu.dk/services/Promoter
+
+4. Install the WoLF PSORT v0.2 package, and ensure "runWolfPsortSummary"
+   is on the PATH (we use an extra wrapper script to change to the WoLF PSORT
+   directory, run runWolfPsortSummary, and then change back to the original
+   directory), see: http://wolfpsort.org/WoLFPSORT_package/version0.2/
+
+5. Install hmmsearch from HMMER 2.3.2 (the last stable release of HMMER 2)
+   but put it on the path under the name hmmsearch2 (allowing it to co-exist
+   with HMMER 3), or edit rlxr_motif.py accordingly.
+
+Verify each of the tools is installed and working from the command line
+(when logged in as the Galaxy user if appropriate).
+
+
+Manual Installation
+===================
+
+1. Create a folder tools/protein_analysis under your Galaxy installation.
+   This folder name is not critical, and can be changed if desired - you
+   must update the paths used in tool_conf.xml to match.
+ +2. Copy/move the following files (from this archive) there: + + * tmhmm2.xml (Galaxy tool definition) + * tmhmm2.py (Python wrapper script) + + * signalp3.xml (Galaxy tool definition) + * signalp3.py (Python wrapper script) + + * promoter2.xml (Galaxy tool definition) + * promoter2.py (Python wrapper script) + + * psortb.xml (Galaxy tool definition) + * psortb.py (Python wrapper script) + + * wolf_psort.xml (Galaxy tool definition) + * wolf_psort.py (Python wrapper script) + + * rxlr_motifs.xml (Galaxy tool definition) + * rxlr_motifs.py (Python script) + + * seq_analysis_utils.py (shared Python code) + * LICENCE + * README.rst (this file) + +3. Edit your Galaxy conjuration file tool_conf.xml (to use the tools) AND + also tool_conf.xml.sample (to run the tests) to include the new tools + by adding:: + +
+ + + + + +
+
+ +
+ + Leave out the lines for any tools you do not wish to use in Galaxy. + +4. Copy/move the test-data files (from this archive) to Galaxy's + subfolder test-data. + +5. Run the Galaxy functional tests for these new wrappers with:: + + ./run_functional_tests.sh -id tmhmm2 + ./run_functional_tests.sh -id signalp3 + ./run_functional_tests.sh -id Psortb + ./run_functional_tests.sh -id rxlr_motifs + + Alternatively, this should work (assuming you left the name and id as shown in + the XML file tool_conf.xml.sample):: + + ./run_functional_tests.sh -sid Protein_sequence_analysis-protein_analysis + + To check the section ID expected, use ./run_functional_tests.sh -list + +6. Restart Galaxy and check the new tools are shown and work. + + +History +======= + +======= ====================================================================== +Version Changes +------- ---------------------------------------------------------------------- +v0.0.1 - Initial release +v0.0.2 - Corrected some typos in the help text + - Renamed test output file to use Galaxy convention of *.tabular +v0.0.3 - Check for tmhmm2 silent failures (no output) + - Additional unit tests +v0.0.4 - Ignore comment lines in tmhmm2 output. +v0.0.5 - Explicitly request tmhmm short output (may not be the default) +v0.0.6 - Improvement to how sub-jobs are run (should be faster) +v0.0.7 - Change SignalP default truncation from 60 to 70 to match the + SignalP webservice. +v0.0.8 - Added WoLF PSORT wrapper to the suite. +v0.0.9 - Added our RXLR motifs tool to the suite. 
+v0.1.0  - Added Promoter 2.0 wrapper (similar to SignalP & TMHMM wrappers)
+        - Support Galaxy's tag for SignalP, TMHMM & Promoter
+v0.1.1  - Fixed an error in the header of the tabular output from Promoter
+v0.1.2  - Use the new settings in the XML wrappers to catch errors
+        - Use SGE style $NSLOTS for thread count (otherwise default to 4)
+v0.1.3  - Added missing file whisson_et_al_rxlr_eer_cropped.hmm to Tool Shed
+v0.2.0  - Added PSORTb wrapper to the suite, based on earlier work
+          contributed by Konrad Paszkiewicz.
+v0.2.1  - Use a script to create the Tool Shed tar-ball (removed some stray
+          files accidentally included previously via a wildcard).
+v0.2.2  - Include missing test files.
+v0.2.3  - Added unit tests for WoLF PSORT.
+v0.2.4  - Added unit tests for Promoter 2
+v0.2.5  - Link to Tool Shed added to help text and this documentation.
+        - More unit tests.
+        - Fixed bug with RXLR tool and empty FASTA files.
+        - Fixed typo in the RXLR tool help text.
+        - Updated citation information (Cock et al. 2013).
+        - Adopted standard MIT licence.
+        - Use reStructuredText for this README file.
+        - Development moved to GitHub, https://github.com/peterjc/pico_galaxy
+======= ======================================================================
+
+
+Developers
+==========
+
+This script and other tools are being developed on the following hg branches:
+http://bitbucket.org/peterjc/galaxy-central/src/seq_analysis
+http://bitbucket.org/peterjc/galaxy-central/src/tools
+
+Development has now moved to a dedicated GitHub repository:
+https://github.com/peterjc/pico_galaxy/tree/master/tools
+
+For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball use
+the following command from the Galaxy root folder::
+
+    $ ./tools/protein_analysis/make_tmhmm_and_signalp.sh
+
+This simplifies ensuring a consistent set of files is bundled each time,
+including all the relevant test files.
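Reviewer note on the thread count behaviour recorded in the v0.1.2 history entry (SGE style $NSLOTS, otherwise default to 4): the following is a minimal Python sketch of that fallback, not the code shipped in the wrappers. The function name `parse_thread_count` and its `default` parameter are hypothetical, chosen only for illustration. It assumes the wrapper receives the expanded `$NSLOTS` value as a command line string, which is empty when the variable is unset.

```python
def parse_thread_count(arg, default=4):
    """Return a positive thread count parsed from a command line string.

    Hypothetical helper: an empty string (for example an unset $NSLOTS),
    a non-numeric value, or a non-positive number all fall back to
    the default.
    """
    try:
        count = int(arg)
    except (TypeError, ValueError):
        # Covers "", None, and anything int() cannot parse.
        return default
    return count if count > 0 else default
```

A wrapper could call this on its thread argument before splitting work into sub-jobs, so that running outside a cluster scheduler still yields a usable count.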
diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/promoter2.xml --- a/tools/protein_analysis/promoter2.xml Thu Apr 25 12:25:52 2013 -0400 +++ b/tools/protein_analysis/promoter2.xml Wed Sep 18 06:16:58 2013 -0400 @@ -1,13 +1,15 @@ - + Find eukaryotic PolII promoters in DNA sequences promoter2.py "\$NSLOTS" $fasta_file $tabular_file + ##I want the number of threads to be a Galaxy config option... ##Set the number of threads in the runner entry in universe_wsgi.ini ##which (on SGE at least) will set the $NSLOTS environment variable. - ##If the environment variable isn't set, get "", and defaults to one. + ##If the environment variable isn't set, get "", and the python wrapper + ##defaults to four threads. @@ -41,10 +43,14 @@ The input is a FASTA file of nucleotide sequences (e.g. upstream regions of your genes), and the output is tabular with five columns (one row per promoter): - 1. Sequence identifier (first word of FASTA header) - 2. Promoter position, e.g. 600 - 3. Promoter score, e.g. 1.063 - 4. Promoter likelihood, e.g. Highly likely prediction +====== ================================================== +Column Description +------ -------------------------------------------------- + 1 Sequence identifier (first word of FASTA header) + 2 Promoter position, e.g. 600 + 3 Promoter score, e.g. 1.063 + 4 Promoter likelihood, e.g. Highly likely prediction +====== ================================================== The scores are classified very simply as follows: @@ -61,12 +67,22 @@ **References** -Knudsen. +If you use this Galaxy tool in work leading to a scientific publication please +cite the following papers: + +Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013). +Galaxy tools and workflows for sequence analysis with applications +in molecular plant pathology. PeerJ 1:e167 +http://dx.doi.org/10.7717/peerj.167 + +Steen Knudsen (1999). Promoter2.0: for the recognition of PolII promoter sequences. 
-Bioinformatics, 15:356-61, 1999. +Bioinformatics, 15:356-61. http://dx.doi.org/10.1093/bioinformatics/15.5.356 -http://www.cbs.dtu.dk/services/Promoter/output.php +See also http://www.cbs.dtu.dk/services/Promoter/output.php +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/psortb.xml --- a/tools/protein_analysis/psortb.xml Thu Apr 25 12:25:52 2013 -0400 +++ b/tools/protein_analysis/psortb.xml Wed Sep 18 06:16:58 2013 -0400 @@ -1,10 +1,17 @@ - + Determines sub-cellular localisation of bacterial/archaeal protein sequences psortb.py --version - psortb.py "\$NSLOTS" "$type" "$long" "$cutoff" "$divergent" "$sequence" "$outfile" + + psortb.py "\$NSLOTS" "$type" "$long" "$cutoff" "$divergent" "$sequence" "$outfile" + ##I want the number of threads to be a Galaxy config option... + ##Set the number of threads in the runner entry in universe_wsgi.ini + ##which (on SGE at least) will set the $NSLOTS environment variable. + ##If the environment variable isn't set, get "", and python wrapper + ##defaults to four threads. + @@ -76,6 +83,14 @@ **References** +If you use this Galaxy tool in work leading to a scientific publication please +cite the following papers: + +Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013). +Galaxy tools and workflows for sequence analysis with applications +in molecular plant pathology. PeerJ 1:e167 +http://dx.doi.org/10.7717/peerj.167 + N.Y. Yu, J.R. Wagner, M.R. Laird, G. Melli, S. Rey, R. Lo, P. Dao, S.C. Sahinalp, M. Ester, L.J. Foster, F.S.L. 
Brinkman (2010) PSORTb 3.0: Improved protein subcellular localization prediction with @@ -83,7 +98,9 @@ prokaryotes, Bioinformatics 26(13):1608-1615 http://dx.doi.org/10.1093/bioinformatics/btq249 -http://www.psort.org/documentation/index.html +See also http://www.psort.org/documentation/index.html +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/rxlr_motifs.py --- a/tools/protein_analysis/rxlr_motifs.py Thu Apr 25 12:25:52 2013 -0400 +++ b/tools/protein_analysis/rxlr_motifs.py Wed Sep 18 06:16:58 2013 -0400 @@ -111,25 +111,7 @@ stop_err("Missing HMM file for Whisson et al. (2007)") if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"): stop_err("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_searcher) - #I've left the code to handle HMMER 3 in situ, in case - #we revisit the choice to insist on HMMER 2. - hmmer3 = (3 == get_hmmer_version(hmmer_search)) - #Using zero (or 5.6?) for bitscore threshold - if hmmer3: - #The HMMER3 table output is easy to parse - #In HMMER3 can't use both -T and -E - cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" \ - % (hmmer_search, hmm_output_file, hmm_file, fasta_file) - else: - #For HMMER2 we are stuck with parsing stdout - #Put 1e6 to effectively have no expectation threshold (otherwise - #HMMER defaults to 10 and the calculated e-value depends on the - #input FASTA file, and we can loose hits of interest). 
- cmd = "%s -T 0 -E 1e6 %s %s > %s" \ - % (hmmer_search, hmm_file, fasta_file, hmm_output_file) - return_code = os.system(cmd) - if return_code: - stop_err("Error %i from hmmsearch:\n%s" % (return_code, cmd)) + hmm_hits = set() valid_ids = set() for title, seq in fasta_iterator(fasta_file): @@ -138,29 +120,53 @@ stop_err("Duplicated identifier %r" % name) else: valid_ids.add(name) - handle = open(hmm_output_file) - for line in handle: - if not line.strip(): - #We expect blank lines in the HMMER2 stdout - continue - elif line.startswith("#"): - #Header - continue + if not valid_ids: + #Special case, don't need to run HMMER if there are no sequences + pass + else: + #I've left the code to handle HMMER 3 in situ, in case + #we revisit the choice to insist on HMMER 2. + hmmer3 = (3 == get_hmmer_version(hmmer_search)) + #Using zero (or 5.6?) for bitscore threshold + if hmmer3: + #The HMMER3 table output is easy to parse + #In HMMER3 can't use both -T and -E + cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" \ + % (hmmer_search, hmm_output_file, hmm_file, fasta_file) else: - name = line.split(None,1)[0] - #Should be a sequence name in the HMMER3 table output. - #Could be anything in the HMMER2 stdout. - if name in valid_ids: - hmm_hits.add(name) - elif hmmer3: - stop_err("Unexpected identifer %r in hmmsearch output" % name) - handle.close() - #if hmmer3: - # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) - #else: - # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) - #print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids)) - os.remove(hmm_output_file) + #For HMMER2 we are stuck with parsing stdout + #Put 1e6 to effectively have no expectation threshold (otherwise + #HMMER defaults to 10 and the calculated e-value depends on the + #input FASTA file, and we can loose hits of interest). 
+ cmd = "%s -T 0 -E 1e6 %s %s > %s" \ + % (hmmer_search, hmm_file, fasta_file, hmm_output_file) + return_code = os.system(cmd) + if return_code: + stop_err("Error %i from hmmsearch:\n%s" % (return_code, cmd), return_code) + + handle = open(hmm_output_file) + for line in handle: + if not line.strip(): + #We expect blank lines in the HMMER2 stdout + continue + elif line.startswith("#"): + #Header + continue + else: + name = line.split(None,1)[0] + #Should be a sequence name in the HMMER3 table output. + #Could be anything in the HMMER2 stdout. + if name in valid_ids: + hmm_hits.add(name) + elif hmmer3: + stop_err("Unexpected identifer %r in hmmsearch output" % name) + handle.close() + #if hmmer3: + # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) + #else: + # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids)) + #print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids)) + os.remove(hmm_output_file) del valid_ids diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/rxlr_motifs.xml --- a/tools/protein_analysis/rxlr_motifs.xml Thu Apr 25 12:25:52 2013 -0400 +++ b/tools/protein_analysis/rxlr_motifs.xml Wed Sep 18 06:16:58 2013 -0400 @@ -1,4 +1,4 @@ - + Find RXLR Effectors of Plant Pathogenic Oomycetes rxlr_motifs.py $fasta_file 8 $model $tabular_file @@ -32,6 +32,21 @@ + + + + + + + + + + + + + + + @@ -59,9 +74,9 @@ Looks for the oomycete motif RXLR as described in Bhattacharjee et al. (2006). Matches must have a SignalP Hidden Markov Model (HMM) score of at least 0.9, -a SignalP Neural Network (NN) predicted clevage site giving a signal peptide +a SignalP Neural Network (NN) predicted cleavage site giving a signal peptide length between 10 and 40 amino acids inclusive, and the RXLR pattern must be -after but within 100 amino acids of the clevage site. +after but within 100 amino acids of the cleavage site. 
SignalP is run truncating the sequences to the first 70 amino acids, which was the default on the SignalP webservice used in Bhattacharjee et al. (2006). @@ -71,9 +86,9 @@ Looks for the protein motif RXLR as described in Win et al. (2007). Matches must have a SignalP Hidden Markov Model (HMM) score of at least 0.9, -a SignalP Neural Network (NN) predicted clevage site giving a signal peptide +a SignalP Neural Network (NN) predicted cleavage site giving a signal peptide length between 10 and 40 amino acids inclusive, and the RXLR pattern must be -after the clevage site and start between amino acids 30 and 60. +after the cleavage site and start between amino acids 30 and 60. SignalP is run truncating the sequences to the first 70 amino acids, to match the methodology of Torto et al. (2003) followed in Win et al. (2007). @@ -120,35 +135,45 @@ **References** -Stephen C. Whisson, Petra C. Boevink, Lucy Moleleki, Anna O. Avrova, Juan G. Morales, Eleanor M. Gilroy, Miles R. Armstrong, Severine Grouffaud, Pieter van West, Sean Chapman, Ingo Hein, Ian K. Toth, Leighton Pritchard and Paul R. J. Birch +If you use this Galaxy tool in work leading to a scientific publication please +cite Cock et al. (2013) and the appropriate method paper(s): + +Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013). +Galaxy tools and workflows for sequence analysis with applications +in molecular plant pathology. PeerJ 1:e167 +http://dx.doi.org/10.7717/peerj.167 + +Stephen C. Whisson, Petra C. Boevink, Lucy Moleleki, Anna O. Avrova, Juan G. Morales, Eleanor M. Gilroy, Miles R. Armstrong, Severine Grouffaud, Pieter van West, Sean Chapman, Ingo Hein, Ian K. Toth, Leighton Pritchard and Paul R. J. Birch (2007). A translocation signal for delivery of oomycete effector proteins into host plant cells. -Nature 450:115-118, 2007. +Nature 450:115-118. http://dx.doi.org/10.1038/nature06203 -Joe Win, William Morgan, Jorunn Bos, Ksenia V. Krasileva, Liliana M. 
Cano, Angela Chaparro-Garcia, Randa Ammar, Brian J. Staskawicz and Sophien Kamoun. +Joe Win, William Morgan, Jorunn Bos, Ksenia V. Krasileva, Liliana M. Cano, Angela Chaparro-Garcia, Randa Ammar, Brian J. Staskawicz and Sophien Kamoun (2007). Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic oomycetes. -The Plant Cell 19:2349-2369, 2007. +The Plant Cell 19:2349-2369. http://dx.doi.org/10.1105/tpc.107.051037 -Souvik Bhattacharjee, N. Luisa Hiller, Konstantinos Liolios, Joe Win, Thirumala-Devi Kanneganti, Carolyn Young, Sophien Kamoun and Kasturi Haldar. +Souvik Bhattacharjee, N. Luisa Hiller, Konstantinos Liolios, Joe Win, Thirumala-Devi Kanneganti, Carolyn Young, Sophien Kamoun and Kasturi Haldar (2006). The malarial host-targeting signal is conserved in the Irish potato famine pathogen. -PLoS Pathogens, 2(5):e50, 2006. +PLoS Pathogens, 2(5):e50. http://dx.doi.org/10.1371/journal.ppat.0020050 -Trudy A. Torto, Shuang Li, Allison Styer, Edgar Huitema, Antonino Testa, Neil A.R. Gow, Pieter van West and Sophien Kamoun. +Trudy A. Torto, Shuang Li, Allison Styer, Edgar Huitema, Antonino Testa, Neil A.R. Gow, Pieter van West and Sophien Kamoun (2003). EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen *phytophthora*. -Genome Research, 13:1675-1685, 2003. +Genome Research, 13:1675-1685. http://dx.doi.org/10.1101/gr.910003 -Sean R. Eddy. +Sean R. Eddy (1998). Profile hidden Markov models. -Bioinformatics, 14(9):755–763, 1998 +Bioinformatics, 14(9):755–763. http://dx.doi.org/10.1093/bioinformatics/14.9.755 -Nielsen, Engelbrecht, Brunak and von Heijne. +Nielsen, Engelbrecht, Brunak and von Heijne (1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. -Protein Engineering, 10:1-6, 1997. +Protein Engineering, 10:1-6. 
http://dx.doi.org/10.1093/protein/10.1.1 +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/signalp3.xml --- a/tools/protein_analysis/signalp3.xml Thu Apr 25 12:25:52 2013 -0400 +++ b/tools/protein_analysis/signalp3.xml Wed Sep 18 06:16:58 2013 -0400 @@ -1,4 +1,4 @@ - + Find signal peptides in protein sequences @@ -7,7 +7,8 @@ signalp3.py $organism $truncate "\$NSLOTS" $fasta_file $tabular_file ##Set the number of threads in the runner entry in universe_wsgi.ini ##which (on SGE at least) will set the $NSLOTS environment variable. - ##If the environment variable isn't set, get "", and defaults to one. + ##If the environment variable isn't set, get "", and the python wrapper + ##defaults to four threads. @@ -167,23 +168,33 @@ **References** -Bendtsen, Nielsen, von Heijne, and Brunak. +If you use this Galaxy tool in work leading to a scientific publication please +cite the following papers: + +Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013). +Galaxy tools and workflows for sequence analysis with applications +in molecular plant pathology. PeerJ 1:e167 +http://dx.doi.org/10.7717/peerj.167 + +Bendtsen, Nielsen, von Heijne, and Brunak (2004). Improved prediction of signal peptides: SignalP 3.0. -J. Mol. Biol., 340:783-795, 2004. +J. Mol. Biol., 340:783-795. http://dx.doi.org/10.1016/j.jmb.2004.05.028 -Nielsen, Engelbrecht, Brunak and von Heijne. +Nielsen, Engelbrecht, Brunak and von Heijne (1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. -Protein Engineering, 10:1-6, 1997. +Protein Engineering, 10:1-6. http://dx.doi.org/10.1093/protein/10.1.1 -Nielsen and Krogh. +Nielsen and Krogh (1998). Prediction of signal peptides and signal anchors by a hidden Markov model. 
Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB 6), -AAAI Press, Menlo Park, California, pp. 122-130, 1998. +AAAI Press, Menlo Park, California, pp. 122-130. http://www.ncbi.nlm.nih.gov/pubmed/9783217 -http://www.cbs.dtu.dk/services/SignalP-3.0/output.php +See also http://www.cbs.dtu.dk/services/SignalP-3.0/output.php +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/tmhmm2.xml --- a/tools/protein_analysis/tmhmm2.xml Thu Apr 25 12:25:52 2013 -0400 +++ b/tools/protein_analysis/tmhmm2.xml Wed Sep 18 06:16:58 2013 -0400 @@ -1,13 +1,15 @@ - + Find transmembrane domains in protein sequences tmhmm2.py "\$NSLOTS" $fasta_file $tabular_file + ##I want the number of threads to be a Galaxy config option... ##Set the number of threads in the runner entry in universe_wsgi.ini ##which (on SGE at least) will set the $NSLOTS environment variable. - ##If the environment variable isn't set, get "", and defaults to one. + ##If the environment variable isn't set, get "", and the python wrapper + ##defaults to four threads. @@ -94,17 +96,27 @@ **References** -Krogh, Larsson, von Heijne, and Sonnhammer. +If you use this Galaxy tool in work leading to a scientific publication please +cite the following papers: + +Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013). +Galaxy tools and workflows for sequence analysis with applications +in molecular plant pathology. PeerJ 1:e167 +http://dx.doi.org/10.7717/peerj.167 + +Krogh, Larsson, von Heijne, and Sonnhammer (2001). Predicting Transmembrane Protein Topology with a Hidden Markov Model: Application to Complete Genomes. -J. Mol. Biol. 305:567-580, 2001. +J. Mol. Biol. 305:567-580. http://dx.doi.org/10.1006/jmbi.2000.4315 -Sonnhammer, von Heijne, and Krogh. 
+Sonnhammer, von Heijne, and Krogh (1998). A hidden Markov model for predicting transmembrane helices in protein sequences. -In J. Glasgow et al., eds.: Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, pages 175-182. AAAI Press, 1998. +In J. Glasgow et al., eds.: Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, pages 175-182. AAAI Press. http://www.ncbi.nlm.nih.gov/pubmed/9783223 -http://www.cbs.dtu.dk/services/TMHMM/ +See also http://www.cbs.dtu.dk/services/TMHMM/ +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp diff -r 6abd809cefdd -r 7de64c8b258d tools/protein_analysis/wolf_psort.xml --- a/tools/protein_analysis/wolf_psort.xml Thu Apr 25 12:25:52 2013 -0400 +++ b/tools/protein_analysis/wolf_psort.xml Wed Sep 18 06:16:58 2013 -0400 @@ -1,8 +1,12 @@ - + Eukaryote protein subcellular localization prediction - wolf_psort.py $organism 8 $fasta_file $tabular_file + wolf_psort.py $organism "\$NSLOTS" "$fasta_file" "$tabular_file" ##I want the number of threads to be a Galaxy config option... + ##Set the number of threads in the runner entry in universe_wsgi.ini + ##which (on SGE at least) will set the $NSLOTS environment variable. + ##If the environment variable isn't set, get "", and the python wrapper + ##defaults to four threads. @@ -84,6 +88,11 @@ vacu vacuolar membrane 0005774(2) ====== ===================== ===================== +Numbers in parentheses, such as "0005856(2)", indicate that descendant "part_of" +cellular components were also included, up to the specified depth (2 in this case). +For example, all of the children and grandchildren of "GO:0005856" were +included as "cysk". + Additionally compound predictions like mito_nucl are also given. @@ -119,16 +128,26 @@ **References** -Paul Horton, Keun-Joon Park, Takeshi Obayashi, Naoya Fujita, Hajime Harada, C.J.
Adams-Collier, and Kenta Nakai, +If you use this Galaxy tool in work leading to a scientific publication please +cite the following papers: + +Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013). +Galaxy tools and workflows for sequence analysis with applications +in molecular plant pathology. PeerJ 1:e167 +http://dx.doi.org/10.7717/peerj.167 + +Paul Horton, Keun-Joon Park, Takeshi Obayashi, Naoya Fujita, Hajime Harada, C.J. Adams-Collier, and Kenta Nakai (2007). WoLF PSORT: Protein Localization Predictor. -Nucleic Acids Research, 35(S2), W585-W587, 2007. +Nucleic Acids Research, 35(S2), W585-W587. http://dx.doi.org/10.1093/nar/gkm259 -Paul Horton, Keun-Joon Park, Takeshi Obayashi and Kenta Nakai. +Paul Horton, Keun-Joon Park, Takeshi Obayashi and Kenta Nakai (2006). Protein Subcellular Localization Prediction with WoLF PSORT. -Proceedings of the 4th Annual Asia Pacific Bioinformatics Conference APBC06, Taipei, Taiwan. pp. 39-48, 2006. +Proceedings of the 4th Annual Asia Pacific Bioinformatics Conference APBC06, Taipei, Taiwan. pp. 39-48. -http://wolfpsort.org +See also http://wolfpsort.org +This wrapper is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp