# HG changeset patch
# User peterjc
# Date 1366201585 14400
# Node ID f93ad488233841c3753520de4df35132637d4e33
# Parent 0ad90e5eb39026a6d0585121df5c85f1705546ee
Uploaded v0.0.6, adds unit tests and minor documentation changes.
diff -r 0ad90e5eb390 -r f93ad4882338 test-data/empty.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/empty.fasta Wed Apr 17 08:26:25 2013 -0400
@@ -0,0 +1,2 @@
+
+
diff -r 0ad90e5eb390 -r f93ad4882338 test-data/empty_nlstradamus.tabular
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/empty_nlstradamus.tabular Wed Apr 17 08:26:25 2013 -0400
@@ -0,0 +1,1 @@
+#ID algorithm score start stop sequence
diff -r 0ad90e5eb390 -r f93ad4882338 test-data/four_human_proteins.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/four_human_proteins.fasta Wed Apr 17 08:26:25 2013 -0400
@@ -0,0 +1,61 @@
+>sp|Q9BS26|ERP44_HUMAN Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1
+MHPAVFLSLPDLRCSLLLLVTWVFTPVTTEITSLDTENIDEILNNADVALVNFYADWCRF
+SQMLHPIFEEASDVIKEEFPNENQVVFARVDCDQHSDIAQRYRISKYPTLKLFRNGMMMK
+REYRGQRSVKALADYIRQQKSDPIQEIRDLAEITTLDRSKRNIIGYFEQKDSDNYRVFER
+VANILHDDCAFLSAFGDVSKPERYSGDNIIYKPPGHSAPDMVYLGAMTNFDVTYNWIQDK
+CVPLVREITFENGEELTEEGLPFLILFHMKEDTESLEIFQNEVARQLISEKGTINFLHAD
+CDKFRHPLLHIQKTPADCPVIAIDSFRHMYVFGDFKDVLIPGKLKQFVFDLHSGKLHREF
+HHGPDPTDTAPGEQAQDVASSPPESSFQKLAPSEYRYTLLRDRDEL
+>sp|Q9NSY1|BMP2K_HUMAN BMP-2-inducible protein kinase OS=Homo sapiens GN=BMP2K PE=1 SV=2
+MKKFSRMPKSEGGSGGGAAGGGAGGAGAGAGCGSGGSSVGVRVFAVGRHQVTLEESLAEG
+GFSTVFLVRTHGGIRCALKRMYVNNMPDLNVCKREITIMKELSGHKNIVGYLDCAVNSIS
+DNVWEVLILMEYCRAGQVVNQMNKKLQTGFTEPEVLQIFCDTCEAVARLHQCKTPIIHRD
+LKVENILLNDGGNYVLCDFGSATNKFLNPQKDGVNVVEEEIKKYTTLSYRAPEMINLYGG
+KPITTKADIWALGCLLYKLCFFTLPFGESQVAICDGNFTIPDNSRYSRNIHCLIRFMLEP
+DPEHRPDIFQVSYFAFKFAKKDCPVSNINNSSIPSALPEPMTASEAAARKSQIKARITDT
+IGPTETSIAPRQRPKANSATTATPSVLTIQSSATPVKVLAPGEFGNHRPKGALRPGNGPE
+ILLGQGPPQQPPQQHRVLQQLQQGDWRLQQLHLQHRHPHQQQQQQQQQQQQQQQQQQQQQ
+QQQQQQHHHHHHHHLLQDAYMQQYQHATQQQQMLQQQFLMHSVYQPQPSASQYPTMMPQY
+QQAFFQQQMLAQHQPSQQQASPEYLTSPQEFSPALVSYTSSLPAQVGTIMDSSYSANRSV
+ADKEAIANFTNQKNISNPPDMSGWNPFGEDNFSKLTEEELLDREFDLLRSNRLEERASSD
+KNVDSLSAPHNHPPEDPFGSVPFISHSGSPEKKAEHSSINQENGTANPIKNGKTSPASKD
+QRTGKKTSVQGQVQKGNDESESDFESDPPSPKSSEEEEQDDEEVLQGEQGDFNDDDTEPE
+NLGHRPLLMDSEDEEEEEKHSSDSDYEQAKAKYSDMSSVYRDRSGSGPTQDLNTILLTSA
+QLSSDVAVETPKQEFDVFGAVPFFAVRAQQPQQEKNEKNLPQHRFPAAGLEQEEFDVFTK
+APFSKKVNVQECHAVGPEAHTIPGYPKSVDVFGSTPFQPFLTSTSKSESNEDLFGLVPFD
+EITGSQQQKVKQRSLQKLSSRQRRTKQDMSKSNGKRHHGTPTSTKKTLKPTYRTPERARR
+HKKVGRRDSQSSNEFLTISDSKENISVALTDGKDRGNVLQPEESLLDPFGAKPFHSPDLS
+WHPPHQGLSDIRADHNTVLPGRPRQNSLHGSFHSADVLKMDDFGAVPFTELVVQSITPHQ
+SQQSQPVELDPFGAAPFPSKQ
+>sp|P06213|INSR_HUMAN Insulin receptor OS=Homo sapiens GN=INSR PE=1 SV=4
+MATGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTRLHELENCSVIEGHL
+QILLMFKTRPEDFRDLSFPKLIMITDYLLLFRVYGLESLKDLFPNLTVIRGSRLFFNYAL
+VIFEMVHLKELGLYNLMNITRGSVRIEKNNELCYLATIDWSRILDSVEDNYIVLNKDDNE
+ECGDICPGTAKGKTNCPATVINGQFVERCWTHSHCQKVCPTICKSHGCTAEGLCCHSECL
+GNCSQPDDPTKCVACRNFYLDGRCVETCPPPYYHFQDWRCVNFSFCQDLHHKCKNSRRQG
+CHQYVIHNNKCIPECPSGYTMNSSNLLCTPCLGPCPKVCHLLEGEKTIDSVTSAQELRGC
+TVINGSLIINIRGGNNLAAELEANLGLIEEISGYLKIRRSYALVSLSFFRKLRLIRGETL
+EIGNYSFYALDNQNLRQLWDWSKHNLTITQGKLFFHYNPKLCLSEIHKMEEVSGTKGRQE
+RNDIALKTNGDQASCENELLKFSYIRTSFDKILLRWEPYWPPDFRDLLGFMLFYKEAPYQ
+NVTEFDGQDACGSNSWTVVDIDPPLRSNDPKSQNHPGWLMRGLKPWTQYAIFVKTLVTFS
+DERRTYGAKSDIIYVQTDATNPSVPLDPISVSNSSSQIILKWKPPSDPNGNITHYLVFWE
+RQAEDSELFELDYCLKGLKLPSRTWSPPFESEDSQKHNQSEYEDSAGECCSCPKTDSQIL
+KELEESSFRKTFEDYLHNVVFVPRKTSSGTGAEDPRPSRKRRSLGDVGNVTVAVPTVAAF
+PNTSSTSVPTSPEEHRPFEKVVNKESLVISGLRHFTGYRIELQACNQDTPEERCSVAAYV
+SARTMPEAKADDIVGPVTHEIFENNVVHLMWQEPKEPNGLIVLYEVSYRRYGDEELHLCV
+SRKHFALERGCRLRGLSPGNYSVRIRATSLAGNGSWTEPTYFYVTDYLDVPSNIAKIIIG
+PLIFVFLFSVVIGSIYLFLRKRQPDGPLGPLYASSNPEYLSASDVFPCSVYVPDEWEVSR
+EKITLLRELGQGSFGMVYEGNARDIIKGEAETRVAVKTVNESASLRERIEFLNEASVMKG
+FTCHHVVRLLGVVSKGQPTLVVMELMAHGDLKSYLRSLRPEAENNPGRPPPTLQEMIQMA
+AEIADGMAYLNAKKFVHRDLAARNCMVAHDFTVKIGDFGMTRDIYETDYYRKGGKGLLPV
+RWMAPESLKDGVFTTSSDMWSFGVVLWEITSLAEQPYQGLSNEQVLKFVMDGGYLDQPDN
+CPERVTDLMRMCWQFNPKMRPTFLEIVNLLKDDLHPSFPEVSFFHSEENKAPESEELEME
+FEDMENVPLDRSSHCQREEAGGRDGGSSLGFKRSYEEHIPYTHMNGGKKNGRILTLPRSN
+PS
+>sp|P08100|OPSD_HUMAN Rhodopsin OS=Homo sapiens GN=RHO PE=1 SV=1
+MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLY
+VTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLG
+GEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIP
+EGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQES
+ATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAI
+YNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
diff -r 0ad90e5eb390 -r f93ad4882338 test-data/four_human_proteins.nlstradamus.tabular
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/four_human_proteins.nlstradamus.tabular Wed Apr 17 08:26:25 2013 -0400
@@ -0,0 +1,2 @@
+#ID algorithm score start stop sequence
+sp|Q9NSY1|BMP2K_HUMAN posterior 0.945 983 1027 RRTKQDMSKSNGKRHHGTPTSTKKTLKPTYRTPERARRHKKVGRR
diff -r 0ad90e5eb390 -r f93ad4882338 tools/protein_analysis/nlstradamus.txt
--- a/tools/protein_analysis/nlstradamus.txt Tue Jun 07 17:39:58 2011 -0400
+++ b/tools/protein_analysis/nlstradamus.txt Wed Apr 17 08:26:25 2013 -0400
@@ -1,7 +1,7 @@
-Galaxy wrapper for NLStradamus v1.7 (C++ version)
-=================================================
+Galaxy wrapper for NLStradamus v1.7 or v1.8 (C++ version)
+=========================================================
-This wrapper is copyright 2011 by Peter Cock, The James Hutton Institute
+This wrapper is copyright 2011-2013 by Peter Cock, The James Hutton Institute
(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
See the licence text below.
@@ -11,15 +11,24 @@
A. N. Nguyen Ba, A. Pogoutse, N. Provart, A. M. Moses.
NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction.
BMC Bioinformatics. 2009 Jun 29;10(1):202.
+http://dx.doi.org/10.1186/1471-2105-10-202
http://www.moseslab.csb.utoronto.ca/NLStradamus
Early versions of NLStradamus did not have a native tabular output format, this
was added in version 1.7. Additionally a fast C++ implementation was added at
-this point (early versions of NLStradamus came as a perl script only). This
-wrapper expects the compiled C++ binary "NLStradamus" to be on the system PATH.
+this point (early versions of NLStradamus came as a perl script only).
+
+Version 1.8 fixed a C++ compilation issue on modern compilers, but is otherwise
+unchanged.
+
-To install the wrapper installed the following files under the Galaxy tools
+Installation
+============
+This wrapper expects the compiled C++ binary "NLStradamus" to be on the system
+PATH.
+
+To install the wrapper, copy or move the following files under the Galaxy tools
folder, e.g. in a tools/protein_analysis folder:
* nlstradamus.xml (the Galaxy tool definition)
@@ -31,6 +40,9 @@
+If you wish to run the unit tests, also add this to tool_conf.xml.sample
+and copy or move the test-data files into Galaxy's test-data folder.
+
That's it.
@@ -38,6 +50,11 @@
=======
v0.0.3 - Initial public release
+v0.0.4 - Adding DOI link to reference
+ (Documentation change only)
+v0.0.5 - Assume non-zero return codes are errors
+v0.0.6 - Show output help text using a table
+ - Added unit tests
Developers
@@ -46,17 +63,20 @@
This script and related tools are being developed on the following hg branch:
http://bitbucket.org/peterjc/galaxy-central/src/tools
-For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball use
+For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
the following command from the Galaxy root folder:
-tar -czf nlstradmus.tar.gz tools/protein_analysis/nlstradum.xml tools/protein_analysis/nlstradum.txt
+$ tar -czf nlstradmus.tar.gz tools/protein_analysis/nlstradamus.xml tools/protein_analysis/nlstradamus.txt test-data/four_human_proteins.fasta test-data/four_human_proteins.nlstradamus.tabular test-data/empty.fasta test-data/empty_nlstradamus.tabular
Check this worked:
$ tar -tzf nlstradmus.tar.gz
-filter/seq_filter_by_id.py
-filter/seq_filter_by_id.txt
-filter/seq_filter_by_id.xml
+tools/protein_analysis/nlstradamus.xml
+tools/protein_analysis/nlstradamus.txt
+test-data/four_human_proteins.fasta
+test-data/four_human_proteins.nlstradamus.tabular
+test-data/empty.fasta
+test-data/empty_nlstradamus.tabular
Licence (MIT/BSD style)
diff -r 0ad90e5eb390 -r f93ad4882338 tools/protein_analysis/nlstradamus.xml
--- a/tools/protein_analysis/nlstradamus.xml Tue Jun 07 17:39:58 2011 -0400
+++ b/tools/protein_analysis/nlstradamus.xml Wed Apr 17 08:26:25 2013 -0400
@@ -1,8 +1,13 @@
-
+
Find nuclear localization signals (NLSs) in protein sequences
NLStradamus -i $fasta_file -t $threshold -m $model -a $algorithm -tab > $tabular_file
+
+
+
+
+
@@ -25,6 +30,20 @@
NLStradamus
+
+
+
+
+
+
+
+
+
+
+
+
+
+
@@ -36,12 +55,16 @@
The input is a FASTA file of protein sequences, and the output is tabular
with six columns (one row per NLS):
- * Sequence identifier
- * Algorithm (posterior or Viterbi)
- * Score (probability between threshold and 1 for posterior algorithm)
- * Start
- * End
- * Sequence of NLS
+====== ===================================================================
+Column Description
+------ -------------------------------------------------------------------
+ c1 Sequence identifier
+ c2 Algorithm (posterior or Viterbi)
+ c3 Score (probability between threshold and 1 for posterior algorithm)
+ c4 Start
+ c5 End
+ c6 Sequence of NLS
+====== ===================================================================
-----
@@ -50,6 +73,7 @@
A. N. Nguyen Ba, A. Pogoutse, N. Provart, A. M. Moses.
NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction.
BMC Bioinformatics. 2009 Jun 29;10(1):202.
+http://dx.doi.org/10.1186/1471-2105-10-202
http://www.moseslab.csb.utoronto.ca/NLStradamus
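The command line in nlstradamus.xml passes `-i`, `-t`, `-m`, `-a` and `-tab` to the `NLStradamus` binary, which must be on the system PATH. From v0.0.5, the wrapper treats any non-zero return code as an error. The following Python sketch shows roughly the same invocation outside Galaxy, as an illustration only.

- The function names are hypothetical.
- The allowed values for the model and algorithm options are not listed here, so they are passed through unchanged as strings.

```python
import shutil
import subprocess
from typing import List


def build_nlstradamus_command(fasta_file: str, threshold: float,
                              model: str, algorithm: str) -> List[str]:
    """Mirror the tool's command: NLStradamus -i FASTA -t T -m M -a A -tab"""
    return ["NLStradamus", "-i", fasta_file, "-t", str(threshold),
            "-m", model, "-a", algorithm, "-tab"]


def run_nlstradamus(fasta_file: str, tabular_file: str, threshold: float,
                    model: str, algorithm: str) -> None:
    """Run NLStradamus with stdout redirected to tabular_file; raise on failure."""
    # The wrapper expects the compiled C++ binary to be on the PATH.
    if shutil.which("NLStradamus") is None:
        raise RuntimeError("NLStradamus binary not found on the system PATH")
    cmd = build_nlstradamus_command(fasta_file, threshold, model, algorithm)
    with open(tabular_file, "w") as handle:
        result = subprocess.run(cmd, stdout=handle,
                                stderr=subprocess.PIPE, text=True)
    # Treat any non-zero return code as an error, as the wrapper does from v0.0.5.
    if result.returncode != 0:
        raise RuntimeError(
            f"NLStradamus failed with return code {result.returncode}: "
            f"{result.stderr.strip()}")
```

Building the argument list separately from running it keeps the command easy to check without the binary installed. Passing a list instead of a shell string also avoids quoting problems with file names.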