Previous changeset 2:ab1a8640f817 (2012-08-23) Next changeset 4:9d5beacae92b (2012-09-19) |
Commit message:
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler. |
added:
tools/ncbi_blast_plus/blastdb.loc.sample tools/ncbi_blast_plus/blastdb_p.loc.sample tools/ncbi_blast_plus/blastxml_to_tabular.py tools/ncbi_blast_plus/blastxml_to_tabular.xml tools/ncbi_blast_plus/hide_stderr.py tools/ncbi_blast_plus/ncbi_blast_plus.txt tools/ncbi_blast_plus/ncbi_blastn_wrapper.xml tools/ncbi_blast_plus/ncbi_blastp_wrapper.xml tools/ncbi_blast_plus/ncbi_blastx_wrapper.xml tools/ncbi_blast_plus/ncbi_tblastn_wrapper.xml tools/ncbi_blast_plus/ncbi_tblastx_wrapper.xml tools/ncbi_blast_plus/tool_dependencies.xml |
removed:
blastdb.loc.sample blastdb_p.loc.sample blastxml_to_tabular.py blastxml_to_tabular.xml hide_stderr.py ncbi_blast_plus.txt ncbi_blastn_wrapper.xml ncbi_blastp_wrapper.xml ncbi_blastx_wrapper.xml ncbi_tblastn_wrapper.xml ncbi_tblastx_wrapper.xml tool_dependencies.xml |
diff -r ab1a8640f817 -r 643338ac83c0 blastdb.loc.sample --- a/blastdb.loc.sample Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,38 +0,0 @@ -#This is a sample file distributed with Galaxy that is used to define a -#list of nucleotide BLAST databases, using three columns tab separated -#(longer whitespace are TAB characters): -# -#<unique_id> <database_caption> <base_name_path> -# -#The captions typically contain spaces and might end with the build date. -#It is important that the actual database name does not have a space in it, -#and that the first tab that appears in the line is right before the path. -# -#So, for example, if your database is nt and the path to your base name -#is /depot/data2/galaxy/blastdb/nt/nt.chunk, then the blastdb.loc entry -#would look like this: -# -#nt_02_Dec_2009 nt 02 Dec 2009 /depot/data2/galaxy/blastdb/nt/nt.chunk -# -#and your /depot/data2/galaxy/blastdb/nt directory would contain all of -#your "base names" (e.g.): -# -#-rw-r--r-- 1 wychung galaxy 23437408 2008-04-09 11:26 nt.chunk.00.nhr -#-rw-r--r-- 1 wychung galaxy 3689920 2008-04-09 11:26 nt.chunk.00.nin -#-rw-r--r-- 1 wychung galaxy 251215198 2008-04-09 11:26 nt.chunk.00.nsq -#...etc... -# -#Your blastdb.loc file should include an entry per line for each "base name" -#you have stored. For example: -# -#nt_02_Dec_2009 nt 02 Dec 2009 /depot/data2/galaxy/blastdb/nt/nt.chunk -#wgs_30_Nov_2009 wgs 30 Nov 2009 /depot/data2/galaxy/blastdb/wgs/wgs.chunk -#test_20_Sep_2008 test 20 Sep 2008 /depot/data2/galaxy/blastdb/test/test -#...etc... -# -#See also blastdb_p.loc which is for any protein BLAST database. -# -#Note that for backwards compatibility with workflows, the unique ID of -#an entry must be the path that was in the original loc file, because that -#is the value stored in the workflow for that parameter. -# |
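The three-column, tab-separated layout described in the sample file above can be read with a few lines of Python. This is only a minimal sketch and not part of the changeset: the function name `parse_loc` and the `sample` data are hypothetical, and Galaxy's own loader may behave differently (for example in how it treats malformed lines).

```python
def parse_loc(lines):
    """Return a list of (unique_id, caption, base_name_path) tuples.

    Hypothetical helper: assumes exactly three TAB-separated columns
    per entry, with '#' comment lines and blank lines ignored.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip comments and blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                "Expected 3 tab-separated columns, got %i: %r"
                % (len(fields), line)
            )
        entries.append(tuple(fields))
    return entries


# Illustrative input, reusing the example entry from the sample file.
sample = [
    "#<unique_id>\t<database_caption>\t<base_name_path>\n",
    "nt_02_Dec_2009\tnt 02 Dec 2009\t/depot/data2/galaxy/blastdb/nt/nt.chunk\n",
]
entries = parse_loc(sample)
```

Splitting on TAB only (rather than on any whitespace) is what allows the caption column to contain spaces, as the sample file notes.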
diff -r ab1a8640f817 -r 643338ac83c0 blastdb_p.loc.sample --- a/blastdb_p.loc.sample Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,27 +0,0 @@ -#This is a sample file distributed with Galaxy that is used to define a -#list of protein BLAST databases, using three columns tab separated -#(longer whitespace are TAB characters): -# -#<unique_id> <database_caption> <base_name_path> -# -#The captions typically contain spaces and might end with the build date. -#It is important that the actual database name does not have a space in it, -#and that the first tab that appears in the line is right before the path. -# -#So, for example, if your database is NR and the path to your base name -#is /data/blastdb/nr, then the blastdb_p.loc entry would look like this: -# -#nr NCBI NR (non redundant) /data/blastdb/nr -# -#and your /data/blastdb directory would contain all of the files associated -#with the database, /data/blastdb/nr.*. -# -#Your blastdb_p.loc file should include an entry per line for each "base name" -#you have stored. For example: -# -#nr_05Jun2010 NCBI NR (non redundant) 05 Jun 2010 /data/blastdb/05Jun2010/nr -#nr_15Aug2010 NCBI NR (non redundant) 15 Aug 2010 /data/blastdb/15Aug2010/nr -#...etc... -# -#See also blastdb.loc which is for any nucleotide BLAST database. -# |
diff -r ab1a8640f817 -r 643338ac83c0 blastxml_to_tabular.py --- a/blastxml_to_tabular.py Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,254 +0,0 @@\n-#!/usr/bin/env python\n-"""Convert a BLAST XML file to 12 column tabular output\n-\n-Takes three command line options, input BLAST XML filename, output tabular\n-BLAST filename, output format (std for standard 12 columns, or ext for the\n-extended 24 columns offered in the BLAST+ wrappers).\n-\n-The 12 columns output are \'qseqid sseqid pident length mismatch gapopen qstart\n-qend sstart send evalue bitscore\' or \'std\' at the BLAST+ command line, which\n-mean:\n- \n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The additional columns offered in the Galaxy BLAST+ wrappers are:\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a \';\'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-Most of these 
fields are given explicitly in the XML file, others some like\n-the percentage identity and the number of gap openings must be calculated.\n-\n-Be aware that the sequence in the extended tabular output or XML direct from\n-BLAST+ may or may not use XXXX masking on regions of low complexity. This\n-can throw the off the calculation of percentage identity and gap openings.\n-[In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard,\n-with these numbers changing depending on whether or not the low complexity\n-filter is used.]\n-\n-This script attempts to produce identical output to what BLAST+ would have done.\n-However, check this with "diff -b ..." since BLAST+ sometimes includes an extra\n-space character (probably a bug).\n-"""\n-import sys\n-import re\n-\n-if sys.version_info[:2] >= ( 2, 5 ):\n- import xml.etree.cElementTree as ElementTree\n-else:\n- from galaxy import eggs\n- import pkg_resources; pkg_resources.require( "elementtree" )\n- from elementtree import ElementTree\n-\n-def stop_err( msg ):\n- sys.stderr.write("%s\\n" % msg)\n- sys.exit(1)\n-\n-#Parse Command Line\n-try:\n- in_file, out_file, out_fmt = sys.argv[1:]\n-except:\n- stop_err("Expect 3 arguments: input BLAST XML file, output tabular file, out format (std or ext)")\n-\n-if out_fmt == "std":\n- extended = False\n-elif out_fmt == "x22":\n- stop_err("Format argument x22 has been replaced with ext (extended 24 columns)")\n-elif out_fmt == "ext":\n- extended = True\n-else:\n- stop_err("Format argument should be std (12 column) or ext (extended 24 columns)")\n-\n-\n-# get an iterable\n-try: \n- context = ElementTree.iterparse(in_file, events=("start", "end"))\n-except:\n- stop_err("Invalid data format.")\n-# turn it into an iterator\n-context = iter(context)\n-# get the root element\n-try:\n- event, root = context.next()\n-except:\n- st'..b'")\n- xx = sum(1 for q,h in zip(q_seq, h_seq) if q=="X" and h=="X")\n- if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= 
expected_mismatch + xx):\n- stop_err("%s vs %s mismatches, expected %i <= %i <= %i" \\\n- % (qseqid, sseqid, expected_mismatch - q_seq.count("X"),\n- int(mismatch), expected_mismatch))\n-\n- #TODO - Remove this alternative identity calculation and test\n- #once satisifed there are no problems\n- expected_identity = sum(1 for q,h in zip(q_seq, h_seq) if q == h)\n- if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")):\n- stop_err("%s vs %s identities, expected %i <= %i <= %i" \\\n- % (qseqid, sseqid, expected_identity, int(nident),\n- expected_identity + q_seq.count("X")))\n- \n-\n- evalue = hsp.findtext("Hsp_evalue")\n- if evalue == "0":\n- evalue = "0.0"\n- else:\n- evalue = "%0.0e" % float(evalue)\n- \n- bitscore = float(hsp.findtext("Hsp_bit-score"))\n- if bitscore < 100:\n- #Seems to show one decimal place for lower scores\n- bitscore = "%0.1f" % bitscore\n- else:\n- #Note BLAST does not round to nearest int, it truncates\n- bitscore = "%i" % bitscore\n-\n- values = [qseqid,\n- sseqid,\n- pident,\n- length, #hsp.findtext("Hsp_align-len")\n- str(mismatch),\n- gapopen,\n- hsp.findtext("Hsp_query-from"), #qstart,\n- hsp.findtext("Hsp_query-to"), #qend,\n- hsp.findtext("Hsp_hit-from"), #sstart,\n- hsp.findtext("Hsp_hit-to"), #send,\n- evalue, #hsp.findtext("Hsp_evalue") in scientific notation\n- bitscore, #hsp.findtext("Hsp_bit-score") rounded\n- ]\n-\n- if extended:\n- sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(">"))\n- #print hit_def, "-->", sallseqid\n- positive = hsp.findtext("Hsp_positive")\n- ppos = "%0.2f" % (100*float(positive)/float(length))\n- qframe = hsp.findtext("Hsp_query-frame")\n- sframe = hsp.findtext("Hsp_hit-frame")\n- if blast_program == "blastp":\n- #Probably a bug in BLASTP that they use 0 or 1 depending on format\n- if qframe == "0": qframe = "1"\n- if sframe == "0": sframe = "1"\n- slen = int(hit.findtext("Hit_len"))\n- values.extend([sallseqid,\n- hsp.findtext("Hsp_score"), 
#score,\n- nident,\n- positive,\n- hsp.findtext("Hsp_gaps"), #gaps,\n- ppos,\n- qframe,\n- sframe,\n- #NOTE - for blastp, XML shows original seq, tabular uses XXX masking\n- q_seq,\n- h_seq,\n- str(qlen),\n- str(slen),\n- ])\n- #print "\\t".join(values) \n- outfile.write("\\t".join(values) + "\\n")\n- # prevents ElementTree from growing large datastructure\n- root.clear()\n- elem.clear()\n-outfile.close()\n' |
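The E-value and bit score formatting rules visible in the removed script above can be isolated into two small functions. This is a sketch for illustration, not the author's code: the function names `format_evalue` and `format_bitscore` are hypothetical, and only the formatting rules themselves are taken from the diff.

```python
def format_evalue(text):
    """Format an Hsp_evalue string the way the removed script does."""
    if text == "0":
        return "0.0"
    # Scientific notation with no decimal places, e.g. 3e-05
    return "%0.0e" % float(text)


def format_bitscore(text):
    """Format an Hsp_bit-score string the way the removed script does."""
    score = float(text)
    if score < 100:
        # One decimal place for lower scores
        return "%0.1f" % score
    # Truncate rather than round, per the comment in the script
    return "%i" % score
```

For example, `format_bitscore("250.9")` gives `"250"` rather than `"251"`, which reflects the script's comment that BLAST truncates instead of rounding to the nearest integer.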
diff -r ab1a8640f817 -r 643338ac83c0 blastxml_to_tabular.xml --- a/blastxml_to_tabular.xml Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,127 +0,0 @@ -<tool id="blastxml_to_tabular" name="BLAST XML to tabular" version="0.0.8"> - <description>Convert BLAST XML output to tabular</description> - <command interpreter="python"> - blastxml_to_tabular.py $blastxml_file $tabular_file $out_format - </command> - <inputs> - <param name="blastxml_file" type="data" format="blastxml" label="BLAST results as XML"/> - <param name="out_format" type="select" label="Output format"> - <option value="std" selected="True">Tabular (standard 12 columns)</option> - <option value="ext">Tabular (extended 24 columns)</option> - </param> - </inputs> - <outputs> - <data name="tabular_file" format="tabular" label="BLAST results as tabular" /> - </outputs> - <requirements> - </requirements> - <tests> - <test> - <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin.tabluar --> - <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" /> - <param name="out_format" value="ext" /> - <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin_22c.tabluar --> - <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted_ext.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastp_sample.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some white space differences from the actual blastp output --> - <output name="tabular_file" file="blastp_sample_converted.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some 
white space differences from the actual blastx output --> - <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" /> - <param name="out_format" value="ext" /> - <!-- Note this has some white space and XXXX masking differences from the actual blastx output --> - <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted_ext.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastx_sample.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some white space differences from the actual blastx output --> - <output name="tabular_file" file="blastx_sample_converted.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" /> - <param name="out_format" value="std" /> - <!-- Note this has some white space differences from the actual blastp output --> - <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_std.tabular" ftype="tabular" /> - </test> - <test> - <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" /> - <param name="out_format" value="ext" /> - <!-- Note this has some white space differences from the actual blastp output --> - <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_ext.tabular" ftype="tabular" /> - </test> - </tests> - <help> - -**What it does** - -NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can output in a range of -formats including tabular and a more detailed XML format. A complex workflow -may need both the XML and the tabular output - but running BLAST twice is -slow and wasteful. 
- -This tool takes the BLAST XML output and by default converts it into the -standard 12 column tabular equivalent: - -====== ========= ============================================ -Column NCBI name Description ------- --------- -------------------------------------------- - 1 qseqid Query Seq-id (ID of your sequence) - 2 sseqid Subject Seq-id (ID of the database hit) - 3 pident Percentage of identical matches - 4 length Alignment length - 5 mismatch Number of mismatches - 6 gapopen Number of gap openings - 7 qstart Start of alignment in query - 8 qend End of alignment in query - 9 sstart Start of alignment in subject (database hit) - 10 send End of alignment in subject (database hit) - 11 evalue Expectation value (E-value) - 12 bitscore Bit score -====== ========= ============================================ - -The BLAST+ tools can optionally output additional columns of information, -but this takes longer to calculate. Most (but not all) of these columns are -included by selecting the extended tabular output. The extra columns are -included *after* the standard 12 columns. This is so that you can write -workflow filtering steps that accept either the 12 or 22 column tabular -BLAST output. 
- -====== ============= =========================================== -Column NCBI name Description ------- ------------- ------------------------------------------- - 13 sallseqid All subject Seq-id(s), separated by a ';' - 14 score Raw score - 15 nident Number of identical matches - 16 positive Number of positive-scoring matches - 17 gaps Total number of gaps - 18 ppos Percentage of positive-scoring matches - 19 qframe Query frame - 20 sframe Subject frame - 21 qseq Aligned part of query sequence - 22 sseq Aligned part of subject sequence - 23 qlen Query sequence length - 24 slen Subject sequence length -====== ============= =========================================== - -Beware that the XML file (and thus the conversion) and the tabular output -direct from BLAST+ may differ in the presence of XXXX masking on regions -low complexity (columns 21 and 22), and thus also calculated figures like -the percentage idenity (column 3). - - </help> -</tool> |
diff -r ab1a8640f817 -r 643338ac83c0 hide_stderr.py --- a/hide_stderr.py Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,49 +0,0 @@ -#!/usr/bin/env python -"""A simple script to redirect stderr to stdout when the return code is zero. - -See https://bitbucket.org/galaxy/galaxy-central/issue/325/ - -Currently Galaxy ignores the return code from command line tools (even if it -is non-zero which by convention indicates an error) and treats any output on -stderr as an error (even though by convention stderr is used for errors or -warnings). - -This script runs the given command line, capturing all stdout and stderr in -memory, and gets the return code. For a zero return code, any stderr (which -should be warnings only) is added to the stdout. That way Galaxy believes -everything is fine. For a non-zero return code, we output stdout as is, and -any stderr, plus the return code to ensure there is some output on stderr. -That way Galaxy treats this as an error. - -Once issue 325 is fixed, this script will not be needed. -""" -import sys -import subprocess - -#Avoid using shell=True when we call subprocess to ensure if the Python -#script is killed, so too is the BLAST process. -try: - words = [] - for w in sys.argv[1:]: - if " " in w: - words.append('"%s"' % w) - else: - words.append(w) - cmd = " ".join(words) - child = subprocess.Popen(sys.argv[1:], - stdout=subprocess.PIPE, stderr=subprocess.PIPE) -except Exception, err: - sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err)) - sys.exit(1) -#Use .communicate as can get deadlocks with .wait(), -stdout, stderr = child.communicate() -return_code = child.returncode - -if return_code: - sys.stdout.write(stdout) - sys.stderr.write(stderr) - sys.stderr.write("Return error code %i from command:\n" % return_code) - sys.stderr.write("%s\n" % cmd) -else: - sys.stdout.write(stdout) - sys.stdout.write(stderr) |
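The removed hide_stderr.py above is Python 2 code (note the `except Exception, err` syntax). Its routing rule, where stderr is folded into stdout on a zero return code and kept on stderr otherwise, might look like the following in Python 3. This is a hedged sketch only: the function name `run_and_route` is hypothetical, and it returns the byte streams instead of writing them so the behaviour is easy to inspect.

```python
import subprocess
import sys


def run_and_route(argv):
    """Run argv without a shell and decide where its output should go.

    Returns (return_code, bytes_for_stdout, bytes_for_stderr).
    Hypothetical helper mirroring the logic of the removed script.
    """
    child = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    # communicate() avoids the deadlocks possible with wait()
    stdout, stderr = child.communicate()
    if child.returncode:
        # Failure: keep stderr, and append a note so stderr is never empty
        note = "Return error code %i from command:\n%s\n" % (
            child.returncode,
            " ".join(argv),
        )
        return child.returncode, stdout, stderr + note.encode()
    # Success: treat any stderr as warnings and fold it into stdout
    return 0, stdout + stderr, b""


if __name__ == "__main__" and len(sys.argv) > 1:
    code, out, err = run_and_route(sys.argv[1:])
    sys.stdout.buffer.write(out)
    sys.stderr.buffer.write(err)
    sys.exit(code)
```

Unlike the original, this sketch also exits with the child's return code; the removed script relied on stderr output alone to signal failure to Galaxy.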
diff -r ab1a8640f817 -r 643338ac83c0 ncbi_blast_plus.txt --- a/ncbi_blast_plus.txt Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,81 +0,0 @@ -Galaxy wrappers for NCBI BLAST+ suite -===================================== - -These wrappers are copyright 2010-2012 by Peter Cock, The James Hutton Institute -(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. -See the licence text below. - -Currently tested with NCBI BLAST 2.2.26+ (i.e. version 2.2.26 of BLAST+), -and do not work with the NCBI 'legacy' BLAST suite (e.g. blastall). - -Note that these wrappers were originally distributed as part of the main -Galaxy repository, but as of August 2012 moved to the Galaxy Tool Shed. -My thanks to Dannon Baker from the Galaxy development team for this assistance -with this. - - -Manual Installation -=================== - -For those not using Galaxy's automated installation from the Tool Shed, put -the XML and Python files under tools/ncbi_blast_plus and add the XML files -to your tool_conf.xml as normal. - -You must tell Galaxy about any system level BLAST databases using configuration -files blastdb.loc (nucleotide databases like NT) and blastdb_p.loc (protein -databases like NR). - -You will also need to install the 'blast_datatypes' from the Tool Shed. This -defines the BLAST XML file format ('blastxml'). - - -History -======= - -v0.0.11 - Final revision as part of the Galaxy main repository, and the - first release via the Tool Shed -v0.0.12 - Implements genetic code option for translation searches. 
- - Changes <parallelism> to 1000 sequences at a time (to cope with - very large sets of queries where BLAST+ can become memory hungry) - - Include warning that BLAST+ with subject FASTA gives pairwise - e-values - - -Developers -========== - -This script and related tools are being developed on the following hg branch: -http://bitbucket.org/peterjc/galaxy-central/src/tools - -For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball I use -the following command from the Galaxy tools/ncbi_blast_plus folder: - -$ ./make_ncbi_blast_plus.sh - -This similifies ensuring a consistent set of files is bundled each time, -including all the relevant test files. - - -Licence (MIT/BSD style) -======================= - -Permission to use, copy, modify, and distribute this software and its -documentation with or without modifications and for any purpose and -without fee is hereby granted, provided that any copyright notices -appear in all copies and that both those copyright notices and this -permission notice appear in supporting documentation, and that the -names of the contributors or copyright holders not be used in -advertising or publicity pertaining to distribution of the software -without specific prior permission. - -THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL -WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED -WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE -CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT -OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS -OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE -OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE -OR PERFORMANCE OF THIS SOFTWARE. - -NOTE: This is the licence for the Galaxy Wrapper only. BLAST+ and -associated data files are available and licenced separately. |
diff -r ab1a8640f817 -r 643338ac83c0 ncbi_blastn_wrapper.xml --- a/ncbi_blastn_wrapper.xml Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,211 +0,0 @@\n-<tool id="ncbi_blastn_wrapper" name="NCBI BLAST+ blastn" version="0.0.12">\n- <description>Search nucleotide database with nucleotide query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n- <version_command>blastn -version</version_command>\n- <command interpreter="python">hide_stderr.py\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-blastn\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--task $blast_type\n--evalue $evalue_cutoff\n--out $output1\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n-$adv_opts.filter_query\n-$adv_opts.strand\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-$adv_opts.ungapped\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" type="select" label="Subject database/sequences">\n- <option 
value="db" selected="True">BLAST Database</option>\n- <option value="file">FASTA file (pairwise e-values)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Nucleotide BLAST database">\n- <options from_file="blastdb.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="subject" type="hidden" value="" /> \n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" /> \n- <param name="subject" type="data" format="fasta" label="Nucleotide FASTA file to use as database"/> \n- </when>\n- </conditional>\n- <param name="blast_type" type="select" display="radio" label="Type of BLAST">\n- <option value="megablast">megablast</option>\n- <option value="blastn">blastn</option>\n- <option value="blastn-short">blastn-short</option>\n- <option value="dc-megablast">dc-megablast</option>\n- <!-- Using BLAST 2.2.24+ this gives an error:\n- BLAST engine error: Program type \'vecscreen\' not supported\n- <option value="vecscreen">vecscreen</option>\n- -->\n- </param>\n- <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n- <param name="out_format" type="select" label="Output format">\n- <option value="6" selected="True">Tabular (standard 12 columns)</option>\n- <option value="ext">Tabular (extended 24 columns)</option>\n- <option value="5">BLAST XML</option>\n- <option value="0">Pairwise text</option>\n- <option value="0 -html">Pairwise HTML</option>\n- <option value'..b'>\n- <when input="out_format" value="0 -html" format="html"/>\n- <when input="out_format" value="2" format="txt"/>\n- <when input="out_format" value="2 -html" format="html"/>\n- <when input="out_format" value="4" format="txt"/>\n- <when input="out_format" value="4 -html" format="html"/>\n- <when input="out_format" value="5" format="blastxml"/>\n- </change_format>\n- </data>\n- </outputs>\n- 
<requirements>\n- <requirement type="binary">blastn</requirement>\n- </requirements>\n- <help>\n- \n-.. class:: warningmark\n-\n-**Note**. Database searches may take a substantial amount of time.\n-For large input datasets it is advisable to allow overnight processing. \n-\n------\n-\n-**What it does**\n-\n-Search a *nucleotide database* using a *nucleotide query*,\n-using the NCBI BLAST+ blastn command line tool.\n-Algorithms include blastn, megablast, and discontiguous megablast.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. 
This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a \';\'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Zhang et al. A Greedy Algorithm for Aligning DNA Sequences. 2000. JCB: 203-214.\n-\n- </help>\n-</tool>\n' |
diff -r ab1a8640f817 -r 643338ac83c0 ncbi_blastp_wrapper.xml --- a/ncbi_blastp_wrapper.xml Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,278 +0,0 @@\n-<tool id="ncbi_blastp_wrapper" name="NCBI BLAST+ blastp" version="0.0.12">\n- <description>Search protein database with protein query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n- <version_command>blastp -version</version_command>\n- <command interpreter="python">hide_stderr.py\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-blastp\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--task $blast_type\n--evalue $evalue_cutoff\n--out $output1\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n-$adv_opts.filter_query\n--matrix $adv_opts.matrix\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-##Ungapped disabled for now - see comments below\n-##$adv_opts.ungapped\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" 
type="select" label="Subject database/sequences">\n- <option value="db" selected="True">BLAST Database</option>\n- <option value="file">FASTA file (pairwise e-values)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Protein BLAST database">\n- <options from_file="blastdb_p.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="subject" type="hidden" value="" /> \n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" /> \n- <param name="subject" type="data" format="fasta" label="Protein FASTA file to use as database"/> \n- </when>\n- </conditional>\n- <param name="blast_type" type="select" display="radio" label="Type of BLAST">\n- <option value="blastp">blastp</option>\n- <option value="blastp-short">blastp-short</option>\n- </param>\n- <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n- <param name="out_format" type="select" label="Output format">\n- <option value="6" selected="True">Tabular (standard 12 columns)</option>\n- <option value="ext">Tabular (extended 24 columns)</option>\n- <option value="5">BLAST XML</option>\n- <option value="0">Pairwise text</option>\n- <option value="0 -html">Pairwise HTML</option>\n- <option value="2">Query-anchored text</option>\n- <option value="2 -html">Query-anchored HTML</option>\n- <option value="4">Flat query-anchored text</option>\n- <option value="4 -html">Flat query-anchored HTML</option>\n- <!--\n- <option value='..b'.fasta" ftype="fasta" />\n- <param name="database" value="" />\n- <param name="evalue_cutoff" value="1e-8" />\n- <param name="blast_type" value="blastp" />\n- <param name="out_format" value="6" />\n- <param name="adv_opts_selector" value="basic" />\n- <output name="output1" file="blastp_rhodopsin_vs_four_human.tabular" ftype="tabular" />\n- </test>\n- </tests>\n- <help>\n- \n-.. 
class:: warningmark\n-\n-**Note**. Database searches may take a substantial amount of time.\n-For large input datasets it is advisable to allow overnight processing. \n-\n------\n-\n-**What it does**\n-\n-Search a *protein database* using a *protein query*,\n-using the NCBI BLAST+ blastp command line tool.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. 
This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a \';\'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n-\n-Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005.\n-\n- </help>\n-</tool>\n' |
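The help text above says the extra columns come after the standard 12 so that a filtering step can accept either tabular layout. A minimal sketch of such a step follows; the function names (`parse_hit`, `filter_hits`) and the thresholds are illustrative assumptions, not part of the wrappers.

```python
# Column names as listed in the two help tables (standard 12, then the extra 12).
STD_COLUMNS = ("qseqid sseqid pident length mismatch gapopen "
               "qstart qend sstart send evalue bitscore").split()
EXT_COLUMNS = ("sallseqid score nident positive gaps ppos "
               "qframe sframe qseq sseq qlen slen").split()


def parse_hit(line):
    """Turn one tabular BLAST line (12 or 24 columns) into a dict of strings."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 12:
        names = STD_COLUMNS
    elif len(fields) == 24:
        names = STD_COLUMNS + EXT_COLUMNS
    else:
        raise ValueError("Expected 12 or 24 columns, got %d" % len(fields))
    return dict(zip(names, fields))


def filter_hits(lines, min_pident=90.0, max_evalue=1e-5):
    """Yield hits passing hypothetical identity and E-value thresholds.

    Only the first 12 columns are used, so both layouts are accepted.
    """
    for line in lines:
        hit = parse_hit(line)
        if float(hit["pident"]) >= min_pident and float(hit["evalue"]) <= max_evalue:
            yield hit


# Made-up example rows: one 12 column hit and one 24 column hit.
std_row = ["q1", "s1", "98.50", "200", "3", "0", "1", "200", "1", "200", "1e-50", "380"]
ext_row = (["q2", "s1", "75.00", "200", "50", "0", "1", "200", "1", "200", "1e-20", "150"]
           + ["s1", "950", "150", "170", "0", "85.00", "1", "1", "AAA", "AAA", "200", "200"])
rows = ["\t".join(std_row) + "\n", "\t".join(ext_row) + "\n"]
kept = [hit["qseqid"] for hit in filter_hits(rows)]
```

Because the filter reads only columns 1 to 12 by name, the same step works unchanged whichever tabular output the upstream tool produced.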
diff -r ab1a8640f817 -r 643338ac83c0 ncbi_blastx_wrapper.xml --- a/ncbi_blastx_wrapper.xml Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,264 +0,0 @@\n-<tool id="ncbi_blastx_wrapper" name="NCBI BLAST+ blastx" version="0.0.12">\n- <description>Search protein database with translated nucleotide query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n- <version_command>blastx -version</version_command>\n- <command interpreter="python">hide_stderr.py\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-blastx\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--query_gencode $query_gencode\n--evalue $evalue_cutoff\n--out $output1\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n-$adv_opts.filter_query\n-$adv_opts.strand\n--matrix $adv_opts.matrix\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-$adv_opts.ungapped\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n- <conditional name="db_opts">\n- <param name="db_opts_selector" type="select" 
label="Subject database/sequences">\n- <option value="db" selected="True">BLAST Database</option>\n- <option value="file">FASTA file (pairwise e-values)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Protein BLAST database">\n- <options from_file="blastdb_p.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="subject" type="hidden" value="" /> \n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" /> \n- <param name="subject" type="data" format="fasta" label="Protein FASTA file to use as database"/> \n- </when>\n- </conditional>\n- <param name="query_gencode" type="select" label="Query genetic code">\n- <!-- See http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for details -->\n- <option value="1" select="True">1. Standard</option>\n- <option value="2">2. Vertebrate Mitochondrial</option>\n- <option value="3">3. Yeast Mitochondrial</option>\n- <option value="4">4. Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code</option>\n- <option value="5">5. Invertebrate Mitochondrial</option>\n- <option value="6">6. Ciliate, Dasycladacean and Hexamita Nuclear Code</option>\n- <option value="9">9. Echinoderm Mitochondrial</option>\n- <option value="10">10. Euplotid Nuclear</option>\n- <option value="11">11. Bacteria and Archaea</option>\n- <option value="12">12. Alternative Yeast Nuclear</option> \n- <option value="13">13. 
Ascidian Mitochondrial</option>\n- <option value="14">'..b' <test>\n- <param name="query" value="rhodopsin_nucs.fasta" ftype="fasta" />\n- <param name="db_opts_selector" value="file" />\n- <param name="subject" value="four_human_proteins.fasta" ftype="fasta" />\n- <param name="database" value="" />\n- <param name="evalue_cutoff" value="1e-10" />\n- <param name="out_format" value="ext" />\n- <param name="adv_opts_selector" value="basic" />\n- <output name="output1" file="blastx_rhodopsin_vs_four_human_ext.tabular" ftype="tabular" />\n- </test>\n- </tests>\n- <help>\n- \n-.. class:: warningmark\n-\n-**Note**. Database searches may take a substantial amount of time.\n-For large input datasets it is advisable to allow overnight processing. \n-\n------\n-\n-**What it does**\n-\n-Search a *protein database* using a *translated nucleotide query*,\n-using the NCBI BLAST+ blastx command line tool.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. 
Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a \';\'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length \n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n-\n- </help>\n-</tool>\n' |
diff -r ab1a8640f817 -r 643338ac83c0 ncbi_tblastn_wrapper.xml --- a/ncbi_tblastn_wrapper.xml Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,310 +0,0 @@\n-<tool id="ncbi_tblastn_wrapper" name="NCBI BLAST+ tblastn" version="0.0.12">\n- <description>Search translated nucleotide database with protein query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n- <version_command>tblastn -version</version_command>\n- <command interpreter="python">hide_stderr.py\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-tblastn\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--evalue $evalue_cutoff\n--out $output1\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n--db_gencode $adv_opts.db_gencode\n-$adv_opts.filter_query\n--matrix $adv_opts.matrix\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-##Ungapped disabled for now - see comments below\n-##$adv_opts.ungapped\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n- <conditional name="db_opts">\n- <param 
name="db_opts_selector" type="select" label="Subject database/sequences">\n- <option value="db" selected="True">BLAST Database</option>\n- <option value="file">FASTA file (pairwise e-values)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Nucleotide BLAST database">\n- <options from_file="blastdb.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="subject" type="hidden" value="" /> \n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" /> \n- <param name="subject" type="data" format="fasta" label="Nucleotide FASTA file to use as database"/> \n- </when>\n- </conditional>\n- <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n- <param name="out_format" type="select" label="Output format">\n- <option value="6" selected="True">Tabular (standard 12 columns)</option>\n- <option value="ext">Tabular (extended 24 columns)</option>\n- <option value="5">BLAST XML</option>\n- <option value="0">Pairwise text</option>\n- <option value="0 -html">Pairwise HTML</option>\n- <option value="2">Query-anchored text</option>\n- <option value="2 -html">Query-anchored HTML</option>\n- <option value="4">Flat query-anchored text</option>\n- <option value="4 -html">Flat query-anchored HTML</option>\n- <!--\n- <option value="-outfmt 11">BLAST archive format (ASN.1)</option>\n- -->\n- </param>\n- <conditional name="adv_opts">\n- <param name="adv_opts_selector" type="select" '..b'ase" value="" />\n- <param name="evalue_cutoff" value="1e-10" />\n- <param name="out_format" value="0 -html" />\n- <param name="adv_opts_selector" value="advanced" />\n- <param name="filter_query" value="false" />\n- <param name="matrix" value="BLOSUM80" />\n- <param name="max_hits" value="0" />\n- <param name="word_size" value="0" />\n- <param name="parse_deflines" value="false" />\n- <output 
name="output1" file="tblastn_four_human_vs_rhodopsin.html" ftype="html" />\n- </test>\n- </tests>\n- <help>\n- \n-.. class:: warningmark\n-\n-**Note**. Database searches may take a substantial amount of time.\n-For large input datasets it is advisable to allow overnight processing. \n-\n------\n-\n-**What it does**\n-\n-Search a *translated nucleotide database* using a *protein query*,\n-using the NCBI BLAST+ tblastn command line tool.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. The extra columns are\n-included *after* the standard 12 columns. 
This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a \';\'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n-\n- </help>\n-</tool>\n' |
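The Cheetah command template at the top of this hunk expands the `ext` output format into an explicit column list and only adds `-max_target_seqs` and `-word_size` when the value is non-empty and positive. The same logic can be sketched in plain Python. This is an illustration only: `build_blast_args` is a hypothetical helper, it covers just a subset of the options, and the real wrappers rely on Galaxy's template engine rather than code like this.

```python
# The extended column list is copied from the template's -outfmt string.
EXT_OUTFMT = ("6 std sallseqid score nident positive gaps ppos "
              "qframe sframe qseq sseq qlen slen")


def build_blast_args(tool, query, db_path, evalue, out_path, out_format,
                     max_hits="", word_size=""):
    """Build an argument list mirroring part of the wrapper's command template."""
    args = [tool, "-query", query, "-db", db_path,
            "-evalue", str(evalue), "-out", out_path]
    if str(out_format) == "ext":
        # Quoted in the template, so it is passed as a single argument.
        args += ["-outfmt", EXT_OUTFMT]
    else:
        # Values such as "0 -html" are unquoted in the template, so the
        # shell would split them into separate arguments.
        args += ["-outfmt"] + str(out_format).split()
    # Same guard as the template: str() first, then int(), and skip the
    # option when the value is empty or not greater than zero.
    for flag, value in (("-max_target_seqs", max_hits), ("-word_size", word_size)):
        text = str(value)
        if text and int(text) > 0:
            args += [flag, text]
    return args


ext_args = build_blast_args("tblastn", "query.fasta", "/data/blastdb/nt",
                            0.001, "out.tabular", "ext", max_hits=0)
html_args = build_blast_args("tblastn", "query.fasta", "/data/blastdb/nt",
                             1e-10, "out.html", "0 -html", max_hits=10)
```

Keeping the extended column list in one place matches the template's own comment: saved workflows store only the value `ext`, so the list can grow later without invalidating them.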
diff -r ab1a8640f817 -r 643338ac83c0 ncbi_tblastx_wrapper.xml --- a/ncbi_tblastx_wrapper.xml Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,252 +0,0 @@\n-<tool id="ncbi_tblastx_wrapper" name="NCBI BLAST+ tblastx" version="0.0.12">\n- <description>Search translated nucleotide database with translated nucleotide query sequence(s)</description>\n- <!-- If job splitting is enabled, break up the query file into parts -->\n- <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n- <version_command>tblastx -version</version_command>\n- <command interpreter="python">hide_stderr.py\n-## The command is a Cheetah template which allows some Python based syntax.\n-## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n-tblastx\n--query "$query"\n-#if $db_opts.db_opts_selector == "db":\n- -db "${db_opts.database.fields.path}"\n-#else:\n- -subject "$db_opts.subject"\n-#end if\n--query_gencode $query_gencode\n--evalue $evalue_cutoff\n--out $output1\n-##Set the extended list here so if/when we add things, saved workflows are not affected\n-#if str($out_format)=="ext":\n- -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n-#else:\n- -outfmt $out_format\n-#end if\n--num_threads 8\n-#if $adv_opts.adv_opts_selector=="advanced":\n--db_gencode $adv_opts.db_gencode\n-$adv_opts.filter_query\n-$adv_opts.strand\n--matrix $adv_opts.matrix\n-## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n-## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n-#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n--max_target_seqs $adv_opts.max_hits\n-#end if\n-#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n--word_size $adv_opts.word_size\n-#end if\n-$adv_opts.parse_deflines\n-## End of advanced options:\n-#end if\n- </command>\n- <inputs>\n- <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n- <conditional name="db_opts">\n- <param 
name="db_opts_selector" type="select" label="Subject database/sequences">\n- <option value="db" selected="True">BLAST Database</option>\n- <option value="file">FASTA file (pairwise e-values)</option>\n- </param>\n- <when value="db">\n- <param name="database" type="select" label="Nucleotide BLAST database">\n- <options from_file="blastdb.loc">\n- <column name="value" index="0"/>\n- <column name="name" index="1"/>\n- <column name="path" index="2"/>\n- </options>\n- </param>\n- <param name="subject" type="hidden" value="" /> \n- </when>\n- <when value="file">\n- <param name="database" type="hidden" value="" /> \n- <param name="subject" type="data" format="fasta" label="Nucleotide FASTA file to use as database"/> \n- </when>\n- </conditional>\n- <param name="query_gencode" type="select" label="Query genetic code">\n- <!-- See http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for details -->\n- <option value="1" select="True">1. Standard</option>\n- <option value="2">2. Vertebrate Mitochondrial</option>\n- <option value="3">3. Yeast Mitochondrial</option>\n- <option value="4">4. Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code</option>\n- <option value="5">5. Invertebrate Mitochondrial</option>\n- <option value="6">6. Ciliate, Dasycladacean and Hexamita Nuclear Code</option>\n- <option value="9">9. Echinoderm Mitochondrial</option>\n- <option value="10">10. Euplotid Nuclear</option>\n- <option value="11">11. Bacteria and Archaea</option>\n- <option value="12">12. Alternative Yeast Nuclear</option>\n- <option value="13">13. 
Ascidian Mitochondrial</option>\n- '..b'/>\n- <when input="out_format" value="0 -html" format="html"/>\n- <when input="out_format" value="2" format="txt"/>\n- <when input="out_format" value="2 -html" format="html"/>\n- <when input="out_format" value="4" format="txt"/>\n- <when input="out_format" value="4 -html" format="html"/>\n- <when input="out_format" value="5" format="blastxml"/>\n- </change_format>\n- </data>\n- </outputs>\n- <requirements>\n- <requirement type="binary">tblastx</requirement>\n- </requirements>\n- <help>\n- \n-.. class:: warningmark\n-\n-**Note**. Database searches may take a substantial amount of time.\n-For large input datasets it is advisable to allow overnight processing. \n-\n------\n-\n-**What it does**\n-\n-Search a *translated nucleotide database* using a *translated nucleotide query*,\n-using the NCBI BLAST+ tblastx command line tool.\n-\n------\n-\n-**Output format**\n-\n-Because Galaxy focuses on processing tabular data, the default output of this\n-tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n-\n-====== ========= ============================================\n-Column NCBI name Description\n------- --------- --------------------------------------------\n- 1 qseqid Query Seq-id (ID of your sequence)\n- 2 sseqid Subject Seq-id (ID of the database hit)\n- 3 pident Percentage of identical matches\n- 4 length Alignment length\n- 5 mismatch Number of mismatches\n- 6 gapopen Number of gap openings\n- 7 qstart Start of alignment in query\n- 8 qend End of alignment in query\n- 9 sstart Start of alignment in subject (database hit)\n- 10 send End of alignment in subject (database hit)\n- 11 evalue Expectation value (E-value)\n- 12 bitscore Bit score\n-====== ========= ============================================\n-\n-The BLAST+ tools can optionally output additional columns of information,\n-but this takes longer to calculate. Most (but not all) of these columns are\n-included by selecting the extended tabular output. 
The extra columns are\n-included *after* the standard 12 columns. This is so that you can write\n-workflow filtering steps that accept either the 12 or 24 column tabular\n-BLAST output.\n-\n-====== ============= ===========================================\n-Column NCBI name Description\n------- ------------- -------------------------------------------\n- 13 sallseqid All subject Seq-id(s), separated by a \';\'\n- 14 score Raw score\n- 15 nident Number of identical matches\n- 16 positive Number of positive-scoring matches\n- 17 gaps Total number of gaps\n- 18 ppos Percentage of positive-scoring matches\n- 19 qframe Query frame\n- 20 sframe Subject frame\n- 21 qseq Aligned part of query sequence\n- 22 sseq Aligned part of subject sequence\n- 23 qlen Query sequence length\n- 24 slen Subject sequence length\n-====== ============= ===========================================\n-\n-The third option is BLAST XML output, which is designed to be parsed by\n-another program, and is understood by some Galaxy tools.\n-\n-You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n-The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n-The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n-The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n-and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n-\n--------\n-\n-**References**\n-\n-Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n-\n- </help>\n-</tool>\n' |
diff -r ab1a8640f817 -r 643338ac83c0 tool_dependencies.xml --- a/tool_dependencies.xml Thu Aug 23 07:32:06 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,21 +0,0 @@ -<?xml version="1.0"?> -<tool_dependency> - <package name="blast+" version="2.2.26+"> - <install version="1.0"> - <actions> - <action type="download_by_url">ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.26/ncbi-blast-2.2.26+-src.tar.gz</action> - <action type="shell_command">cd c++ && ./configure --prefix=$INSTALL_DIR && make && make install</action> - <action type="set_environment"> - <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable> - </action> - </actions> - </install> - <readme> -These links provide information for building the NCBI Blast+ package in most environments. - -System requirements -http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download - </readme> - </package> -</tool_dependency> - |
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/blastdb.loc.sample --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/blastdb.loc.sample Thu Aug 23 09:00:40 2012 -0400 |
@@ -0,0 +1,38 @@ +#This is a sample file distributed with Galaxy that is used to define a +#list of nucleotide BLAST databases, using three columns tab separated +#(longer whitespace are TAB characters): +# +#<unique_id> <database_caption> <base_name_path> +# +#The captions typically contain spaces and might end with the build date. +#It is important that the actual database name does not have a space in it, +#and that the first tab that appears in the line is right before the path. +# +#So, for example, if your database is nt and the path to your base name +#is /depot/data2/galaxy/blastdb/nt/nt.chunk, then the blastdb.loc entry +#would look like this: +# +#nt_02_Dec_2009 nt 02 Dec 2009 /depot/data2/galaxy/blastdb/nt/nt.chunk +# +#and your /depot/data2/galaxy/blastdb/nt directory would contain all of +#your "base names" (e.g.): +# +#-rw-r--r-- 1 wychung galaxy 23437408 2008-04-09 11:26 nt.chunk.00.nhr +#-rw-r--r-- 1 wychung galaxy 3689920 2008-04-09 11:26 nt.chunk.00.nin +#-rw-r--r-- 1 wychung galaxy 251215198 2008-04-09 11:26 nt.chunk.00.nsq +#...etc... +# +#Your blastdb.loc file should include an entry per line for each "base name" +#you have stored. For example: +# +#nt_02_Dec_2009 nt 02 Dec 2009 /depot/data2/galaxy/blastdb/nt/nt.chunk +#wgs_30_Nov_2009 wgs 30 Nov 2009 /depot/data2/galaxy/blastdb/wgs/wgs.chunk +#test_20_Sep_2008 test 20 Sep 2008 /depot/data2/galaxy/blastdb/test/test +#...etc... +# +#See also blastdb_p.loc which is for any protein BLAST database. +# +#Note that for backwards compatibility with workflows, the unique ID of +#an entry must be the path that was in the original loc file, because that +#is the value stored in the workflow for that parameter. +# |
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/blastdb_p.loc.sample --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/blastdb_p.loc.sample Thu Aug 23 09:00:40 2012 -0400 |
@@ -0,0 +1,27 @@ +#This is a sample file distributed with Galaxy that is used to define a +#list of protein BLAST databases, using three columns tab separated +#(longer whitespace are TAB characters): +# +#<unique_id> <database_caption> <base_name_path> +# +#The captions typically contain spaces and might end with the build date. +#It is important that the actual database name does not have a space in it, +#and that the first tab that appears in the line is right before the path. +# +#So, for example, if your database is NR and the path to your base name +#is /data/blastdb/nr, then the blastdb_p.loc entry would look like this: +# +#nr NCBI NR (non redundant) /data/blastdb/nr +# +#and your /data/blastdb directory would contain all of the files associated +#with the database, /data/blastdb/nr.*. +# +#Your blastdb_p.loc file should include an entry per line for each "base name" +#you have stored. For example: +# +#nr_05Jun2010 NCBI NR (non redundant) 05 Jun 2010 /data/blastdb/05Jun2010/nr +#nr_15Aug2010 NCBI NR (non redundant) 15 Aug 2010 /data/blastdb/15Aug2010/nr +#...etc... +# +#See also blastdb.loc which is for any nucleotide BLAST database. +# |
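Both sample files describe the same layout: comment lines starting with `#`, then one entry per line with three tab separated columns (unique ID, caption, base name path). The wrappers read these as column indexes 0, 1 and 2. A minimal sketch of a reader for that layout is shown below; `parse_loc` is a hypothetical helper for illustration, not Galaxy's own loader, and the two entries are the examples from the sample file.

```python
import io


def parse_loc(handle):
    """Yield (unique_id, caption, path) tuples from a .loc style file."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and the "#" comment lines used in the samples.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                "Expected 3 tab separated columns, got %d: %r" % (len(fields), line)
            )
        yield tuple(fields)


example = io.StringIO(
    "#See also blastdb.loc which is for any nucleotide BLAST database.\n"
    "nr_05Jun2010\tNCBI NR (non redundant) 05 Jun 2010\t/data/blastdb/05Jun2010/nr\n"
    "nr_15Aug2010\tNCBI NR (non redundant) 15 Aug 2010\t/data/blastdb/15Aug2010/nr\n"
)
entries = list(parse_loc(example))
```

Splitting on tabs only is what lets the caption contain spaces, which is why the sample comments insist that the separators are real TAB characters.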
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/blastxml_to_tabular.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/blastxml_to_tabular.py Thu Aug 23 09:00:40 2012 -0400 |
b'@@ -0,0 +1,254 @@\n+#!/usr/bin/env python\n+"""Convert a BLAST XML file to 12 column tabular output\n+\n+Takes three command line options, input BLAST XML filename, output tabular\n+BLAST filename, output format (std for standard 12 columns, or ext for the\n+extended 24 columns offered in the BLAST+ wrappers).\n+\n+The 12 columns output are \'qseqid sseqid pident length mismatch gapopen qstart\n+qend sstart send evalue bitscore\' or \'std\' at the BLAST+ command line, which\n+mean:\n+ \n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The additional columns offered in the Galaxy BLAST+ wrappers are:\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a \';\'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+Most of these 
fields are given explicitly in the XML file, while others, like\n+the percentage identity and the number of gap openings, must be calculated.\n+\n+Be aware that the sequence in the extended tabular output or XML direct from\n+BLAST+ may or may not use XXXX masking on regions of low complexity. This\n+can throw off the calculation of percentage identity and gap openings.\n+[In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard,\n+with these numbers changing depending on whether or not the low complexity\n+filter is used.]\n+\n+This script attempts to produce identical output to what BLAST+ would have done.\n+However, check this with "diff -b ..." since BLAST+ sometimes includes an extra\n+space character (probably a bug).\n+"""\n+import sys\n+import re\n+\n+if sys.version_info[:2] >= ( 2, 5 ):\n+ import xml.etree.cElementTree as ElementTree\n+else:\n+ from galaxy import eggs\n+ import pkg_resources; pkg_resources.require( "elementtree" )\n+ from elementtree import ElementTree\n+\n+def stop_err( msg ):\n+ sys.stderr.write("%s\\n" % msg)\n+ sys.exit(1)\n+\n+#Parse Command Line\n+try:\n+ in_file, out_file, out_fmt = sys.argv[1:]\n+except:\n+ stop_err("Expect 3 arguments: input BLAST XML file, output tabular file, out format (std or ext)")\n+\n+if out_fmt == "std":\n+ extended = False\n+elif out_fmt == "x22":\n+ stop_err("Format argument x22 has been replaced with ext (extended 24 columns)")\n+elif out_fmt == "ext":\n+ extended = True\n+else:\n+ stop_err("Format argument should be std (12 column) or ext (extended 24 columns)")\n+\n+\n+# get an iterable\n+try: \n+ context = ElementTree.iterparse(in_file, events=("start", "end"))\n+except:\n+ stop_err("Invalid data format.")\n+# turn it into an iterator\n+context = iter(context)\n+# get the root element\n+try:\n+ event, root = context.next()\n+except:\n+ st'..b'")\n+ xx = sum(1 for q,h in zip(q_seq, h_seq) if q=="X" and h=="X")\n+ if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= 
expected_mismatch + xx):\n+ stop_err("%s vs %s mismatches, expected %i <= %i <= %i" \\\n+ % (qseqid, sseqid, expected_mismatch - q_seq.count("X"),\n+ int(mismatch), expected_mismatch))\n+\n+ #TODO - Remove this alternative identity calculation and test\n+ #once satisfied there are no problems\n+ expected_identity = sum(1 for q,h in zip(q_seq, h_seq) if q == h)\n+ if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")):\n+ stop_err("%s vs %s identities, expected %i <= %i <= %i" \\\n+ % (qseqid, sseqid, expected_identity, int(nident),\n+ expected_identity + q_seq.count("X")))\n+ \n+\n+ evalue = hsp.findtext("Hsp_evalue")\n+ if evalue == "0":\n+ evalue = "0.0"\n+ else:\n+ evalue = "%0.0e" % float(evalue)\n+ \n+ bitscore = float(hsp.findtext("Hsp_bit-score"))\n+ if bitscore < 100:\n+ #Seems to show one decimal place for lower scores\n+ bitscore = "%0.1f" % bitscore\n+ else:\n+ #Note BLAST does not round to nearest int, it truncates\n+ bitscore = "%i" % bitscore\n+\n+ values = [qseqid,\n+ sseqid,\n+ pident,\n+ length, #hsp.findtext("Hsp_align-len")\n+ str(mismatch),\n+ gapopen,\n+ hsp.findtext("Hsp_query-from"), #qstart,\n+ hsp.findtext("Hsp_query-to"), #qend,\n+ hsp.findtext("Hsp_hit-from"), #sstart,\n+ hsp.findtext("Hsp_hit-to"), #send,\n+ evalue, #hsp.findtext("Hsp_evalue") in scientific notation\n+ bitscore, #hsp.findtext("Hsp_bit-score") rounded\n+ ]\n+\n+ if extended:\n+ sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(">"))\n+ #print hit_def, "-->", sallseqid\n+ positive = hsp.findtext("Hsp_positive")\n+ ppos = "%0.2f" % (100*float(positive)/float(length))\n+ qframe = hsp.findtext("Hsp_query-frame")\n+ sframe = hsp.findtext("Hsp_hit-frame")\n+ if blast_program == "blastp":\n+ #Probably a bug in BLASTP that they use 0 or 1 depending on format\n+ if qframe == "0": qframe = "1"\n+ if sframe == "0": sframe = "1"\n+ slen = int(hit.findtext("Hit_len"))\n+ values.extend([sallseqid,\n+ hsp.findtext("Hsp_score"), 
#score,\n+ nident,\n+ positive,\n+ hsp.findtext("Hsp_gaps"), #gaps,\n+ ppos,\n+ qframe,\n+ sframe,\n+ #NOTE - for blastp, XML shows original seq, tabular uses XXX masking\n+ q_seq,\n+ h_seq,\n+ str(qlen),\n+ str(slen),\n+ ])\n+ #print "\\t".join(values) \n+ outfile.write("\\t".join(values) + "\\n")\n+ # prevents ElementTree from growing large datastructure\n+ root.clear()\n+ elem.clear()\n+outfile.close()\n' |
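The e-value and bit score formatting rules used in the script above can be isolated as two small functions. This is only a sketch for illustration: the function names `format_evalue` and `format_bitscore` are hypothetical, and the rules are copied from the converter code shown in this diff rather than from any NCBI specification.

```python
def format_evalue(text):
    """Format an Hsp_evalue string the way the converter above does."""
    if text == "0":
        return "0.0"
    # Scientific notation with no decimal places
    return "%0.0e" % float(text)


def format_bitscore(text):
    """Format an Hsp_bit-score string the way the converter above does."""
    score = float(text)
    if score < 100:
        # One decimal place for lower scores
        return "%0.1f" % score
    # Truncated, not rounded to the nearest integer
    return "%i" % score


print(format_evalue("3.2e-05"))
print(format_bitscore("57.38"))
print(format_bitscore("231.877"))
```

Note the bit score is truncated above 100, so a value such as 231.877 is written as 231, not 232.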
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/blastxml_to_tabular.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/blastxml_to_tabular.xml Thu Aug 23 09:00:40 2012 -0400 |
@@ -0,0 +1,127 @@ +<tool id="blastxml_to_tabular" name="BLAST XML to tabular" version="0.0.8"> + <description>Convert BLAST XML output to tabular</description> + <command interpreter="python"> + blastxml_to_tabular.py $blastxml_file $tabular_file $out_format + </command> + <inputs> + <param name="blastxml_file" type="data" format="blastxml" label="BLAST results as XML"/> + <param name="out_format" type="select" label="Output format"> + <option value="std" selected="True">Tabular (standard 12 columns)</option> + <option value="ext">Tabular (extended 24 columns)</option> + </param> + </inputs> + <outputs> + <data name="tabular_file" format="tabular" label="BLAST results as tabular" /> + </outputs> + <requirements> + </requirements> + <tests> + <test> + <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin.tabluar --> + <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastp_four_human_vs_rhodopsin.xml" ftype="blastxml" /> + <param name="out_format" value="ext" /> + <!-- Note this has some white space differences from the actual blastp output blast_four_human_vs_rhodopsin_22c.tabluar --> + <output name="tabular_file" file="blastp_four_human_vs_rhodopsin_converted_ext.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastp_sample.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some white space differences from the actual blastp output --> + <output name="tabular_file" file="blastp_sample_converted.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some 
white space differences from the actual blastx output --> + <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastx_rhodopsin_vs_four_human.xml" ftype="blastxml" /> + <param name="out_format" value="ext" /> + <!-- Note this has some white space and XXXX masking differences from the actual blastx output --> + <output name="tabular_file" file="blastx_rhodopsin_vs_four_human_converted_ext.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastx_sample.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some white space differences from the actual blastx output --> + <output name="tabular_file" file="blastx_sample_converted.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" /> + <param name="out_format" value="std" /> + <!-- Note this has some white space differences from the actual blastp output --> + <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_std.tabular" ftype="tabular" /> + </test> + <test> + <param name="blastxml_file" value="blastp_human_vs_pdb_seg_no.xml" ftype="blastxml" /> + <param name="out_format" value="ext" /> + <!-- Note this has some white space differences from the actual blastp output --> + <output name="tabular_file" file="blastp_human_vs_pdb_seg_no_converted_ext.tabular" ftype="tabular" /> + </test> + </tests> + <help> + +**What it does** + +NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can output in a range of +formats including tabular and a more detailed XML format. A complex workflow +may need both the XML and the tabular output - but running BLAST twice is +slow and wasteful. 
+ +This tool takes the BLAST XML output and by default converts it into the +standard 12 column tabular equivalent: + +====== ========= ============================================ +Column NCBI name Description +------ --------- -------------------------------------------- + 1 qseqid Query Seq-id (ID of your sequence) + 2 sseqid Subject Seq-id (ID of the database hit) + 3 pident Percentage of identical matches + 4 length Alignment length + 5 mismatch Number of mismatches + 6 gapopen Number of gap openings + 7 qstart Start of alignment in query + 8 qend End of alignment in query + 9 sstart Start of alignment in subject (database hit) + 10 send End of alignment in subject (database hit) + 11 evalue Expectation value (E-value) + 12 bitscore Bit score +====== ========= ============================================ + +The BLAST+ tools can optionally output additional columns of information, +but this takes longer to calculate. Most (but not all) of these columns are +included by selecting the extended tabular output. The extra columns are +included *after* the standard 12 columns. This is so that you can write +workflow filtering steps that accept either the 12 or 24 column tabular +BLAST output. 
+ +====== ============= =========================================== +Column NCBI name Description +------ ------------- ------------------------------------------- + 13 sallseqid All subject Seq-id(s), separated by a ';' + 14 score Raw score + 15 nident Number of identical matches + 16 positive Number of positive-scoring matches + 17 gaps Total number of gaps + 18 ppos Percentage of positive-scoring matches + 19 qframe Query frame + 20 sframe Subject frame + 21 qseq Aligned part of query sequence + 22 sseq Aligned part of subject sequence + 23 qlen Query sequence length + 24 slen Subject sequence length +====== ============= =========================================== + +Beware that the XML file (and thus the conversion) and the tabular output +direct from BLAST+ may differ in the presence of XXXX masking on regions +of low complexity (columns 21 and 22), and thus also calculated figures like +the percentage identity (column 3). + + </help> +</tool> |
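Because the extra columns come after the standard 12, a filtering step only needs to look at the leading columns to handle both layouts. The following is a minimal sketch of that idea; the function name `filter_by_pident` and the example rows are hypothetical and do not come from the wrappers or their test data.

```python
def filter_by_pident(lines, min_pident):
    """Yield tabular BLAST lines with percentage identity >= min_pident.

    Works on both the 12 and 24 column layouts, since pident is always
    column 3 (index 2). Hypothetical helper for illustration only.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) not in (12, 24):
            raise ValueError("Expected 12 or 24 columns, got %i" % len(fields))
        if float(fields[2]) >= min_pident:
            yield line


# Made-up rows: one standard, one extended (placeholder extra columns), one weak
std_hit = "\t".join(["q1", "s1", "98.50", "200", "3", "0",
                     "1", "200", "1", "200", "1e-50", "370"]) + "\n"
ext_hit = std_hit.rstrip("\n") + "\t" + "\t".join(["x"] * 12) + "\n"
weak_hit = "\t".join(["q2", "s2", "80.00", "100", "20", "0",
                      "1", "100", "1", "100", "1e-10", "95.5"]) + "\n"

kept = list(filter_by_pident([std_hit, ext_hit, weak_hit], 90.0))
print(len(kept))
```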
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/hide_stderr.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/hide_stderr.py Thu Aug 23 09:00:40 2012 -0400 |
@@ -0,0 +1,49 @@ +#!/usr/bin/env python +"""A simple script to redirect stderr to stdout when the return code is zero. + +See https://bitbucket.org/galaxy/galaxy-central/issue/325/ + +Currently Galaxy ignores the return code from command line tools (even if it +is non-zero which by convention indicates an error) and treats any output on +stderr as an error (even though by convention stderr is used for errors or +warnings). + +This script runs the given command line, capturing all stdout and stderr in +memory, and gets the return code. For a zero return code, any stderr (which +should be warnings only) is added to the stdout. That way Galaxy believes +everything is fine. For a non-zero return code, we output stdout as is, and +any stderr, plus the return code to ensure there is some output on stderr. +That way Galaxy treats this as an error. + +Once issue 325 is fixed, this script will not be needed. +""" +import sys +import subprocess + +#Avoid using shell=True when we call subprocess to ensure if the Python +#script is killed, so too is the BLAST process. +try: + words = [] + for w in sys.argv[1:]: + if " " in w: + words.append('"%s"' % w) + else: + words.append(w) + cmd = " ".join(words) + child = subprocess.Popen(sys.argv[1:], + stdout=subprocess.PIPE, stderr=subprocess.PIPE) +except Exception, err: + sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err)) + sys.exit(1) +#Use .communicate as can get deadlocks with .wait(), +stdout, stderr = child.communicate() +return_code = child.returncode + +if return_code: + sys.stdout.write(stdout) + sys.stderr.write(stderr) + sys.stderr.write("Return error code %i from command:\n" % return_code) + sys.stderr.write("%s\n" % cmd) +else: + sys.stdout.write(stdout) + sys.stdout.write(stderr) |
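The script above is written for Python 2 (note the `except Exception, err` syntax). The same stderr-hiding logic can be sketched for Python 3 as a function that returns the streams instead of writing them. The function name `run_hiding_stderr` and the child commands in the example are hypothetical, chosen only so the sketch can be run without BLAST installed.

```python
import subprocess
import sys


def run_hiding_stderr(argv):
    """Run argv (a list, no shell) and return (return_code, stdout, stderr).

    On a zero return code any stderr text is appended to stdout, so that
    nothing is left on stderr. On a non-zero return code stdout is passed
    through and a note with the return code is added to stderr.
    """
    child = subprocess.Popen(argv, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, universal_newlines=True)
    # communicate() avoids the deadlocks possible with wait() and pipes
    stdout, stderr = child.communicate()
    if child.returncode:
        stderr += "Return error code %i from command:\n%s\n" % (
            child.returncode, " ".join(argv))
        return child.returncode, stdout, stderr
    return 0, stdout + stderr, ""


# A child process that succeeds but writes a warning to stderr
code, out, err = run_hiding_stderr(
    [sys.executable, "-c",
     "import sys; sys.stderr.write('warning\\n'); print('done')"])
print(code, repr(out), repr(err))
```

Passing the argument list directly, rather than a shell string, follows the original script's stated reason for avoiding `shell=True`.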
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/ncbi_blast_plus.txt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/ncbi_blast_plus.txt Thu Aug 23 09:00:40 2012 -0400 |
@@ -0,0 +1,81 @@ +Galaxy wrappers for NCBI BLAST+ suite +===================================== + +These wrappers are copyright 2010-2012 by Peter Cock, The James Hutton Institute +(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. +See the licence text below. + +Currently tested with NCBI BLAST 2.2.26+ (i.e. version 2.2.26 of BLAST+), +and do not work with the NCBI 'legacy' BLAST suite (e.g. blastall). + +Note that these wrappers were originally distributed as part of the main +Galaxy repository, but as of August 2012 moved to the Galaxy Tool Shed. +My thanks to Dannon Baker from the Galaxy development team for his assistance +with this. + + +Manual Installation +=================== + +For those not using Galaxy's automated installation from the Tool Shed, put +the XML and Python files under tools/ncbi_blast_plus and add the XML files +to your tool_conf.xml as normal. + +You must tell Galaxy about any system level BLAST databases using configuration +files blastdb.loc (nucleotide databases like NT) and blastdb_p.loc (protein +databases like NR). + +You will also need to install the 'blast_datatypes' from the Tool Shed. This +defines the BLAST XML file format ('blastxml'). + + +History +======= + +v0.0.11 - Final revision as part of the Galaxy main repository, and the + first release via the Tool Shed +v0.0.12 - Implements genetic code option for translation searches. 
+ - Changes <parallelism> to 1000 sequences at a time (to cope with + very large sets of queries where BLAST+ can become memory hungry) + - Include warning that BLAST+ with subject FASTA gives pairwise + e-values + + +Developers +========== + +This script and related tools are being developed on the following hg branch: +http://bitbucket.org/peterjc/galaxy-central/src/tools + +For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball I use +the following command from the Galaxy root folder: + +$ ./tools/ncbi_blast_plus/make_ncbi_blast_plus.sh + +This simplifies ensuring a consistent set of files is bundled each time, +including all the relevant test files. + + +Licence (MIT/BSD style) +======================= + +Permission to use, copy, modify, and distribute this software and its +documentation with or without modifications and for any purpose and +without fee is hereby granted, provided that any copyright notices +appear in all copies and that both those copyright notices and this +permission notice appear in supporting documentation, and that the +names of the contributors or copyright holders not be used in +advertising or publicity pertaining to distribution of the software +without specific prior permission. + +THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL +WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE +CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT +OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS +OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE +OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE +OR PERFORMANCE OF THIS SOFTWARE. + +NOTE: This is the licence for the Galaxy Wrapper only. BLAST+ and +associated data files are available and licenced separately. |
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/ncbi_blastn_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/ncbi_blastn_wrapper.xml Thu Aug 23 09:00:40 2012 -0400 |
b'@@ -0,0 +1,211 @@\n+<tool id="ncbi_blastn_wrapper" name="NCBI BLAST+ blastn" version="0.0.12">\n+ <description>Search nucleotide database with nucleotide query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n+ <version_command>blastn -version</version_command>\n+ <command interpreter="python">hide_stderr.py\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+blastn\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-task $blast_type\n+-evalue $evalue_cutoff\n+-out $output1\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+$adv_opts.filter_query\n+$adv_opts.strand\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+$adv_opts.ungapped\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" type="select" label="Subject database/sequences">\n+ <option 
value="db" selected="True">BLAST Database</option>\n+ <option value="file">FASTA file (pairwise e-values)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Nucleotide BLAST database">\n+ <options from_file="blastdb.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="subject" type="hidden" value="" /> \n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" /> \n+ <param name="subject" type="data" format="fasta" label="Nucleotide FASTA file to use as database"/> \n+ </when>\n+ </conditional>\n+ <param name="blast_type" type="select" display="radio" label="Type of BLAST">\n+ <option value="megablast">megablast</option>\n+ <option value="blastn">blastn</option>\n+ <option value="blastn-short">blastn-short</option>\n+ <option value="dc-megablast">dc-megablast</option>\n+ <!-- Using BLAST 2.2.24+ this gives an error:\n+ BLAST engine error: Program type \'vecscreen\' not supported\n+ <option value="vecscreen">vecscreen</option>\n+ -->\n+ </param>\n+ <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n+ <param name="out_format" type="select" label="Output format">\n+ <option value="6" selected="True">Tabular (standard 12 columns)</option>\n+ <option value="ext">Tabular (extended 24 columns)</option>\n+ <option value="5">BLAST XML</option>\n+ <option value="0">Pairwise text</option>\n+ <option value="0 -html">Pairwise HTML</option>\n+ <option value'..b'>\n+ <when input="out_format" value="0 -html" format="html"/>\n+ <when input="out_format" value="2" format="txt"/>\n+ <when input="out_format" value="2 -html" format="html"/>\n+ <when input="out_format" value="4" format="txt"/>\n+ <when input="out_format" value="4 -html" format="html"/>\n+ <when input="out_format" value="5" format="blastxml"/>\n+ </change_format>\n+ </data>\n+ </outputs>\n+ 
<requirements>\n+ <requirement type="binary">blastn</requirement>\n+ </requirements>\n+ <help>\n+ \n+.. class:: warningmark\n+\n+**Note**. Database searches may take a substantial amount of time.\n+For large input datasets it is advisable to allow overnight processing. \n+\n+-----\n+\n+**What it does**\n+\n+Search a *nucleotide database* using a *nucleotide query*,\n+using the NCBI BLAST+ blastn command line tool.\n+Algorithms include blastn, megablast, and discontiguous megablast.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. 
This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a \';\'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Zhang et al. A Greedy Algorithm for Aligning DNA Sequences. 2000. JCB: 203-214.\n+\n+ </help>\n+</tool>\n' |
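The Cheetah command template in the blastn wrapper above assembles a command line from the tool parameters. As a rough illustration of the visible part of that logic, the same assembly can be sketched in plain Python. The function name `build_blastn_args`, its keyword arguments and its default values are hypothetical; only the blastn options that appear in the template are used, and the sketch does not model the basic/advanced selector or the options elided from this diff.

```python
EXTENDED_OUTFMT = ("6 std sallseqid score nident positive gaps ppos "
                   "qframe sframe qseq sseq qlen slen")


def build_blastn_args(query, output, db=None, subject=None, task="megablast",
                      evalue=0.001, out_format="6", max_hits=0, word_size=0):
    """Return a blastn argument list mirroring the wrapper's template.

    Hypothetical helper for illustration only; defaults are assumptions.
    """
    if (db is None) == (subject is None):
        raise ValueError("Give exactly one of db or subject")
    args = ["blastn", "-query", query]
    if db is not None:
        args += ["-db", db]
    else:
        args += ["-subject", subject]
    args += ["-task", task, "-evalue", str(evalue), "-out", output]
    if out_format == "ext":
        # The extended column list is fixed here, as in the wrapper
        args += ["-outfmt", EXTENDED_OUTFMT]
    else:
        # Values such as "0 -html" expand to two command line words
        args += ["-outfmt"] + out_format.split()
    # The wrapper hard codes eight threads
    args += ["-num_threads", "8"]
    # Zero or negative values mean the option is left out
    if max_hits > 0:
        args += ["-max_target_seqs", str(max_hits)]
    if word_size > 0:
        args += ["-word_size", str(word_size)]
    return args


args = build_blastn_args("query.fasta", "hits.tabular",
                         db="/data/blastdb/nt", out_format="ext", max_hits=10)
print(" ".join(args))
```

Keeping the extended column list in one constant reflects the template comment that the list is set in the wrapper so saved workflows are not affected if columns are added later.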
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/ncbi_blastp_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/ncbi_blastp_wrapper.xml Thu Aug 23 09:00:40 2012 -0400 |
b'@@ -0,0 +1,278 @@\n+<tool id="ncbi_blastp_wrapper" name="NCBI BLAST+ blastp" version="0.0.12">\n+ <description>Search protein database with protein query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n+ <version_command>blastp -version</version_command>\n+ <command interpreter="python">hide_stderr.py\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+blastp\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-task $blast_type\n+-evalue $evalue_cutoff\n+-out $output1\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+$adv_opts.filter_query\n+-matrix $adv_opts.matrix\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+##Ungapped disabled for now - see comments below\n+##$adv_opts.ungapped\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" 
type="select" label="Subject database/sequences">\n+ <option value="db" selected="True">BLAST Database</option>\n+ <option value="file">FASTA file (pairwise e-values)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Protein BLAST database">\n+ <options from_file="blastdb_p.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="subject" type="hidden" value="" /> \n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" /> \n+ <param name="subject" type="data" format="fasta" label="Protein FASTA file to use as database"/> \n+ </when>\n+ </conditional>\n+ <param name="blast_type" type="select" display="radio" label="Type of BLAST">\n+ <option value="blastp">blastp</option>\n+ <option value="blastp-short">blastp-short</option>\n+ </param>\n+ <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n+ <param name="out_format" type="select" label="Output format">\n+ <option value="6" selected="True">Tabular (standard 12 columns)</option>\n+ <option value="ext">Tabular (extended 24 columns)</option>\n+ <option value="5">BLAST XML</option>\n+ <option value="0">Pairwise text</option>\n+ <option value="0 -html">Pairwise HTML</option>\n+ <option value="2">Query-anchored text</option>\n+ <option value="2 -html">Query-anchored HTML</option>\n+ <option value="4">Flat query-anchored text</option>\n+ <option value="4 -html">Flat query-anchored HTML</option>\n+ <!--\n+ <option value='..b'.fasta" ftype="fasta" />\n+ <param name="database" value="" />\n+ <param name="evalue_cutoff" value="1e-8" />\n+ <param name="blast_type" value="blastp" />\n+ <param name="out_format" value="6" />\n+ <param name="adv_opts_selector" value="basic" />\n+ <output name="output1" file="blastp_rhodopsin_vs_four_human.tabular" ftype="tabular" />\n+ </test>\n+ </tests>\n+ <help>\n+ \n+.. 
class:: warningmark\n+\n+**Note**. Database searches may take a substantial amount of time.\n+For large input datasets it is advisable to allow overnight processing. \n+\n+-----\n+\n+**What it does**\n+\n+Search a *protein database* using a *protein query*,\n+using the NCBI BLAST+ blastp command line tool.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. 
This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a \';\'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n+\n+Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005.\n+\n+ </help>\n+</tool>\n' |
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/ncbi_blastx_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/ncbi_blastx_wrapper.xml Thu Aug 23 09:00:40 2012 -0400 |
b'@@ -0,0 +1,264 @@\n+<tool id="ncbi_blastx_wrapper" name="NCBI BLAST+ blastx" version="0.0.12">\n+ <description>Search protein database with translated nucleotide query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n+ <version_command>blastx -version</version_command>\n+ <command interpreter="python">hide_stderr.py\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+blastx\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-query_gencode $query_gencode\n+-evalue $evalue_cutoff\n+-out $output1\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+$adv_opts.filter_query\n+$adv_opts.strand\n+-matrix $adv_opts.matrix\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+$adv_opts.ungapped\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param name="db_opts_selector" type="select" 
label="Subject database/sequences">\n+ <option value="db" selected="True">BLAST Database</option>\n+ <option value="file">FASTA file (pairwise e-values)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Protein BLAST database">\n+ <options from_file="blastdb_p.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="subject" type="hidden" value="" /> \n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" /> \n+ <param name="subject" type="data" format="fasta" label="Protein FASTA file to use as database"/> \n+ </when>\n+ </conditional>\n+ <param name="query_gencode" type="select" label="Query genetic code">\n+ <!-- See http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for details -->\n+ <option value="1" selected="True">1. Standard</option>\n+ <option value="2">2. Vertebrate Mitochondrial</option>\n+ <option value="3">3. Yeast Mitochondrial</option>\n+ <option value="4">4. Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code</option>\n+ <option value="5">5. Invertebrate Mitochondrial</option>\n+ <option value="6">6. Ciliate, Dasycladacean and Hexamita Nuclear Code</option>\n+ <option value="9">9. Echinoderm Mitochondrial</option>\n+ <option value="10">10. Euplotid Nuclear</option>\n+ <option value="11">11. Bacteria and Archaea</option>\n+ <option value="12">12. Alternative Yeast Nuclear</option> \n+ <option value="13">13. 
Ascidian Mitochondrial</option>\n+ <option value="14">'..b' <test>\n+ <param name="query" value="rhodopsin_nucs.fasta" ftype="fasta" />\n+ <param name="db_opts_selector" value="file" />\n+ <param name="subject" value="four_human_proteins.fasta" ftype="fasta" />\n+ <param name="database" value="" />\n+ <param name="evalue_cutoff" value="1e-10" />\n+ <param name="out_format" value="ext" />\n+ <param name="adv_opts_selector" value="basic" />\n+ <output name="output1" file="blastx_rhodopsin_vs_four_human_ext.tabular" ftype="tabular" />\n+ </test>\n+ </tests>\n+ <help>\n+ \n+.. class:: warningmark\n+\n+**Note**. Database searches may take a substantial amount of time.\n+For large input datasets it is advisable to allow overnight processing. \n+\n+-----\n+\n+**What it does**\n+\n+Search a *protein database* using a *translated nucleotide query*,\n+using the NCBI BLAST+ blastx command line tool.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. 
Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a \';\'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length \n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n+\n+ </help>\n+</tool>\n' |
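The Cheetah template in the blastx wrapper above handles two cases worth spelling out: the `ext` output format expands to an explicit column list passed as one quoted argument, and `-max_target_seqs` / `-word_size` are only emitted when the value is non-empty and greater than zero. The sketch below mirrors that logic in plain Python as an argument list. The helper names `outfmt_args` and `optional_int_arg` are hypothetical, and the whitespace stripping is an addition not present in the template.

```python
# The explicit column list used by the wrappers for the "ext" format.
EXT_COLUMNS = (
    "6 std sallseqid score nident positive gaps ppos "
    "qframe sframe qseq sseq qlen slen"
)


def outfmt_args(out_format):
    """Return the -outfmt arguments for a selected output format value."""
    if str(out_format) == "ext":
        # The template quotes this string, so it stays a single argument.
        return ["-outfmt", EXT_COLUMNS]
    # Values such as "0 -html" are unquoted in the template, so the shell
    # would split them into separate arguments; split() does the same here.
    return ["-outfmt"] + str(out_format).split()


def optional_int_arg(flag, value):
    """Emit [flag, value] only for a non-empty value greater than zero."""
    text = str(value).strip()  # stripping is an assumption, see lead-in
    if text and int(text) > 0:
        return [flag, text]
    return []
```

For example, `outfmt_args("0 -html")` gives `["-outfmt", "0", "-html"]`, while a `max_hits` of `0` contributes no arguments, leaving BLAST+ to use its own default.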
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/ncbi_tblastn_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/ncbi_tblastn_wrapper.xml Thu Aug 23 09:00:40 2012 -0400 |
b'@@ -0,0 +1,310 @@\n+<tool id="ncbi_tblastn_wrapper" name="NCBI BLAST+ tblastn" version="0.0.12">\n+ <description>Search translated nucleotide database with protein query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n+ <version_command>tblastn -version</version_command>\n+ <command interpreter="python">hide_stderr.py\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+tblastn\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-evalue $evalue_cutoff\n+-out $output1\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+-db_gencode $adv_opts.db_gencode\n+$adv_opts.filter_query\n+-matrix $adv_opts.matrix\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+##Ungapped disabled for now - see comments below\n+##$adv_opts.ungapped\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Protein query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param 
name="db_opts_selector" type="select" label="Subject database/sequences">\n+ <option value="db" selected="True">BLAST Database</option>\n+ <option value="file">FASTA file (pairwise e-values)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Nucleotide BLAST database">\n+ <options from_file="blastdb.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="subject" type="hidden" value="" /> \n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" /> \n+ <param name="subject" type="data" format="fasta" label="Nucleotide FASTA file to use as database"/> \n+ </when>\n+ </conditional>\n+ <param name="evalue_cutoff" type="float" size="15" value="0.001" label="Set expectation value cutoff" />\n+ <param name="out_format" type="select" label="Output format">\n+ <option value="6" selected="True">Tabular (standard 12 columns)</option>\n+ <option value="ext">Tabular (extended 24 columns)</option>\n+ <option value="5">BLAST XML</option>\n+ <option value="0">Pairwise text</option>\n+ <option value="0 -html">Pairwise HTML</option>\n+ <option value="2">Query-anchored text</option>\n+ <option value="2 -html">Query-anchored HTML</option>\n+ <option value="4">Flat query-anchored text</option>\n+ <option value="4 -html">Flat query-anchored HTML</option>\n+ <!--\n+ <option value="-outfmt 11">BLAST archive format (ASN.1)</option>\n+ -->\n+ </param>\n+ <conditional name="adv_opts">\n+ <param name="adv_opts_selector" type="select" '..b'ase" value="" />\n+ <param name="evalue_cutoff" value="1e-10" />\n+ <param name="out_format" value="0 -html" />\n+ <param name="adv_opts_selector" value="advanced" />\n+ <param name="filter_query" value="false" />\n+ <param name="matrix" value="BLOSUM80" />\n+ <param name="max_hits" value="0" />\n+ <param name="word_size" value="0" />\n+ <param name="parse_deflines" value="false" />\n+ <output 
name="output1" file="tblastn_four_human_vs_rhodopsin.html" ftype="html" />\n+ </test>\n+ </tests>\n+ <help>\n+ \n+.. class:: warningmark\n+\n+**Note**. Database searches may take a substantial amount of time.\n+For large input datasets it is advisable to allow overnight processing. \n+\n+-----\n+\n+**What it does**\n+\n+Search a *translated nucleotide database* using a *protein query*,\n+using the NCBI BLAST+ tblastn command line tool.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. The extra columns are\n+included *after* the standard 12 columns. 
This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a \';\'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n+\n+ </help>\n+</tool>\n' |
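The `<options from_file="blastdb.loc">` element in the tblastn wrapper above maps tab-separated columns 0, 1 and 2 of the loc file to `value`, `name` and `path`. As a rough illustration of that layout, and not Galaxy's own loader, the sketch below reads such lines into dictionaries. The function name `read_loc` is hypothetical, and skipping lines with fewer than three fields is an assumption.

```python
def read_loc(lines):
    """Parse loc-style lines into dicts keyed by value, name and path."""
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and '#' comment lines, as in the sample loc file.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # assumption: ignore malformed lines instead of failing
        entries.append(
            {"value": fields[0], "name": fields[1], "path": fields[2]}
        )
    return entries
```

The caption in column 1 may contain spaces, which is why the split is on TAB characters only.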
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/ncbi_tblastx_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/ncbi_tblastx_wrapper.xml Thu Aug 23 09:00:40 2012 -0400 |
b'@@ -0,0 +1,252 @@\n+<tool id="ncbi_tblastx_wrapper" name="NCBI BLAST+ tblastx" version="0.0.12">\n+ <description>Search translated nucleotide database with translated nucleotide query sequence(s)</description>\n+ <!-- If job splitting is enabled, break up the query file into parts -->\n+ <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" shared_inputs="subject" merge_outputs="output1"></parallelism>\n+ <version_command>tblastx -version</version_command>\n+ <command interpreter="python">hide_stderr.py\n+## The command is a Cheetah template which allows some Python based syntax.\n+## Lines starting hash hash are comments. Galaxy will turn newlines into spaces\n+tblastx\n+-query "$query"\n+#if $db_opts.db_opts_selector == "db":\n+ -db "${db_opts.database.fields.path}"\n+#else:\n+ -subject "$db_opts.subject"\n+#end if\n+-query_gencode $query_gencode\n+-evalue $evalue_cutoff\n+-out $output1\n+##Set the extended list here so if/when we add things, saved workflows are not affected\n+#if str($out_format)=="ext":\n+ -outfmt "6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen"\n+#else:\n+ -outfmt $out_format\n+#end if\n+-num_threads 8\n+#if $adv_opts.adv_opts_selector=="advanced":\n+-db_gencode $adv_opts.db_gencode\n+$adv_opts.filter_query\n+$adv_opts.strand\n+-matrix $adv_opts.matrix\n+## Need int(str(...)) because $adv_opts.max_hits is an InputValueWrapper object not a string\n+## Note -max_target_seqs overrides -num_descriptions and -num_alignments\n+#if (str($adv_opts.max_hits) and int(str($adv_opts.max_hits)) > 0):\n+-max_target_seqs $adv_opts.max_hits\n+#end if\n+#if (str($adv_opts.word_size) and int(str($adv_opts.word_size)) > 0):\n+-word_size $adv_opts.word_size\n+#end if\n+$adv_opts.parse_deflines\n+## End of advanced options:\n+#end if\n+ </command>\n+ <inputs>\n+ <param name="query" type="data" format="fasta" label="Nucleotide query sequence(s)"/> \n+ <conditional name="db_opts">\n+ <param 
name="db_opts_selector" type="select" label="Subject database/sequences">\n+ <option value="db" selected="True">BLAST Database</option>\n+ <option value="file">FASTA file (pairwise e-values)</option>\n+ </param>\n+ <when value="db">\n+ <param name="database" type="select" label="Nucleotide BLAST database">\n+ <options from_file="blastdb.loc">\n+ <column name="value" index="0"/>\n+ <column name="name" index="1"/>\n+ <column name="path" index="2"/>\n+ </options>\n+ </param>\n+ <param name="subject" type="hidden" value="" /> \n+ </when>\n+ <when value="file">\n+ <param name="database" type="hidden" value="" /> \n+ <param name="subject" type="data" format="fasta" label="Nucleotide FASTA file to use as database"/> \n+ </when>\n+ </conditional>\n+ <param name="query_gencode" type="select" label="Query genetic code">\n+ <!-- See http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for details -->\n+ <option value="1" selected="True">1. Standard</option>\n+ <option value="2">2. Vertebrate Mitochondrial</option>\n+ <option value="3">3. Yeast Mitochondrial</option>\n+ <option value="4">4. Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code</option>\n+ <option value="5">5. Invertebrate Mitochondrial</option>\n+ <option value="6">6. Ciliate, Dasycladacean and Hexamita Nuclear Code</option>\n+ <option value="9">9. Echinoderm Mitochondrial</option>\n+ <option value="10">10. Euplotid Nuclear</option>\n+ <option value="11">11. Bacteria and Archaea</option>\n+ <option value="12">12. Alternative Yeast Nuclear</option>\n+ <option value="13">13. 
Ascidian Mitochondrial</option>\n+ '..b'/>\n+ <when input="out_format" value="0 -html" format="html"/>\n+ <when input="out_format" value="2" format="txt"/>\n+ <when input="out_format" value="2 -html" format="html"/>\n+ <when input="out_format" value="4" format="txt"/>\n+ <when input="out_format" value="4 -html" format="html"/>\n+ <when input="out_format" value="5" format="blastxml"/>\n+ </change_format>\n+ </data>\n+ </outputs>\n+ <requirements>\n+ <requirement type="binary">tblastx</requirement>\n+ </requirements>\n+ <help>\n+ \n+.. class:: warningmark\n+\n+**Note**. Database searches may take a substantial amount of time.\n+For large input datasets it is advisable to allow overnight processing. \n+\n+-----\n+\n+**What it does**\n+\n+Search a *translated nucleotide database* using a *translated nucleotide query*,\n+using the NCBI BLAST+ tblastx command line tool.\n+\n+-----\n+\n+**Output format**\n+\n+Because Galaxy focuses on processing tabular data, the default output of this\n+tool is tabular. The standard BLAST+ tabular output contains 12 columns:\n+\n+====== ========= ============================================\n+Column NCBI name Description\n+------ --------- --------------------------------------------\n+ 1 qseqid Query Seq-id (ID of your sequence)\n+ 2 sseqid Subject Seq-id (ID of the database hit)\n+ 3 pident Percentage of identical matches\n+ 4 length Alignment length\n+ 5 mismatch Number of mismatches\n+ 6 gapopen Number of gap openings\n+ 7 qstart Start of alignment in query\n+ 8 qend End of alignment in query\n+ 9 sstart Start of alignment in subject (database hit)\n+ 10 send End of alignment in subject (database hit)\n+ 11 evalue Expectation value (E-value)\n+ 12 bitscore Bit score\n+====== ========= ============================================\n+\n+The BLAST+ tools can optionally output additional columns of information,\n+but this takes longer to calculate. Most (but not all) of these columns are\n+included by selecting the extended tabular output. 
The extra columns are\n+included *after* the standard 12 columns. This is so that you can write\n+workflow filtering steps that accept either the 12 or 24 column tabular\n+BLAST output.\n+\n+====== ============= ===========================================\n+Column NCBI name Description\n+------ ------------- -------------------------------------------\n+ 13 sallseqid All subject Seq-id(s), separated by a \';\'\n+ 14 score Raw score\n+ 15 nident Number of identical matches\n+ 16 positive Number of positive-scoring matches\n+ 17 gaps Total number of gaps\n+ 18 ppos Percentage of positive-scoring matches\n+ 19 qframe Query frame\n+ 20 sframe Subject frame\n+ 21 qseq Aligned part of query sequence\n+ 22 sseq Aligned part of subject sequence\n+ 23 qlen Query sequence length\n+ 24 slen Subject sequence length\n+====== ============= ===========================================\n+\n+The third option is BLAST XML output, which is designed to be parsed by\n+another program, and is understood by some Galaxy tools.\n+\n+You can also choose several plain text or HTML output formats which are designed to be read by a person (not by another program).\n+The HTML versions use basic webpage formatting and can include links to the hits on the NCBI website.\n+The pairwise output (the default on the NCBI BLAST website) shows each match as a pairwise alignment with the query.\n+The two query anchored outputs show a multiple sequence alignment between the query and all the matches,\n+and differ in how insertions are shown (marked as insertions or with gap characters added to the other sequences).\n+\n+-------\n+\n+**References**\n+\n+Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.\n+\n+ </help>\n+</tool>\n' |
diff -r ab1a8640f817 -r 643338ac83c0 tools/ncbi_blast_plus/tool_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/ncbi_blast_plus/tool_dependencies.xml Thu Aug 23 09:00:40 2012 -0400 |
@@ -0,0 +1,21 @@ +<?xml version="1.0"?> +<tool_dependency> + <package name="blast+" version="2.2.26+"> + <install version="1.0"> + <actions> + <action type="download_by_url">ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.26/ncbi-blast-2.2.26+-src.tar.gz</action> + <action type="shell_command">cd c++ && ./configure --prefix=$INSTALL_DIR && make && make install</action> + <action type="set_environment"> + <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable> + </action> + </actions> + </install> + <readme> +These links provide information for building the NCBI Blast+ package in most environments. + +System requirements +http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download + </readme> + </package> +</tool_dependency> + |