comparison tools/ncbi_blast_plus/README.rst @ 13:623f727cdff1 draft

Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
author peterjc
date Fri, 14 Mar 2014 07:40:46 -0400
parents 4c4a0da938ff
children 2fe07f50a41e
12:6560192c5098 13:623f727cdff1
1 Galaxy wrappers for NCBI BLAST+ suite 1 Galaxy wrappers for NCBI BLAST+ suite
2 ===================================== 2 =====================================
3 3
4 These wrappers are copyright 2010-2013 by Peter Cock, The James Hutton Institute 4 These wrappers are copyright 2010-2013 by Peter Cock (The James Hutton Institute,
5 (formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. 5 UK) and additional contributors. All rights reserved. See the licence text below.
6 See the licence text below.
7 6
8 Currently tested with NCBI BLAST 2.2.28+ (i.e. version 2.2.28 of BLAST+), 7 Currently tested with NCBI BLAST 2.2.28+ (i.e. version 2.2.28 of BLAST+),
9 and does not work with the NCBI 'legacy' BLAST suite (e.g. ``blastall``). 8 and does not work with the NCBI 'legacy' BLAST suite (e.g. ``blastall``).
10 9
11 Note that these wrappers (and the associated datatypes) were originally 10 Note that these wrappers (and the associated datatypes) were originally
24 Galaxy should be able to automatically install the dependencies, i.e. the 23 Galaxy should be able to automatically install the dependencies, i.e. the
25 ``blast_datatypes`` repository which defines the BLAST XML file format 24 ``blast_datatypes`` repository which defines the BLAST XML file format
26 (``blastxml``) and protein and nucleotide BLAST databases (``blastdbp`` and 25 (``blastxml``) and protein and nucleotide BLAST databases (``blastdbp`` and
27 ``blastdbn``). 26 ``blastdbn``).
28 27
29 You must tell Galaxy about any system level BLAST databases using configuration 28 See the configuration notes below.
30 files blastdb.loc (nucleotide databases like NT) and blastdb_p.loc (protein
31 databases like NR), and blastdb_d.loc (protein domain databases like CDD or
32 SMART) which are located in the tool-data/ folder. Sample files are included
33 which explain the tab-based format to use.
34
35 You can download the NCBI provided databases as tar-balls from here:
36
37 * ftp://ftp.ncbi.nlm.nih.gov/blast/db/ (nucleotide and protein databases like NR)
38 * ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ (domain databases like CDD)
39
40 29
41 Manual Installation 30 Manual Installation
42 =================== 31 ===================
43 32
44 For those not using Galaxy's automated installation from the Tool Shed, put 33 For those not using Galaxy's automated installation from the Tool Shed, put
77 Run the functional tests (adjusting the section identifier to match your 66 Run the functional tests (adjusting the section identifier to match your
78 ``tool_conf.xml.sample`` file):: 67 ``tool_conf.xml.sample`` file)::
79 68
80 ./run_functional_tests.sh -sid NCBI_BLAST+-ncbi_blast_plus_tools 69 ./run_functional_tests.sh -sid NCBI_BLAST+-ncbi_blast_plus_tools
81 70
71 Configuration
72 =============
73
74 You must tell Galaxy about any system level BLAST databases using configuration
75 files blastdb.loc (nucleotide databases like NT) and blastdb_p.loc (protein
76 databases like NR), and blastdb_d.loc (protein domain databases like CDD or
77 SMART) which are located in the tool-data/ folder. Sample files are included
78 which explain the tab-based format to use.
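
As a rough illustration of the tab-based format, the sketch below parses one entry from a ``.loc`` file. It assumes three tab-separated columns (a unique identifier, a display caption, and the database base path); this layout is an assumption here, so check the bundled sample files for the authoritative description. The function name ``parse_loc_line`` and the example values are hypothetical.

```python
def parse_loc_line(line):
    """Parse one line of a .loc file.

    Returns (db_id, caption, path), or None for blank and comment lines.
    Assumes three tab-separated columns, which is an assumption of this
    sketch rather than a documented guarantee.
    """
    stripped = line.rstrip("\n")
    if not stripped.strip() or stripped.lstrip().startswith("#"):
        return None
    fields = stripped.split("\t")
    if len(fields) != 3:
        raise ValueError("expected 3 tab-separated fields, got %d" % len(fields))
    return tuple(field.strip() for field in fields)


if __name__ == "__main__":
    # Hypothetical entry; the identifier, caption and path are made up.
    print(parse_loc_line("nt_example\tNT (example)\t/data/blastdb/nt\n"))
```

A line that uses spaces instead of tabs raises ``ValueError`` in this sketch, which is a common cause of a database not being listed.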
79
80 You can download the NCBI provided databases as tar-balls from here:
81
82 * ftp://ftp.ncbi.nlm.nih.gov/blast/db/ (nucleotide and protein databases like NR)
83 * ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/ (domain databases like CDD)
84
85 If using the optional taxonomy columns, you will also need to download the
86 NCBI taxonomy files (``taxdb.btd`` and ``taxdb.bti`` from ``taxdb.tar.gz`` on
87 the BLAST database FTP site). Currently explicit version tracking of the
88 taxonomy is not supported, and in order to use this you must set the
89 ``$BLASTDB`` environment variable to include the path where you unzipped the
90 taxonomy files. If this is not done, the taxonomy columns like species name
91 will appear as ``N/A`` in the tabular output.
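
A quick way to confirm the taxonomy setup is to check that one of the directories listed in ``$BLASTDB`` holds both taxonomy files. The sketch below is only an illustration: it assumes ``$BLASTDB`` may list several directories joined by the platform path separator, and the helper name ``find_taxdb`` is hypothetical.

```python
import os

# The two files extracted from taxdb.tar.gz, as named in the text above.
TAXDB_FILES = ("taxdb.btd", "taxdb.bti")


def find_taxdb(blastdb_value):
    """Return the first directory in a BLASTDB-style path list that
    contains both taxonomy files, or None if no directory does."""
    for directory in blastdb_value.split(os.pathsep):
        if directory and all(
            os.path.isfile(os.path.join(directory, name)) for name in TAXDB_FILES
        ):
            return directory
    return None


if __name__ == "__main__":
    found = find_taxdb(os.environ.get("BLASTDB", ""))
    print(found or "taxdb files not found; taxonomy columns would show N/A")
```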
92
93 The BLAST+ binaries support multi-threaded operation, which is handled via the
94 $GALAXY_SLOTS environment variable. This should be set automatically by Galaxy
95 via your job runner settings, which allows you to (for example) allocate four
96 cores to each BLAST job.
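
For illustration only, the sketch below shows one way a script could turn ``$GALAXY_SLOTS`` into a thread setting, falling back to a single thread when the variable is unset or invalid. It is not the wrappers' own code; the helper name ``thread_count`` and the fallback of one thread are assumptions. ``-num_threads`` is the BLAST+ command line option for the thread count.

```python
import os


def thread_count(environ=None, default=1):
    """Read GALAXY_SLOTS from the environment, returning `default`
    when it is missing, not an integer, or not positive."""
    environ = os.environ if environ is None else environ
    try:
        slots = int(environ.get("GALAXY_SLOTS", ""))
    except ValueError:
        return default
    return slots if slots > 0 else default


if __name__ == "__main__":
    # Build the argument pair that would be appended to a BLAST+ command.
    print(["-num_threads", str(thread_count())])
```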
97
98 In addition, the BLAST+ wrappers also support high level parallelism by task
99 splitting if ``use_tasked_jobs = True`` is enabled in your ``universe_wsgi.ini``
100 configuration file. Essentially, the FASTA input query files are broken up into
101 batches of 1000 sequences, a separate BLAST child job is run for each chunk,
102 and then the BLAST output files are merged (in order). This is transparent
103 for the end user.
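
The split step described above can be sketched as follows. This is a simplified illustration, not Galaxy's actual task-splitting code: the function name ``split_fasta`` is hypothetical, and it works on an in-memory list of lines rather than on files.

```python
def split_fasta(lines, batch_size=1000):
    """Yield lists of lines, each holding at most `batch_size` FASTA
    records. A record starts at a line beginning with '>'."""
    batch, count = [], 0
    for line in lines:
        if line.startswith(">"):
            if count == batch_size:
                # Current batch is full; emit it and start a new one.
                yield batch
                batch, count = [], 0
            count += 1
        batch.append(line)
    if batch:
        yield batch


if __name__ == "__main__":
    records = []
    for i in range(2500):
        records += [">seq%d\n" % i, "ACGT\n"]
    sizes = [sum(l.startswith(">") for l in b) for b in split_fasta(records)]
    print(sizes)
```

Because batches are produced in input order, concatenating the per-batch tabular outputs in the same order preserves the original query order. Merging other output formats, such as BLAST XML, needs more than plain concatenation.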
82 104
83 History 105 History
84 ======= 106 =======
85 107
86 ======= ====================================================================== 108 ======= ======================================================================
104 'blast_datatypes' repository from the Tool Shed. 126 'blast_datatypes' repository from the Tool Shed.
105 v0.0.17 - The BLAST+ search tools now default to extended tabular output 127 v0.0.17 - The BLAST+ search tools now default to extended tabular output
106 (all too often our users were having to re-run searches just to 128 (all too often our users were having to re-run searches just to
107 get one of the missing columns like query or subject length) 129 get one of the missing columns like query or subject length)
108 v0.0.18 - Defensive quoting of filenames in case of spaces (where possible, 130 v0.0.18 - Defensive quoting of filenames in case of spaces (where possible,
109 BLAST+ handling of some mult-file arguments is problematic). 131 BLAST+ handling of some multi-file arguments is problematic).
110 v0.0.19 - Added wrappers for rpsblast and rpstblastn, and new blastdb_d.loc 132 v0.0.19 - Added wrappers for rpsblast and rpstblastn, and new blastdb_d.loc
111 for the domain databases they use (e.g. CDD, PFAM or SMART). 133 for the domain databases they use (e.g. CDD, PFAM or SMART).
112 - Correct case of exception regular expression (for error handling 134 - Correct case of exception regular expression (for error handling
113 fall-back in case the return code is not set properly). 135 fall-back in case the return code is not set properly).
114 - Clearer naming of output files. 136 - Clearer naming of output files.
120 - Dependency on new package_blast_plus_2_2_26 in Tool Shed. 142 - Dependency on new package_blast_plus_2_2_26 in Tool Shed.
121 - Adopted standard MIT License. 143 - Adopted standard MIT License.
122 - Development moved to GitHub, https://github.com/peterjc/galaxy_blast 144 - Development moved to GitHub, https://github.com/peterjc/galaxy_blast
123 - Updated citation information (Cock et al. 2013). 145 - Updated citation information (Cock et al. 2013).
124 v0.0.21 - Use macros to simplify the XML wrappers. 146 v0.0.21 - Use macros to simplify the XML wrappers.
125 - Added wrapper for dustmasker 147 - Added wrapper for dustmasker.
126 - Enabled masking for makeblastdb 148 - Enabled masking for makeblastdb.
127 - Requires 'maskinfo-asn1' and 'maskinfo-asn1-binary' datatypes 149 - Requires 'maskinfo-asn1' and 'maskinfo-asn1-binary' datatypes.
128 defined in updated blast_datatypes on Galaxy ToolShed. 150 defined in updated blast_datatypes on Galaxy ToolShed.
129 - Tests updated for BLAST+ 2.2.27 instead of BLAST+ 2.2.26 151 - Tests updated for BLAST+ 2.2.27 instead of BLAST+ 2.2.26.
130 - Now depends on package_blast_plus_2_2_27 in ToolShed 152 - Now depends on package_blast_plus_2_2_27 in ToolShed.
131 v0.0.22 - More use of macros to simplify the wrappers 153 v0.0.22 - More use of macros to simplify the wrappers.
132 - Set number of threads via $GALAXY_SLOTS environment variable 154 - Set number of threads via $GALAXY_SLOTS environment variable.
133 - More descriptive default output names 155 - More descriptive default output names.
134 - Tests require updated BLAST DB definitions (blast_datatypes v0.0.18) 156 - Tests require updated BLAST DB definitions (blast_datatypes v0.0.18).
135 - Pre-check for duplicate identifiers in makeblastdb wrapper. 157 - Pre-check for duplicate identifiers in makeblastdb wrapper.
136 - Tests updated for BLAST+ 2.2.28 instead of BLAST+ 2.2.27 158 - Tests updated for BLAST+ 2.2.28 instead of BLAST+ 2.2.27.
137 - Now depends on package_blast_plus_2_2_28 in ToolShed 159 - Now depends on package_blast_plus_2_2_28 in ToolShed.
138 - Extended tabular output includes 'salltitles' as column 25. 160 - Extended tabular output includes 'salltitles' as column 25.
161 v0.1.00 - Now depends on package_blast_plus_2_2_29 in ToolShed.
162 - Tabular output now includes option to pick specific columns,
163 including previously unavailable taxonomy columns.
164 - BLAST XML to tabular tool supports multiple input files.
165 - More detailed descriptions for BLASTN and BLASTP task option.
166 - Wrappers for segmasker, dustmasker and convert2blastmask.
167 - Supports using maskinfo with makeblastdb wrapper.
168 - Supports setting a taxonomy ID in makeblastdb wrapper.
169 - Subtle changes like new conditional settings will require some old
170 workflows be updated to cope.
139 ======= ====================================================================== 171 ======= ======================================================================
140 172
141 173
142 Bug Reports 174 Bug Reports
143 =========== 175 ===========