What it does
NCBI BLAST+ (and the older NCBI 'legacy' BLAST) can write its results in a range of formats, including tabular and a more detailed XML format. A complex workflow may need both the XML and the tabular output, but running BLAST twice is slow and wasteful.
This tool converts BLAST XML output into the standard 12 column tabular equivalent:
Column | NCBI name | Description |
1 | qseqid | Query Seq-id (ID of your sequence) |
2 | sseqid | Subject Seq-id (ID of the database hit) |
3 | pident | Percentage of identical matches |
4 | length | Alignment length |
5 | mismatch | Number of mismatches |
6 | gapopen | Number of gap openings |
7 | qstart | Start of alignment in query |
8 | qend | End of alignment in query |
9 | sstart | Start of alignment in subject (database hit) |
10 | send | End of alignment in subject (database hit) |
11 | evalue | Expectation value (E-value) |
12 | bitscore | Bit score |
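As a rough illustration of the conversion, the following Python sketch reads BLAST XML and builds the 12 standard columns for each HSP. It is a simplified example and not the code this tool runs. The function names and the sample XML are hypothetical. It assumes the query ID is the first word of `Iteration_query-def`, the subject ID is the first word of `Hit_id`, mismatches and gap openings are counted from the aligned sequences, and the E-value and bit score are copied from the XML without the rounding that BLAST applies to its tabular output.

```python
import re
import xml.etree.ElementTree as ET


def hsp_to_fields(qseqid, sseqid, hsp):
    """Build the 12 standard columns for one <Hsp> element (simplified)."""
    length = int(hsp.findtext("Hsp_align-len"))
    nident = int(hsp.findtext("Hsp_identity"))
    qseq = hsp.findtext("Hsp_qseq")
    sseq = hsp.findtext("Hsp_hseq")
    # Mismatches: aligned columns that differ and contain no gap character.
    mismatch = sum(1 for q, s in zip(qseq, sseq) if q != s and "-" not in (q, s))
    # Gap openings: number of gap runs in either aligned sequence.
    gapopen = len(re.findall(r"-+", qseq)) + len(re.findall(r"-+", sseq))
    return [
        qseqid, sseqid,
        "%.2f" % (100.0 * nident / length),  # pident (assumed formatting)
        str(length), str(mismatch), str(gapopen),
        hsp.findtext("Hsp_query-from"), hsp.findtext("Hsp_query-to"),
        hsp.findtext("Hsp_hit-from"), hsp.findtext("Hsp_hit-to"),
        hsp.findtext("Hsp_evalue"), hsp.findtext("Hsp_bit-score"),
    ]


def blast_xml_to_rows(xml_text):
    """Yield one list of 12 strings per HSP found in the XML text."""
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        qseqid = iteration.findtext("Iteration_query-def").split(None, 1)[0]
        for hit in iteration.iter("Hit"):
            sseqid = hit.findtext("Hit_id").split(None, 1)[0]
            for hsp in hit.iter("Hsp"):
                yield hsp_to_fields(qseqid, sseqid, hsp)


# Hypothetical, heavily trimmed BLAST XML holding a single HSP.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations><Iteration>
  <Iteration_query-def>query1 example protein</Iteration_query-def>
  <Iteration_hits><Hit>
    <Hit_id>subj1</Hit_id>
    <Hit_hsps><Hsp>
      <Hsp_bit-score>20.0</Hsp_bit-score>
      <Hsp_evalue>0.001</Hsp_evalue>
      <Hsp_query-from>1</Hsp_query-from>
      <Hsp_query-to>9</Hsp_query-to>
      <Hsp_hit-from>5</Hsp_hit-from>
      <Hsp_hit-to>14</Hsp_hit-to>
      <Hsp_identity>8</Hsp_identity>
      <Hsp_align-len>10</Hsp_align-len>
      <Hsp_qseq>MKV-LAAGIT</Hsp_qseq>
      <Hsp_hseq>MKVQLAAGLT</Hsp_hseq>
    </Hsp></Hit_hsps>
  </Hit></Iteration_hits>
</Iteration></BlastOutput_iterations></BlastOutput>"""

for row in blast_xml_to_rows(SAMPLE_XML):
    print("\t".join(row))
```

Real BLAST XML files can be very large, so a production converter would more likely parse incrementally (for example with `xml.etree.ElementTree.iterparse`) rather than load the whole document as this sketch does.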
The BLAST+ tools can optionally output additional columns of information, but this takes longer to calculate. Most (but not all) of these columns are included by selecting the extended tabular output. The extra columns are placed after the standard 12 columns, so that workflow filtering steps can be written to accept either the 12 or the 25 column tabular BLAST output. This tool now uses the extended 25 column output by default.
Column | NCBI name | Description |
13 | sallseqid | All subject Seq-id(s), separated by a ';' |
14 | score | Raw score |
15 | nident | Number of identical matches |
16 | positive | Number of positive-scoring matches |
17 | gaps | Total number of gaps |
18 | ppos | Percentage of positive-scoring matches |
19 | qframe | Query frame |
20 | sframe | Subject frame |
21 | qseq | Aligned part of query sequence |
22 | sseq | Aligned part of subject sequence |
23 | qlen | Query sequence length |
24 | slen | Subject sequence length |
25 | salltitles | All subject title(s), separated by a '<>' |
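Because the extra columns follow the standard 12, a filtering step that only reads the first 12 fields works on either layout. The sketch below shows one way this might look in Python. The function name and the default thresholds are hypothetical, and it assumes tab-separated input with optional comment lines starting with '#'.

```python
def filter_hits(lines, min_pident=90.0, max_evalue=1e-5):
    """Yield tabular BLAST lines that pass both thresholds.

    Only columns 3 (pident) and 11 (evalue) are inspected, so 12 and
    25 column rows are handled the same way.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            raise ValueError("Expected at least 12 columns, got %d" % len(fields))
        if float(fields[2]) >= min_pident and float(fields[10]) <= max_evalue:
            yield line


# Made-up rows: one 12 column hit, the same hit padded to 25 columns,
# and a weaker hit that should be dropped.
strong_12 = "\t".join(
    ["q1", "s1", "95.00", "100", "5", "0", "1", "100", "1", "100", "1e-50", "180"]
)
strong_25 = strong_12 + "\t" + "\t".join(["x"] * 13)
weak_12 = "\t".join(
    ["q1", "s2", "50.00", "100", "50", "0", "1", "100", "1", "100", "1e-50", "180"]
)

kept = list(filter_hits([strong_12, strong_25, weak_12]))
print(len(kept))
```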
Beware that the XML file (and thus the conversion) and the tabular output direct from BLAST+ may differ in the presence of XXXX masking on regions of low complexity (columns 21 and 22), and thus also in calculated figures like the percentage identity (column 3).
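When comparing converted output with tabular output taken directly from BLAST+, it may help to flag the rows whose aligned sequences contain masked stretches. The following Python sketch does this for 25 column rows. The function name is hypothetical, and treating a run of four or more X characters as masking is an assumption made for illustration.

```python
import re

# Assumption: a run of four or more X characters indicates masking.
MASK_RUN = re.compile(r"X{4,}", re.IGNORECASE)


def possibly_masked(fields):
    """Return True if qseq or sseq (columns 21 and 22) holds a run of X."""
    if len(fields) < 22:
        return False  # 12 column rows carry no aligned sequences
    return bool(MASK_RUN.search(fields[20]) or MASK_RUN.search(fields[21]))


# Made-up 25 column row with a masked stretch in the aligned query.
example = ["."] * 25
example[20] = "MKVXXXXXXLAAG"
example[21] = "MKVSSSSSSLAAG"
print(possibly_masked(example))
```

Rows flagged this way are the ones where column 3 and columns 21 and 22 are most likely to disagree between the two sources.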
References
If you use this Galaxy tool in work leading to a scientific publication please cite:
Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013). Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology. PeerJ 1:e167 https://doi.org/10.7717/peerj.167
This wrapper is available to install into other Galaxy Instances via the Galaxy Tool Shed at http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus