What it does
NCBI BLAST+ searches can produce output in a range of formats, but in the past only the XML format included fields such as the sequence description. This tool converts a BLAST XML report into 12, 24, 26 or custom column tabular reports and into HTML reports. It is loosely based on the "BLAST XML to tabular" tool available in the main Tool Shed. For the default 12 and 24 column reports it should produce the same output as that tool, although whitespace differences may exist.
Column | NCBI name | Description |
---|---|---|
1 | qseqid | Query Seq-id (ID of your sequence) |
2 | sseqid | Subject Seq-id (ID of the database hit) |
3 | pident | Percentage of identical matches |
4 | length | Alignment length |
5 | mismatch | Number of mismatches |
6 | gapopen | Number of gap openings |
7 | qstart | Start of alignment in query |
8 | qend | End of alignment in query |
9 | sstart | Start of alignment in subject (database hit) |
10 | send | End of alignment in subject (database hit) |
11 | evalue | Expectation value (E-value) |
12 | bitscore | Bit score |
. | ||
13 | sallseqid | All subject Seq-id(s), separated by a ';' |
14 | score | Raw score |
15 | nident | Number of identical matches |
16 | positive | Number of positive-scoring matches |
17 | gaps | Total number of gaps |
18 | ppos | Percentage of positive-scoring matches |
19 | qframe | Query frame |
20 | sframe | Subject frame |
21 | qseq | Aligned part of query sequence |
22 | sseq | Aligned part of subject sequence |
23 | qlen | Query sequence length |
24 | slen | Subject sequence length |
. | ||
25 | pcov | Percentage coverage |
26 | sallseqdescr | All subject Seq-descr(s), separated by a ',' |
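As an illustration of how the column order above can be used downstream, here is a minimal Python sketch that reads a 12 column report into dictionaries keyed by the NCBI names. It assumes the report is tab-separated with one hit per line and no header row (lines starting with `#` are skipped in case one is present). The function name `read_hits`, the example hit line and the filter thresholds are made up for illustration and are not part of the tool.

```python
import csv
import io

# NCBI names for the standard 12 column report, in table order.
COLUMNS_12 = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_hits(handle, columns=COLUMNS_12):
    """Yield one dict per hit from a tab-separated report (hypothetical helper)."""
    # QUOTE_NONE keeps any quote characters in IDs or descriptions untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and any commented header line
        yield dict(zip(columns, row))


# A made-up hit line, used here instead of a real report file.
example = (
    "query1\tgi|123|ref|XP_000001.1|\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n"
)
hits = list(read_hits(io.StringIO(example)))

# Keep only strong hits; the thresholds are arbitrary examples.
strong = [
    h for h in hits
    if float(h["pident"]) >= 95.0 and float(h["evalue"]) <= 1e-10
]
```

For a 24 or 26 column report the same approach applies with the list of names extended in table order. All values are read as strings, so numeric columns need converting before comparison, as in the filter above.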
An option also exists to select particular columns for the output report, and to cross-reference each result with one or more reference bins.
A command line version can also be used. Type blast_reporting.py -h for help.
python blast_reporting.py in_file out_file out_format [options]
As noted in the original BLAST XML to tabular tool, "Be aware that the XML file (and thus the conversion) and the tabular output direct from BLAST+ may differ in the presence of XXXX masking on regions low complexity (columns 21 and 22), and thus also calculated figures like the percentage identity (column 3) and gap openings."
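To see why masking can shift the calculated figures, the sketch below recomputes the number of identical matches, gap openings and percentage identity directly from a pair of aligned strings. It is an illustration only, not this tool's or BLAST+'s implementation: the function name `identity_stats` is hypothetical, the sequences are invented, and treating a masked `X` position as a non-match is an assumption made for the example.

```python
def identity_stats(qseq, sseq):
    """Return (nident, gapopen, pident) for two aligned strings of equal length.

    Hypothetical helper: masked ('X') and gap ('-') positions are assumed
    never to count as identical matches.
    """
    if len(qseq) != len(sseq):
        raise ValueError("aligned sequences must have the same length")

    length = len(qseq)
    nident = sum(
        1 for q, s in zip(qseq, sseq)
        if q == s and q not in {"-", "X"}
    )

    # A gap opening is the first '-' of each run of gaps in either sequence.
    gapopen = 0
    for seq in (qseq, sseq):
        previous = ""
        for ch in seq:
            if ch == "-" and previous != "-":
                gapopen += 1
            previous = ch

    pident = 100.0 * nident / length if length else 0.0
    return nident, gapopen, pident


# Invented sequences: the same alignment with and without masking.
unmasked = identity_stats("MKTAYIAKQR", "MKTAYIAKQR")
masked = identity_stats("MKTXXXXKQR", "MKTAYIAKQR")
```

Under these assumptions the unmasked pair gives 10 identical matches out of 10 positions, while the masked pair gives 6, so any figure derived from columns 21 and 22 changes even though the underlying alignment is the same.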
References
If using this tool for publishing results, you may need to cite its origin in the BLAST XML to tabular tool:
Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013). Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology. PeerJ 1:e167 http://dx.doi.org/10.7717/peerj.167