annotate tools/ncbi_blast_plus/blastxml_to_tabular.py @ 26:2889433c7ae1 draft

v0.3.3 - fixed legacy dependecy definition
author peterjc
date Sat, 20 Jul 2019 18:36:36 -0400
parents e25d3acf6e68
children a52d2d93e595
rev   line source
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
2 """Convert a BLAST XML file to tabular output.
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
3
15
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
4 Designed to convert BLAST XML files into tabular BLAST output (either
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
5 std for standard 12 columns, or ext for the extended 25 columns offered
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
6 in the Galaxy BLAST+ wrappers).
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
7
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
8 The 12 columns output are 'qseqid sseqid pident length mismatch gapopen qstart
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
9 qend sstart send evalue bitscore' or 'std' at the BLAST+ command line, which
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
10 mean:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
11
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
12 ====== ========= ============================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
13 Column NCBI name Description
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
14 ------ --------- --------------------------------------------
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
15      1 qseqid    Query Seq-id (ID of your sequence)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
16      2 sseqid    Subject Seq-id (ID of the database hit)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
17      3 pident    Percentage of identical matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
18      4 length    Alignment length
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
19      5 mismatch  Number of mismatches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
20      6 gapopen   Number of gap openings
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
21      7 qstart    Start of alignment in query
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
22      8 qend      End of alignment in query
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
23      9 sstart    Start of alignment in subject (database hit)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
24     10 send      End of alignment in subject (database hit)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
25     11 evalue    Expectation value (E-value)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
26     12 bitscore  Bit score
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
27 ====== ========= ============================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
28
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
29 The additional columns offered in the Galaxy BLAST+ wrappers are:
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
30
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
31 ====== ============= ===========================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
32 Column NCBI name     Description
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
33 ------ ------------- -------------------------------------------
11
4c4a0da938ff Uploaded v0.0.22, now wraps BLAST+ 2.2.28 allowing extended tabular output to include the hit descriptions as column 25.
peterjc
parents: 10
diff changeset
34     13 sallseqid     All subject Seq-id(s), separated by ';'
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
35     14 score         Raw score
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
36     15 nident        Number of identical matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
37     16 positive      Number of positive-scoring matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
38     17 gaps          Total number of gaps
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
39     18 ppos          Percentage of positive-scoring matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
40     19 qframe        Query frame
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
41     20 sframe        Subject frame
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
42     21 qseq          Aligned part of query sequence
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
43     22 sseq          Aligned part of subject sequence
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
44     23 qlen          Query sequence length
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
45     24 slen          Subject sequence length
11
4c4a0da938ff Uploaded v0.0.22, now wraps BLAST+ 2.2.28 allowing extended tabular output to include the hit descriptions as column 25.
peterjc
parents: 10
diff changeset
46     25 salltitles    All subject titles, separated by '<>'
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
47 ====== ============= ===========================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
48
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
49 Most of these fields are given explicitly in the XML file, while others like
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
50 the percentage identity and the number of gap openings must be calculated.
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
51
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
52 Be aware that the sequence in the extended tabular output or XML direct from
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
53 BLAST+ may or may not use XXXX masking on regions of low complexity. This
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
54 can throw off the calculation of percentage identity and gap openings.
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
55 [In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard,
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
56 with these numbers changing depending on whether or not the low complexity
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
57 filter is used.]
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
58
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
59 This script attempts to produce identical output to what BLAST+ would have done.
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
60 However, check this with "diff -b ..." since BLAST+ sometimes includes an extra
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
61 space character (probably a bug).
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
62 """
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
63
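The docstring above notes that percentage identity and the number of gap openings are not stored in the XML and must be calculated. The block below is a minimal sketch of one way to derive them from the two aligned strings, assuming gaps are written as `-` and that each run of gap characters in either sequence counts as one opening. The function name `pident_and_gapopen` is hypothetical and this is not the script's own implementation; as the docstring warns, XXXX masking of low-complexity regions can shift these numbers.

```python
import re


def pident_and_gapopen(qseq, sseq):
    """Return (percentage identity, gap openings) for two aligned strings.

    Illustrative only: assumes '-' marks a gap and both strings are the
    same length, as in a pairwise alignment.
    """
    if len(qseq) != len(sseq):
        raise ValueError("aligned sequences must have equal length")
    length = len(qseq)
    # Count identical columns; a gap character never counts as a match.
    nident = sum(1 for q, s in zip(qseq, sseq) if q == s and q != "-")
    # Each maximal run of '-' in either sequence is one gap opening.
    gapopen = len(re.findall(r"-+", qseq)) + len(re.findall(r"-+", sseq))
    pident = 100.0 * nident / length if length else 0.0
    return pident, gapopen


pident, gapopen = pident_and_gapopen("ACGT--ACGT", "ACGTTTAC-T")
print("%.2f\t%i" % (pident, gapopen))
```

Here 7 of the 10 alignment columns are identical and each sequence contains one gap run, so the sketch reports 70.00 percent identity and 2 gap openings.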
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
64 from __future__ import print_function
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
65
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
66 import os
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
67 import re
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
68 import sys
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
69
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
70 from optparse import OptionParser
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
71
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
72 if "-v" in sys.argv or "--version" in sys.argv:
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
73     print("v0.2.01")
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
74     sys.exit(0)
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
75
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
76 if sys.version_info[:2] >= (2, 5):
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
77     try:
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
78         from xml.etree import cElementTree as ElementTree
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
79     except ImportError:
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
80         from xml.etree import ElementTree as ElementTree
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
81 else:
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
82     from galaxy import eggs  # noqa - ignore flake8 F401
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
83     import pkg_resources
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
84
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
85     pkg_resources.require("elementtree")
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
86     from elementtree import ElementTree
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
87
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
88 if len(sys.argv) == 4 and sys.argv[3] in ["std", "x22", "ext"]:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
89     # False positive if user really has a BLAST XML file called 'std' or 'ext'...
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
90     sys.exit(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
91 """ERROR: The script API has changed, sorry.
14
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
92
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
93 Instead of the old style:
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
94
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
95 $ python blastxml_to_tabular.py input.xml output.tabular std
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
96
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
97 Please use:
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
98
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
99 $ python blastxml_to_tabular.py -o output.tabular -c std input.xml
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
100
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
101 For more information, use:
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
102
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
103 $ python blastxml_to_tabular.py -h
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
104 """
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
105     )
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
106
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
107 usage = """usage: %prog [options] blastxml[,...]
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
108
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
109 Convert one (or more) BLAST XML files into a single tabular file.
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
110
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
111 The columns option can be 'std' (standard 12 columns), 'ext'
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
112 (extended 25 columns), or a list of BLAST+ column names like
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
113 'qseqid,sseqid,pident' (space or comma separated).
23
31e517610e1f v0.3.0 Updated for NCBI BLAST+ 2.7.1
peterjc
parents: 22
diff changeset
114
31e517610e1f v0.3.0 Updated for NCBI BLAST+ 2.7.1
peterjc
parents: 22
diff changeset
115 Note if using a list of column names, currently ONLY the 25
31e517610e1f v0.3.0 Updated for NCBI BLAST+ 2.7.1
peterjc
parents: 22
diff changeset
116 extended column names are supported.
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
117 """
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
118 parser = OptionParser(usage=usage)
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
119 parser.add_option(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
120 "-o",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
121 "--output",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
122 dest="output",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
123 default=None,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
124 help="output filename (defaults to stdout)",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
125 metavar="FILE",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
126 )
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
127 parser.add_option(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
128 "-c",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
129 "--columns",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
130 dest="columns",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
131 default="std",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
132 help="[std|ext|col1,col2,...] standard 12 columns, "
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
133 "extended 25 columns, or list of column names",
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
134 )
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
135 (options, args) = parser.parse_args()
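For readers more familiar with `argparse`, the same command-line interface can be sketched as below. This is an illustrative equivalent only, not part of the script, which uses `optparse`; the helper name `build_parser` and the positional argument name `blastxml` are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build an argparse parser mirroring the -o/--output and -c/--columns options."""
    parser = argparse.ArgumentParser(
        description="Convert one (or more) BLAST XML files into a single tabular file."
    )
    parser.add_argument(
        "-o",
        "--output",
        default=None,
        metavar="FILE",
        help="output filename (defaults to stdout)",
    )
    parser.add_argument(
        "-c",
        "--columns",
        default="std",
        help="[std|ext|col1,col2,...] standard 12 columns, "
        "extended 25 columns, or list of column names",
    )
    # Zero or more inputs are accepted here so the caller can report a
    # custom error when none are given, as the original script does.
    parser.add_argument("blastxml", nargs="*", help="BLAST XML input file(s)")
    return parser


# Parse an explicit argument list rather than sys.argv, for demonstration.
args = build_parser().parse_args(["-o", "output.tabular", "-c", "ext", "input.xml"])
print(args.output, args.columns, args.blastxml)
```

Passing an explicit list to `parse_args` keeps the sketch testable without touching `sys.argv`.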
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
136
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
137 colnames = (
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
138 "qseqid,sseqid,pident,length,mismatch,gapopen,qstart,qend,"
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
139 "sstart,send,evalue,bitscore,sallseqid,score,nident,positive,"
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
140 "gaps,ppos,qframe,sframe,qseq,sseq,qlen,slen,salltitles"
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
141 ).split(",")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
142
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
143 if len(args) < 1:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
144     sys.exit("ERROR: No BLASTXML input files given; run with --help to see options.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
145
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
146 out_fmt = options.columns
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
147 if out_fmt == "std":
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
148     extended = False
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
149     cols = None
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
150 elif out_fmt == "x22":
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
151     sys.exit("Format argument x22 has been replaced with ext (extended 25 columns)")
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
152 elif out_fmt == "ext":
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
153     extended = True
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
154     cols = None
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
155 else:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
156     cols = out_fmt.replace(" ", ",").split(",")  # Allow space or comma separated
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
157     # Remove any blank entries due to trailing comma,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
158     # or annoying "None" dummy value from Galaxy if no columns
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
159     cols = [c for c in cols if c and c != "None"]
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
160 extra = set(cols).difference(colnames)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
161 if extra:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
162 sys.exit("These are not recognised column names: %s" % ",".join(sorted(extra)))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
163 del extra
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
164 assert set(colnames).issuperset(cols), cols
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
165 if not cols:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
166 sys.exit("No columns selected!")
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
167 extended = (
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
168 max(colnames.index(c) for c in cols) >= 12
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
169 ) # Do we need any higher columns?
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
170 del out_fmt
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
171
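The column handling above can be exercised in isolation. The following is a minimal sketch, not the script's own API: `parse_columns`, `needs_extended` and the shortened `COLNAMES` list are hypothetical names introduced here, and `ValueError` stands in for the `sys.exit` calls so the logic can be tested.

```python
# Hypothetical, abbreviated column list: the 12 standard columns plus
# two extended ones for illustration (the real script lists all 25).
COLNAMES = (
    "qseqid sseqid pident length mismatch gapopen "
    "qstart qend sstart send evalue bitscore sallseqid score"
).split()


def parse_columns(out_fmt, colnames=COLNAMES):
    """Split a space or comma separated column request into a list of names."""
    cols = out_fmt.replace(" ", ",").split(",")
    # Drop blanks (e.g. from a trailing comma) and Galaxy's "None" dummy value.
    cols = [c for c in cols if c and c != "None"]
    extra = set(cols).difference(colnames)
    if extra:
        raise ValueError(
            "These are not recognised column names: %s" % ",".join(sorted(extra))
        )
    if not cols:
        raise ValueError("No columns selected!")
    return cols


def needs_extended(cols, colnames=COLNAMES):
    """True if any requested column lies beyond the first 12 (standard) ones."""
    return max(colnames.index(c) for c in cols) >= 12
```

Raising an exception rather than exiting is only a convenience for this sketch; the script itself reports these problems via `sys.exit`.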
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
172 for in_file in args:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
173 if not os.path.isfile(in_file):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
174 sys.exit("Input BLAST XML file not found: %s" % in_file)
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
175
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
176
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
177 re_default_query_id = re.compile(r"^Query_\d+$")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
178 assert re_default_query_id.match(r"Query_101")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
179 assert not re_default_query_id.match(r"Query_101a")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
180 assert not re_default_query_id.match(r"MyQuery_101")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
181 re_default_subject_id = re.compile(r"^Subject_\d+$")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
182 assert re_default_subject_id.match(r"Subject_1")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
183 assert not re_default_subject_id.match(r"Subject_")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
184 assert not re_default_subject_id.match(r"Subject_12a")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
185 assert not re_default_subject_id.match(r"TheSubject_1")
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
186
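The patterns above recognise BLAST's placeholder identifiers such as `Query_1`. As a minimal sketch of how such a pattern can be used to fall back on the first word of the definition line, assuming a hypothetical helper named `resolve_query_id` that is not part of the script:

```python
import re

re_default_query_id = re.compile(r"^Query_\d+$")


def resolve_query_id(query_id, query_def):
    """Return query_id, unless it is a placeholder like Query_1.

    For a placeholder, use the first whitespace-delimited word of the
    query definition instead.
    """
    if re_default_query_id.match(query_id):
        return query_def.split(None, 1)[0]
    return query_id
```

For example, `resolve_query_id("Query_1", "Sample sequence")` gives `"Sample"`, while a real identifier such as `sp|Q9BS26|ERP44_HUMAN` is returned unchanged.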
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
187
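The `convert` function that follows derives `gapopen` and `mismatch` from the aligned query, hit and midline strings of each HSP. The sketch below isolates those two calculations; `count_gap_openings` and `count_mismatches` are hypothetical helper names, and the code assumes (as appears to hold for BLAST HSPs) that neither aligned sequence starts or ends with a gap.

```python
def count_gap_openings(q_seq, h_seq):
    """Count runs of '-' in both aligned sequences.

    Replacing gaps with spaces and splitting yields one more fragment
    than there are gap runs, provided no gap sits at either end.
    """
    return (
        len(q_seq.replace("-", " ").split())
        - 1
        + len(h_seq.replace("-", " ").split())
        - 1
    )


def count_mismatches(q_seq, h_seq, m_seq):
    """Count mismatched columns using the midline.

    Spaces and '+' in the midline mark non-identical columns; columns
    containing a gap are subtracted because they are not mismatches.
    """
    return (
        m_seq.count(" ")
        + m_seq.count("+")
        - q_seq.count("-")
        - h_seq.count("-")
    )
```

With `q_seq = "ACGT--ACGT"`, `h_seq = "ACCTGGAC-T"` and `m_seq = "|| |  || |"`, these return 2 gap openings and 1 mismatch.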
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
188 def convert(blastxml_filename, output_handle):
25
e25d3acf6e68 v0.3.1 completed gzip support
peterjc
parents: 24
diff changeset
189 """Convert BLAST XML input from a file to tabular on given handle."""
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
190 blast_program = None
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
191 # get an iterable
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
192 try:
15
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
193 context = ElementTree.iterparse(blastxml_filename, events=("start", "end"))
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
194 except Exception:
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
195 sys.exit("Invalid data format.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
196 # turn it into an iterator
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
197 context = iter(context)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
198 # get the root element
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
199 try:
24
c877294f8025 Fixed tool_dependencies.xml
peterjc
parents: 23
diff changeset
200 event, root = next(context)
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
201 except Exception:
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
202 sys.exit("Invalid data format.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
203 for event, elem in context:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
204 if event == "end" and elem.tag == "BlastOutput_program":
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
205 blast_program = elem.text
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
206 # for every <Iteration> tag
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
207 if event == "end" and elem.tag == "Iteration":
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
208 # Expecting either this, from BLAST 2.2.25+ using FASTA vs FASTA
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
209 # <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
210 # <Iteration_query-def>Endoplasmic reticulum resident protein 44
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
211 # OS=Homo sapiens GN=ERP44 PE=1 SV=1</Iteration_query-def>
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
212 # <Iteration_query-len>406</Iteration_query-len>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
213 # <Iteration_hits></Iteration_hits>
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
214 #
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
215 # Or, from BLAST 2.2.24+ run online
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
216 # <Iteration_query-ID>Query_1</Iteration_query-ID>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
217 # <Iteration_query-def>Sample</Iteration_query-def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
218 # <Iteration_query-len>516</Iteration_query-len>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
219 # <Iteration_hits>...
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
220 qseqid = elem.findtext("Iteration_query-ID")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
221 if re_default_query_id.match(qseqid):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
222 # Placeholder ID, take the first word of the query definition
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
223 qseqid = elem.findtext("Iteration_query-def").split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
224 qlen = int(elem.findtext("Iteration_query-len"))
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
225
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
226 # for every <Hit> within <Iteration>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
227 for hit in elem.findall("Iteration_hits/Hit"):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
228 # Expecting either this,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
229 # <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
230 # <Hit_def>RecName: Full=Rhodopsin</Hit_def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
231 # <Hit_accession>P56514</Hit_accession>
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
232 # or,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
233 # <Hit_id>Subject_1</Hit_id>
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
234 # <Hit_def>gi|57163783|ref|NP_001009242.1|
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
235 # rhodopsin [Felis catus]</Hit_def>
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
236 # <Hit_accession>Subject_1</Hit_accession>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
237 #
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
238 # apparently depending on the parse_deflines switch
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
239 #
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
240 # Or, with a local database not using -parse_seqids can get this,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
241 # <Hit_id>gnl|BL_ORD_ID|2</Hit_id>
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
242 # <Hit_def>chrIII gi|240255695|ref|NC_003074.8| Arabidopsis
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
243 # thaliana chromosome 3, complete sequence</Hit_def>
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
244 # <Hit_accession>2</Hit_accession>
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
245 sseqid = hit.findtext("Hit_id").split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
246 hit_def = sseqid + " " + hit.findtext("Hit_def")
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
247 if re_default_subject_id.match(sseqid) and sseqid == hit.findtext(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
248 "Hit_accession"
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
249 ):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
250 # Placeholder ID, take the first word of the subject definition
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
251 hit_def = hit.findtext("Hit_def")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
252 sseqid = hit_def.split(None, 1)[0]
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
253 if sseqid.startswith(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
254 "gnl|BL_ORD_ID|"
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
255 ) and sseqid == "gnl|BL_ORD_ID|" + hit.findtext("Hit_accession"):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
256 # Alternative placeholder ID, again take the first word of hit_def
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
257 hit_def = hit.findtext("Hit_def")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
258 sseqid = hit_def.split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
259 # for every <Hsp> within <Hit>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
260 for hsp in hit.findall("Hit_hsps/Hsp"):
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
261 nident = hsp.findtext("Hsp_identity")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
262 length = hsp.findtext("Hsp_align-len")
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
263 # As of NCBI BLAST+ 2.4.0 this is given to 3dp (not 2dp)
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
264 pident = "%0.3f" % (100 * float(nident) / float(length))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
265
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
266 q_seq = hsp.findtext("Hsp_qseq")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
267 h_seq = hsp.findtext("Hsp_hseq")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
268 m_seq = hsp.findtext("Hsp_midline")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
269 assert len(q_seq) == len(h_seq) == len(m_seq) == int(length)
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
270 gapopen = str(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
271 len(q_seq.replace("-", " ").split())
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
272 - 1
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
273 + len(h_seq.replace("-", " ").split())
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
274 - 1
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
275 )
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
276
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
277 mismatch = (
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
278 m_seq.count(" ")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
279 + m_seq.count("+")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
280 - q_seq.count("-")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
281 - h_seq.count("-")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
282 )
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
283 # TODO - Remove this alternative mismatch calculation and test
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
284 # once satisfied there are no problems
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
285 expected_mismatch = len(q_seq) - sum(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
286 1
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
287 for q, h in zip(q_seq, h_seq)
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
288 if q == h or q == "-" or h == "-"
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
289 )
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
290 xx = sum(1 for q, h in zip(q_seq, h_seq) if q == "X" and h == "X")
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
291 if not (
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
292 expected_mismatch - q_seq.count("X")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
293 <= int(mismatch)
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
294 <= expected_mismatch + xx
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
295 ):
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
296 sys.exit(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
297 "%s vs %s mismatches, expected %i <= %i <= %i"
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
298 % (
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
299 qseqid,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
300 sseqid,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
301 expected_mismatch - q_seq.count("X"),
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
302 int(mismatch),
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
303 expected_mismatch + xx,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
304 )
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
305 )
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
306
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
307 # TODO - Remove this alternative identity calculation and test
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
308 # once satisfied there are no problems
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
309 expected_identity = sum(1 for q, h in zip(q_seq, h_seq) if q == h)
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
310 if not (
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
311 expected_identity - xx
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
312 <= int(nident)
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
313 <= expected_identity + q_seq.count("X")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
314 ):
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
315 sys.exit(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
316 "%s vs %s identities, expected %i <= %i <= %i"
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
317 % (
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
318 qseqid,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
319 sseqid,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
320 expected_identity - xx,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
321 int(nident),
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
322 expected_identity + q_seq.count("X"),
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
323 )
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
324 )
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
325
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
326 evalue = hsp.findtext("Hsp_evalue")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
327 if evalue == "0":
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
328 evalue = "0.0"
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
329 else:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
330 evalue = "%0.0e" % float(evalue)
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
331
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
332 bitscore = float(hsp.findtext("Hsp_bit-score"))
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
333 if bitscore < 100:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
334 # Seems to show one decimal place for lower scores
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
335 bitscore = "%0.1f" % bitscore
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
336 else:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
337 # Note BLAST does not round to nearest int, it truncates
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
338 bitscore = "%i" % bitscore
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
339
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
340 values = [
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
341 qseqid,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
342 sseqid,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
343 pident,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
344 length, # hsp.findtext("Hsp_align-len")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
345 str(mismatch),
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
346 gapopen,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
347 hsp.findtext("Hsp_query-from"), # qstart,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
348 hsp.findtext("Hsp_query-to"), # qend,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
349 hsp.findtext("Hsp_hit-from"), # sstart,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
350 hsp.findtext("Hsp_hit-to"), # send,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
351 evalue, # hsp.findtext("Hsp_evalue") in scientific notation
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
352 bitscore, # hsp.findtext("Hsp_bit-score") truncated
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
353 ]
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
354
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
355 if extended:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
356 try:
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
357 sallseqid = ";".join(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
358 name.split(None, 1)[0] for name in hit_def.split(" >")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
359 )
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
360 salltitles = "<>".join(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
361 name.split(None, 1)[1] for name in hit_def.split(" >")
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
362 )
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
363 except IndexError as e:
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
364 sys.exit(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
365 "Problem splitting multiple hits?\n%r\n--> %s"
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
366 % (hit_def, e)
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
367 )
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
368 # print(hit_def, "-->", sallseqid)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
369 positive = hsp.findtext("Hsp_positive")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
370 ppos = "%0.2f" % (100 * float(positive) / float(length))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
371 qframe = hsp.findtext("Hsp_query-frame")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
372 sframe = hsp.findtext("Hsp_hit-frame")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
373 if blast_program == "blastp":
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
374 # Probably a bug in BLASTP that they use 0 or 1
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
375 # depending on format
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
376 if qframe == "0":
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
377 qframe = "1"
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
378 if sframe == "0":
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
379 sframe = "1"
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
380 slen = int(hit.findtext("Hit_len"))
26
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
381 values.extend(
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
382 [
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
383 sallseqid,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
384 hsp.findtext("Hsp_score"), # score,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
385 nident,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
386 positive,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
387 hsp.findtext("Hsp_gaps"), # gaps,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
388 ppos,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
389 qframe,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
390 sframe,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
391 # NOTE - for blastp, XML shows original seq,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
392 # tabular uses XXX masking
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
393 q_seq,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
394 h_seq,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
395 str(qlen),
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
396 str(slen),
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
397 salltitles,
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
398 ]
2889433c7ae1 v0.3.3 - fixed legacy dependecy definition
peterjc
parents: 25
diff changeset
399 )
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
400 if cols:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
401 # Only a subset of the columns are needed
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
402 values = [values[colnames.index(c)] for c in cols]
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
403 # print("\t".join(values))
15
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
404 output_handle.write("\t".join(values) + "\n")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
405 # prevents ElementTree from growing a large data structure
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
406 root.clear()
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
407 elem.clear()
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
408
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
409
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
410 if options.output:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
411 outfile = open(options.output, "w")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
412 else:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
413 outfile = sys.stdout
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
414
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
415 for in_file in args:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
416 blast_program = None
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
417 convert(in_file, outfile)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
418
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
419 if options.output:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
420 outfile.close()
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
421 else:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
422 # Using stdout
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
423 pass
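
The extended-column code in lines 357-362 splits a merged hit description into the `sallseqid` and `salltitles` fields. A minimal standalone sketch of that step follows, assuming `hit_def` holds one or more "identifier title" entries separated by " >". The helper name `split_merged_hits` is hypothetical and not part of the script, and it raises `ValueError` where the script calls `sys.exit`.

```python
def split_merged_hits(hit_def):
    """Split a merged hit description into (sallseqid, salltitles).

    Hypothetical helper for illustration only. Assumes each entry is
    "identifier title" and that entries are separated by " >".
    """
    ids = []
    titles = []
    for entry in hit_def.split(" >"):
        # Split on the first run of whitespace: identifier, then title.
        parts = entry.split(None, 1)
        if len(parts) != 2:
            # The script hits an IndexError here and exits via sys.exit;
            # this sketch raises an exception instead.
            raise ValueError("Problem splitting multiple hits? %r" % hit_def)
        ids.append(parts[0])
        titles.append(parts[1])
    # Identifiers are joined with ";" and titles with "<>", as in the script.
    return ";".join(ids), "<>".join(titles)


if __name__ == "__main__":
    sallseqid, salltitles = split_merged_hits(
        "id1 first protein >id2 second protein"
    )
    print(sallseqid)
    print(salltitles)
```

Checking the length of each split up front gives the same failure condition as the script's `except IndexError` branch (an entry with no title), while keeping the identifier and title lists in step within a single pass.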