annotate tools/ncbi_blast_plus/blastxml_to_tabular.py @ 25:e25d3acf6e68 draft

v0.3.1 completed gzip support
author peterjc
date Tue, 23 Oct 2018 08:48:19 -0400
parents c877294f8025
children 2889433c7ae1
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
2 """Convert a BLAST XML file to tabular output.
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
3
15
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
4 Designed to convert BLAST XML files into tabular BLAST output (either
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
5 std for standard 12 columns, or ext for the extended 25 columns offered
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
6 in the Galaxy BLAST+ wrappers).
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
7
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
8 The 12 columns output are 'qseqid sseqid pident length mismatch gapopen qstart
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
9 qend sstart send evalue bitscore' or 'std' at the BLAST+ command line, which
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
10 mean:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
11
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
12 ====== ========= ============================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
13 Column NCBI name Description
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
14 ------ --------- --------------------------------------------
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
15 1 qseqid Query Seq-id (ID of your sequence)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
16 2 sseqid Subject Seq-id (ID of the database hit)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
17 3 pident Percentage of identical matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
18 4 length Alignment length
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
19 5 mismatch Number of mismatches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
20 6 gapopen Number of gap openings
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
21 7 qstart Start of alignment in query
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
22 8 qend End of alignment in query
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
23 9 sstart Start of alignment in subject (database hit)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
24 10 send End of alignment in subject (database hit)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
25 11 evalue Expectation value (E-value)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
26 12 bitscore Bit score
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
27 ====== ========= ============================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
28
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
29 The additional columns offered in the Galaxy BLAST+ wrappers are:
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
30
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
31 ====== ============= ===========================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
32 Column NCBI name Description
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
33 ------ ------------- -------------------------------------------
11
4c4a0da938ff Uploaded v0.0.22, now wraps BLAST+ 2.2.28 allowing extended tabular output to include the hit descriptions as column 25.
peterjc
parents: 10
diff changeset
34 13 sallseqid All subject Seq-id(s), separated by ';'
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
35 14 score Raw score
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
36 15 nident Number of identical matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
37 16 positive Number of positive-scoring matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
38 17 gaps Total number of gaps
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
39 18 ppos Percentage of positive-scoring matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
40 19 qframe Query frame
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
41 20 sframe Subject frame
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
42 21 qseq Aligned part of query sequence
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
43 22 sseq Aligned part of subject sequence
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
44 23 qlen Query sequence length
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
45 24 slen Subject sequence length
11
4c4a0da938ff Uploaded v0.0.22, now wraps BLAST+ 2.2.28 allowing extended tabular output to include the hit descriptions as column 25.
peterjc
parents: 10
diff changeset
46 25 salltitles All subject titles, separated by '<>'
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
47 ====== ============= ===========================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
48
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
49 Most of these fields are given explicitly in the XML file, others some like
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
50 the percentage identity and the number of gap openings must be calculated.
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
51
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
52 Be aware that the sequence in the extended tabular output or XML direct from
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
53 BLAST+ may or may not use XXXX masking on regions of low complexity. This
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
54 can throw the off the calculation of percentage identity and gap openings.
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
55 [In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard,
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
56 with these numbers changing depending on whether or not the low complexity
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
57 filter is used.]
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
58
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
59 This script attempts to produce identical output to what BLAST+ would have done.
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
60 However, check this with "diff -b ..." since BLAST+ sometimes includes an extra
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
61 space character (probably a bug).
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
62 """
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
63
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
64 from __future__ import print_function
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
65
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
66 import os
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
67 import re
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
68 import sys
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
69
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
70 from optparse import OptionParser
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
71
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
72 if "-v" in sys.argv or "--version" in sys.argv:
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
73 print("v0.2.01")
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
74 sys.exit(0)
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
75
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
76 if sys.version_info[:2] >= (2, 5):
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
77 try:
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
78 from xml.etree import cElementTree as ElementTree
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
79 except ImportError:
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
80 from xml.etree import ElementTree as ElementTree
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
81 else:
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
82 from galaxy import eggs # noqa - ignore flake8 F401
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
83 import pkg_resources
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
84 pkg_resources.require("elementtree")
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
85 from elementtree import ElementTree
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
86
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
87 if len(sys.argv) == 4 and sys.argv[3] in ["std", "x22", "ext"]:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
88 # False positive if user really has a BLAST XML file called 'std' or 'ext'...
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
89 sys.exit("""ERROR: The script API has changed, sorry.
14
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
90
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
91 Instead of the old style:
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
92
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
93 $ python blastxml_to_tabular.py input.xml output.tabular std
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
94
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
95 Please use:
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
96
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
97 $ python blastxml_to_tabular.py -o output.tabular -c std input.xml
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
98
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
99 For more information, use:
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
100
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
101 $ python blastxml_to_tabular.py -h
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
102 """)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
103
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
104 usage = """usage: %prog [options] blastxml[,...]
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
105
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
106 Convert one (or more) BLAST XML files into a single tabular file.
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
107
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
108 The columns option can be 'std' (standard 12 columns), 'ext'
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
109 (extended 25 columns), or a list of BLAST+ column names like
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
110 'qseqid,sseqid,pident' (space or comma separated).
23
31e517610e1f v0.3.0 Updated for NCBI BLAST+ 2.7.1
peterjc
parents: 22
diff changeset
111
31e517610e1f v0.3.0 Updated for NCBI BLAST+ 2.7.1
peterjc
parents: 22
diff changeset
112 Note if using a list of column names, currently ONLY the 25
31e517610e1f v0.3.0 Updated for NCBI BLAST+ 2.7.1
peterjc
parents: 22
diff changeset
113 extended column names are supported.
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
114 """
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
115 parser = OptionParser(usage=usage)
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
116 parser.add_option('-o', '--output', dest='output', default=None,
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
117 help='output filename (defaults to stdout)',
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
118 metavar="FILE")
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
119 parser.add_option("-c", "--columns", dest="columns", default='std',
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
120 help="[std|ext|col1,col2,...] standard 12 columns, extended 25 columns, or list of column names")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
121 (options, args) = parser.parse_args()
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
122
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
123 colnames = ('qseqid,sseqid,pident,length,mismatch,gapopen,qstart,qend,'
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
124 'sstart,send,evalue,bitscore,sallseqid,score,nident,positive,'
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
125 'gaps,ppos,qframe,sframe,qseq,sseq,qlen,slen,salltitles').split(',')
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
126
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
127 if len(args) < 1:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
128 sys.exit("ERROR: No BLASTXML input files given; run with --help to see options.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
129
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
130 out_fmt = options.columns
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
131 if out_fmt == "std":
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
132 extended = False
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
133 cols = None
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
134 elif out_fmt == "x22":
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
135 sys.exit("Format argument x22 has been replaced with ext (extended 25 columns)")
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
136 elif out_fmt == "ext":
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
137 extended = True
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
138 cols = None
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
139 else:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
140 cols = out_fmt.replace(" ", ",").split(",") # Allow space or comma separated
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
141 # Remove any blank entries due to trailing comma,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
142 # or annoying "None" dummy value from Galaxy if no columns
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
143 cols = [c for c in cols if c and c != "None"]
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
144 extra = set(cols).difference(colnames)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
145 if extra:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
146 sys.exit("These are not recognised column names: %s" % ",".join(sorted(extra)))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
147 del extra
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
148 assert set(colnames).issuperset(cols), cols
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
149 if not cols:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
150 sys.exit("No columns selected!")
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
151 extended = max(colnames.index(c) for c in cols) >= 12 # Do we need any higher columns?
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
152 del out_fmt
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
153
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
154 for in_file in args:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
155 if not os.path.isfile(in_file):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
156 sys.exit("Input BLAST XML file not found: %s" % in_file)
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
157
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
158
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
159 re_default_query_id = re.compile("^Query_\d+$")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
160 assert re_default_query_id.match("Query_101")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
161 assert not re_default_query_id.match("Query_101a")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
162 assert not re_default_query_id.match("MyQuery_101")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
163 re_default_subject_id = re.compile("^Subject_\d+$")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
164 assert re_default_subject_id.match("Subject_1")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
165 assert not re_default_subject_id.match("Subject_")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
166 assert not re_default_subject_id.match("Subject_12a")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
167 assert not re_default_subject_id.match("TheSubject_1")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
168
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
169
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
170 def convert(blastxml_filename, output_handle):
25
e25d3acf6e68 v0.3.1 completed gzip support
peterjc
parents: 24
diff changeset
171 """Convert BLAST XML input from a file to tabular on given handle."""
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
172 blast_program = None
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
173 # get an iterable
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
174 try:
15
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
175 context = ElementTree.iterparse(blastxml_filename, events=("start", "end"))
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
176 except Exception:
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
177 sys.exit("Invalid data format.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
178 # turn it into an iterator
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
179 context = iter(context)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
180 # get the root element
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
181 try:
24
c877294f8025 Fixed tool_dependencies.xml
peterjc
parents: 23
diff changeset
182 event, root = next(context)
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
183 except Exception:
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
184 sys.exit("Invalid data format.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
185 for event, elem in context:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
186 if event == "end" and elem.tag == "BlastOutput_program":
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
187 blast_program = elem.text
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
188 # for every <Iteration> tag
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
189 if event == "end" and elem.tag == "Iteration":
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
190 # Expecting either this, from BLAST 2.2.25+ using FASTA vs FASTA
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
191 # <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
192 # <Iteration_query-def>Endoplasmic reticulum resident protein 44
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
193 # OS=Homo sapiens GN=ERP44 PE=1 SV=1</Iteration_query-def>
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
194 # <Iteration_query-len>406</Iteration_query-len>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
195 # <Iteration_hits></Iteration_hits>
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
196 #
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
197 # Or, from BLAST 2.2.24+ run online
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
198 # <Iteration_query-ID>Query_1</Iteration_query-ID>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
199 # <Iteration_query-def>Sample</Iteration_query-def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
200 # <Iteration_query-len>516</Iteration_query-len>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
201 # <Iteration_hits>...
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
202 qseqid = elem.findtext("Iteration_query-ID")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
203 if re_default_query_id.match(qseqid):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
204 # Place holder ID, take the first word of the query definition
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
205 qseqid = elem.findtext("Iteration_query-def").split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
206 qlen = int(elem.findtext("Iteration_query-len"))
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
207
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
208 # for every <Hit> within <Iteration>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
209 for hit in elem.findall("Iteration_hits/Hit"):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
210 # Expecting either this,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
211 # <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
212 # <Hit_def>RecName: Full=Rhodopsin</Hit_def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
213 # <Hit_accession>P56514</Hit_accession>
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
214 # or,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
215 # <Hit_id>Subject_1</Hit_id>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
216 # <Hit_def>gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]</Hit_def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
217 # <Hit_accession>Subject_1</Hit_accession>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
218 #
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
219 # apparently depending on the parse_deflines switch
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
220 #
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
221 # Or, with a local database not using -parse_seqids can get this,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
222 # <Hit_id>gnl|BL_ORD_ID|2</Hit_id>
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
223 # <Hit_def>chrIII gi|240255695|ref|NC_003074.8| Arabidopsis
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
224 # thaliana chromosome 3, complete sequence</Hit_def>
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
225 # <Hit_accession>2</Hit_accession>
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
226 sseqid = hit.findtext("Hit_id").split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
227 hit_def = sseqid + " " + hit.findtext("Hit_def")
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
228 if re_default_subject_id.match(sseqid) and sseqid == hit.findtext("Hit_accession"):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
229 # Place holder ID, take the first word of the subject definition
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
230 hit_def = hit.findtext("Hit_def")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
231 sseqid = hit_def.split(None, 1)[0]
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
232 if sseqid.startswith("gnl|BL_ORD_ID|") and sseqid == "gnl|BL_ORD_ID|" + hit.findtext("Hit_accession"):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
233 # Alternative place holder ID, again take the first word of hit_def
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
234 hit_def = hit.findtext("Hit_def")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
235 sseqid = hit_def.split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
236 # for every <Hsp> within <Hit>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
237 for hsp in hit.findall("Hit_hsps/Hsp"):
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
238 nident = hsp.findtext("Hsp_identity")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
239 length = hsp.findtext("Hsp_align-len")
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
240 # As of NCBI BLAST+ 2.4.0 this is given to 3dp (not 2dp)
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
241 pident = "%0.3f" % (100 * float(nident) / float(length))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
242
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
243 q_seq = hsp.findtext("Hsp_qseq")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
244 h_seq = hsp.findtext("Hsp_hseq")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
245 m_seq = hsp.findtext("Hsp_midline")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
246 assert len(q_seq) == len(h_seq) == len(m_seq) == int(length)
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
247 gapopen = str(len(q_seq.replace('-', ' ').split()) - 1 +
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
248 len(h_seq.replace('-', ' ').split()) - 1)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
249
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
250 mismatch = m_seq.count(' ') + m_seq.count('+') - q_seq.count('-') - h_seq.count('-')
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
251 # TODO - Remove this alternative mismatch calculation and test
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
252 # once satisifed there are no problems
21
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
253 expected_mismatch = len(q_seq) - sum(1 for q, h in zip(q_seq, h_seq)
7538e2bfcd41 v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents: 20
diff changeset
254 if q == h or q == "-" or h == "-")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
255 xx = sum(1 for q, h in zip(q_seq, h_seq) if q == "X" and h == "X")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
256 if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= expected_mismatch + xx):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
257 sys.exit("%s vs %s mismatches, expected %i <= %i <= %i"
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
258 % (qseqid, sseqid, expected_mismatch - q_seq.count("X"),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
259 int(mismatch), expected_mismatch))
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
260
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
261 # TODO - Remove this alternative identity calculation and test
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
262 # once satisifed there are no problems
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
263 expected_identity = sum(1 for q, h in zip(q_seq, h_seq) if q == h)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
264 if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
265 sys.exit("%s vs %s identities, expected %i <= %i <= %i"
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
266 % (qseqid, sseqid, expected_identity, int(nident),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
267 expected_identity + q_seq.count("X")))
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
268
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
269 evalue = hsp.findtext("Hsp_evalue")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
270 if evalue == "0":
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
271 evalue = "0.0"
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
272 else:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
273 evalue = "%0.0e" % float(evalue)
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
274
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
275 bitscore = float(hsp.findtext("Hsp_bit-score"))
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
276 if bitscore < 100:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
277 # Seems to show one decimal place for lower scores
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
278 bitscore = "%0.1f" % bitscore
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
279 else:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
280 # Note BLAST does not round to nearest int, it truncates
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
281 bitscore = "%i" % bitscore
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
282
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
283 values = [qseqid,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
284 sseqid,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
285 pident,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
286 length, # hsp.findtext("Hsp_align-len")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
287 str(mismatch),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
288 gapopen,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
289 hsp.findtext("Hsp_query-from"), # qstart,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
290 hsp.findtext("Hsp_query-to"), # qend,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
291 hsp.findtext("Hsp_hit-from"), # sstart,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
292 hsp.findtext("Hsp_hit-to"), # send,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
293 evalue, # hsp.findtext("Hsp_evalue") in scientific notation
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
294 bitscore, # hsp.findtext("Hsp_bit-score") rounded
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
295 ]
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
296
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
297 if extended:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
298 try:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
299 sallseqid = ";".join(name.split(None, 1)[0] for name in hit_def.split(" >"))
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
300 salltitles = "<>".join(name.split(None, 1)[1] for name in hit_def.split(" >"))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
301 except IndexError as e:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
302 sys.exit("Problem splitting multuple hits?\n%r\n--> %s" % (hit_def, e))
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
303 # print(hit_def, "-->", sallseqid)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
304 positive = hsp.findtext("Hsp_positive")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
305 ppos = "%0.2f" % (100 * float(positive) / float(length))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
306 qframe = hsp.findtext("Hsp_query-frame")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
307 sframe = hsp.findtext("Hsp_hit-frame")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
308 if blast_program == "blastp":
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
309 # Probably a bug in BLASTP that they use 0 or 1 depending on format
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
310 if qframe == "0":
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
311 qframe = "1"
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
312 if sframe == "0":
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
313 sframe = "1"
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
314 slen = int(hit.findtext("Hit_len"))
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
315 values.extend([sallseqid,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
316 hsp.findtext("Hsp_score"), # score,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
317 nident,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
318 positive,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
319 hsp.findtext("Hsp_gaps"), # gaps,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
320 ppos,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
321 qframe,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
322 sframe,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
323 # NOTE - for blastp, XML shows original seq, tabular uses XXX masking
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
324 q_seq,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
325 h_seq,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
326 str(qlen),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
327 str(slen),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
328 salltitles,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
329 ])
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
330 if cols:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
331 # Only a subset of the columns are needed
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
332 values = [values[colnames.index(c)] for c in cols]
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
333 # print("\t".join(values))
15
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
334 output_handle.write("\t".join(values) + "\n")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
335 # prevents ElementTree from growing large datastructure
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
336 root.clear()
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
337 elem.clear()
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
338
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
339
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
340 if options.output:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
341 outfile = open(options.output, "w")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
342 else:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
343 outfile = sys.stdout
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
344
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
345 for in_file in args:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
346 blast_program = None
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
347 convert(in_file, outfile)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
348
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
349 if options.output:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
350 outfile.close()
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
351 else:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
352 # Using stdout
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
353 pass