annotate tools/ncbi_blast_plus/blastxml_to_tabular.py @ 24:c877294f8025 draft

Fixed tool_dependencies.xml
author peterjc
date Mon, 09 Jul 2018 10:08:16 -0400
parents 31e517610e1f
children e25d3acf6e68
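The script's docstring says that some standard columns, such as the percentage identity and the number of gap openings, are not stored in the XML and must be calculated. As a rough illustration (not the script's own code), the hypothetical helper `derived_columns` computes `pident`, `length`, `mismatch` and `gapopen` from a pair of aligned strings like the `Hsp_qseq` and `Hsp_hseq` values in BLAST XML. It assumes that `-` marks a gap and that both strings have the same length.

```python
import re


def derived_columns(qseq, sseq):
    """Return (pident, length, mismatch, gapopen) for two aligned strings.

    Hypothetical sketch: assumes '-' marks gaps and len(qseq) == len(sseq).
    """
    if len(qseq) != len(sseq):
        raise ValueError("aligned sequences must have equal length")
    length = len(qseq)  # alignment length includes gap columns
    pairs = list(zip(qseq, sseq))
    # Identical: same residue in both strings (never a gap)
    identical = sum(1 for q, s in pairs if q == s and q != "-")
    # Mismatch: two different residues, neither of them a gap
    mismatch = sum(1 for q, s in pairs if q != s and q != "-" and s != "-")
    # Each maximal run of '-' in either string counts as one gap opening
    gapopen = len(re.findall(r"-+", qseq)) + len(re.findall(r"-+", sseq))
    pident = 100.0 * identical / length if length else 0.0
    return pident, length, mismatch, gapopen


if __name__ == "__main__":
    print(derived_columns("ACGT--ACGTA", "ACGAGGAC-TA"))
```

Because the comparison is character by character, a region masked as `XXXX` in one string but not the other is counted as mismatches, which is how low-complexity masking can skew these numbers.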
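The `-c/--columns` handling in the script accepts the `std` or `ext` preset or a space- or comma-separated list of names. It drops blank entries and Galaxy's `"None"` placeholder, rejects unknown names, and switches to extended parsing if any column beyond the first 12 is requested. The same logic can be packaged as a testable function. In this sketch, `parse_columns` is a hypothetical name, it raises `ValueError` where the script calls `sys.exit`, and it does not handle the retired `x22` alias.

```python
# The 25 column names the script supports, in output order
COLNAMES = ("qseqid,sseqid,pident,length,mismatch,gapopen,qstart,qend,"
            "sstart,send,evalue,bitscore,sallseqid,score,nident,positive,"
            "gaps,ppos,qframe,sframe,qseq,sseq,qlen,slen,salltitles").split(",")


def parse_columns(spec):
    """Return (cols, extended) for a columns option value (hypothetical helper).

    cols is None for the 'std' and 'ext' presets, otherwise a list of names.
    """
    if spec == "std":
        return None, False
    if spec == "ext":
        return None, True
    # Allow space or comma separation; drop blanks and Galaxy's "None"
    cols = [c for c in spec.replace(" ", ",").split(",") if c and c != "None"]
    unknown = set(cols).difference(COLNAMES)
    if unknown:
        raise ValueError("These are not recognised column names: %s"
                         % ",".join(sorted(unknown)))
    if not cols:
        raise ValueError("No columns selected!")
    # Any column past the first 12 (index >= 12) needs the extended data
    extended = max(COLNAMES.index(c) for c in cols) >= 12
    return cols, extended


if __name__ == "__main__":
    print(parse_columns("qseqid, sseqid,pident,"))
```

Returning a value instead of exiting keeps the validation separate from the command-line handling, so it can be reused or unit tested.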
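The docstring notes that most fields are given explicitly in the XML file. The sketch below shows one way to stream a BLAST XML file with the standard library's `ElementTree.iterparse`, handling one query (`Iteration`) at a time and reading the `Hit` and `Hsp` child elements. It is not the script's implementation. The function name `iter_hsp_rows` and the embedded `SAMPLE_XML` record are made up for illustration. Taking `qseqid` as the first word of `Iteration_query-def` and `sseqid` as `Hit_id` is a simplification that may differ from how BLAST+ chooses identifiers.

```python
import io
from xml.etree import ElementTree


def iter_hsp_rows(source):
    """Yield one dict of std-style fields per HSP (hypothetical sketch).

    source may be a file name or a binary file object.
    """
    for _event, elem in ElementTree.iterparse(source):
        if elem.tag != "Iteration":
            continue  # wait until a whole query (Iteration) has been read
        words = (elem.findtext("Iteration_query-def") or "").split()
        qseqid = words[0] if words else ""  # simplification, see lead-in
        for hit in elem.iter("Hit"):
            sseqid = hit.findtext("Hit_id")
            for hsp in hit.iter("Hsp"):
                align_len = int(hsp.findtext("Hsp_align-len"))
                identity = int(hsp.findtext("Hsp_identity"))
                yield {
                    "qseqid": qseqid,
                    "sseqid": sseqid,
                    # pident is not stored in the XML, so derive it
                    "pident": 100.0 * identity / align_len,
                    "length": align_len,
                    "qstart": int(hsp.findtext("Hsp_query-from")),
                    "qend": int(hsp.findtext("Hsp_query-to")),
                    "sstart": int(hsp.findtext("Hsp_hit-from")),
                    "send": int(hsp.findtext("Hsp_hit-to")),
                    "evalue": float(hsp.findtext("Hsp_evalue")),
                    "bitscore": float(hsp.findtext("Hsp_bit-score")),
                }
        elem.clear()  # discard the finished query's children to limit memory


# Made-up single-HSP record, for demonstration only
SAMPLE_XML = b"""<?xml version="1.0"?>
<BlastOutput><BlastOutput_iterations><Iteration>
<Iteration_query-def>query1 example</Iteration_query-def>
<Iteration_hits><Hit><Hit_id>subject1</Hit_id><Hit_hsps><Hsp>
<Hsp_bit-score>50.5</Hsp_bit-score><Hsp_evalue>1e-10</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from><Hsp_query-to>10</Hsp_query-to>
<Hsp_hit-from>5</Hsp_hit-from><Hsp_hit-to>14</Hsp_hit-to>
<Hsp_identity>9</Hsp_identity><Hsp_align-len>10</Hsp_align-len>
</Hsp></Hit_hsps></Hit></Iteration_hits>
</Iteration></BlastOutput_iterations></BlastOutput>
"""

if __name__ == "__main__":
    for row in iter_hsp_rows(io.BytesIO(SAMPLE_XML)):
        print(row)
```

Waiting for each complete `Iteration` and then clearing it keeps memory use roughly proportional to one query's hits rather than to the whole file.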
#!/usr/bin/env python
"""Convert a BLAST XML file to tabular output.

Designed to convert BLAST XML files into tabular BLAST output (either
std for standard 12 columns, or ext for the extended 25 columns offered
in the Galaxy BLAST+ wrappers).

The 12 output columns are 'qseqid sseqid pident length mismatch gapopen qstart
qend sstart send evalue bitscore' (or 'std' at the BLAST+ command line), which
mean:

====== ========= ============================================
Column NCBI name Description
------ --------- --------------------------------------------
     1 qseqid    Query Seq-id (ID of your sequence)
     2 sseqid    Subject Seq-id (ID of the database hit)
     3 pident    Percentage of identical matches
     4 length    Alignment length
     5 mismatch  Number of mismatches
     6 gapopen   Number of gap openings
     7 qstart    Start of alignment in query
     8 qend      End of alignment in query
     9 sstart    Start of alignment in subject (database hit)
    10 send      End of alignment in subject (database hit)
    11 evalue    Expectation value (E-value)
    12 bitscore  Bit score
====== ========= ============================================

The additional columns offered in the Galaxy BLAST+ wrappers are:

====== ============= ===========================================
Column NCBI name     Description
------ ------------- -------------------------------------------
    13 sallseqid     All subject Seq-id(s), separated by ';'
    14 score         Raw score
    15 nident        Number of identical matches
    16 positive      Number of positive-scoring matches
    17 gaps          Total number of gaps
    18 ppos          Percentage of positive-scoring matches
    19 qframe        Query frame
    20 sframe        Subject frame
    21 qseq          Aligned part of query sequence
    22 sseq          Aligned part of subject sequence
    23 qlen          Query sequence length
    24 slen          Subject sequence length
    25 salltitles    All subject titles, separated by '<>'
====== ============= ===========================================

Most of these fields are given explicitly in the XML file, while others, like
the percentage identity and the number of gap openings, must be calculated.

Be aware that the sequences in the extended tabular output, or in the XML
direct from BLAST+, may or may not use XXXX masking on regions of low
complexity. This can throw off the calculation of percentage identity and gap
openings. [In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this
regard, with these numbers changing depending on whether or not the low
complexity filter is used.]

This script attempts to produce output identical to what BLAST+ would produce.
However, check this with "diff -b ..." since BLAST+ sometimes includes an extra
space character (probably a bug).
"""

from __future__ import print_function

import os
import re
import sys

from optparse import OptionParser

if "-v" in sys.argv or "--version" in sys.argv:
    print("v0.2.01")
    sys.exit(0)

if sys.version_info[:2] >= (2, 5):
    try:
        from xml.etree import cElementTree as ElementTree
    except ImportError:
        from xml.etree import ElementTree as ElementTree
else:
    from galaxy import eggs  # noqa - ignore flake8 F401
    import pkg_resources
    pkg_resources.require("elementtree")
    from elementtree import ElementTree

if len(sys.argv) == 4 and sys.argv[3] in ["std", "x22", "ext"]:
    # False positive if user really has a BLAST XML file called 'std' or 'ext'...
    sys.exit("""ERROR: The script API has changed, sorry.

Instead of the old style:

$ python blastxml_to_tabular.py input.xml output.tabular std

Please use:

$ python blastxml_to_tabular.py -o output.tabular -c std input.xml

For more information, use:

$ python blastxml_to_tabular.py -h
""")

usage = """usage: %prog [options] blastxml[,...]

Convert one (or more) BLAST XML files into a single tabular file.

The columns option can be 'std' (standard 12 columns), 'ext'
(extended 25 columns), or a list of BLAST+ column names like
'qseqid,sseqid,pident' (space or comma separated).

Note if using a list of column names, currently ONLY the 25
extended column names are supported.
"""
parser = OptionParser(usage=usage)
parser.add_option('-o', '--output', dest='output', default=None,
                  help='output filename (defaults to stdout)',
                  metavar="FILE")
parser.add_option("-c", "--columns", dest="columns", default='std',
                  help="[std|ext|col1,col2,...] standard 12 columns, extended 25 columns, or list of column names")
(options, args) = parser.parse_args()

colnames = ('qseqid,sseqid,pident,length,mismatch,gapopen,qstart,qend,'
            'sstart,send,evalue,bitscore,sallseqid,score,nident,positive,'
            'gaps,ppos,qframe,sframe,qseq,sseq,qlen,slen,salltitles').split(',')

if len(args) < 1:
    sys.exit("ERROR: No BLASTXML input files given; run with --help to see options.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
129
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
130 out_fmt = options.columns
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
131 if out_fmt == "std":
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
132 extended = False
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
133 cols = None
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
134 elif out_fmt == "x22":
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
135 sys.exit("Format argument x22 has been replaced with ext (extended 25 columns)")
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
136 elif out_fmt == "ext":
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
137 extended = True
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
138 cols = None
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
139 else:
20
    cols = out_fmt.replace(" ", ",").split(",")  # Allow space or comma separated
    # Remove any blank entries due to trailing comma,
    # or annoying "None" dummy value from Galaxy if no columns
    cols = [c for c in cols if c and c != "None"]
    extra = set(cols).difference(colnames)
    if extra:
        sys.exit("These are not recognised column names: %s" % ",".join(sorted(extra)))
    del extra
    assert set(colnames).issuperset(cols), cols
    if not cols:
        sys.exit("No columns selected!")
    extended = max(colnames.index(c) for c in cols) >= 12  # Do we need any higher columns?
del out_fmt
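
# Illustrative sketch, not part of the original script: the custom column
# selection above, recast as a hypothetical pure function so it can be unit
# tested without calling sys.exit(). It assumes colnames is the ordered list
# of known column names with the 12 "std" columns first (as the >= 12 test
# above implies), and it raises ValueError wherever the script exits.
def parse_columns(out_fmt, colnames):
    """Return (cols, extended) for a space or comma separated column list."""
    cols = out_fmt.replace(" ", ",").split(",")
    # Drop blanks from trailing commas and Galaxy's "None" dummy value
    cols = [c for c in cols if c and c != "None"]
    extra = set(cols).difference(colnames)
    if extra:
        raise ValueError("Unrecognised column names: %s" % ",".join(sorted(extra)))
    if not cols:
        raise ValueError("No columns selected!")
    # Any column past the first 12 needs the extended (ext) parsing code
    extended = max(colnames.index(c) for c in cols) >= 12
    return cols, extended

# Usage, with the standard column order:
#   parse_columns("qseqid, sseqid,", colnames) gives (["qseqid", "sseqid"], False)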

for in_file in args:
    if not os.path.isfile(in_file):
        sys.exit("Input BLAST XML file not found: %s" % in_file)
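
# Illustrative sketch, not part of the original script: the incremental
# parsing pattern that convert() below relies on, reduced to a hypothetical
# generator. ElementTree.iterparse() reports elements while the file is read,
# so each <Iteration> can be handled on its "end" event. Clearing the element
# afterwards is this sketch's own addition to keep memory use low; it is not
# part of the convert() code shown here.
import xml.etree.ElementTree as ET


def iter_query_ids(source):
    """Yield the <Iteration_query-ID> text of each <Iteration> in source."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    # Skip the first event, the "start" of the root <BlastOutput> element
    next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "Iteration":
            yield elem.findtext("Iteration_query-ID")
            # Free the finished <Iteration> subtree
            elem.clear()

# Usage, with a hypothetical file name:
#   for query_id in iter_query_ids("results.xml"):
#       print(query_id)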
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
157
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
158
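
# Illustrative sketch, not part of the original script: the E-value and bit
# score formatting rules used in convert() below, written as two hypothetical
# helpers. An E-value of exactly "0" becomes "0.0" and anything else uses
# "%0.0e" scientific notation; bit scores under 100 keep one decimal place,
# larger ones are truncated (not rounded) to an integer.
def format_evalue(text):
    """Format the text of an <Hsp_evalue> element for tabular output."""
    if text == "0":
        return "0.0"
    return "%0.0e" % float(text)


def format_bitscore(text):
    """Format the text of an <Hsp_bit-score> element for tabular output."""
    score = float(text)
    if score < 100:
        return "%0.1f" % score
    # int() truncates towards zero, like the "%i" formatting in convert()
    return str(int(score))

# Examples: format_evalue("1.23e-45") gives "1e-45",
# format_bitscore("57.3") gives "57.3", format_bitscore("123.9") gives "123"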
re_default_query_id = re.compile(r"^Query_\d+$")
assert re_default_query_id.match("Query_101")
assert not re_default_query_id.match("Query_101a")
assert not re_default_query_id.match("MyQuery_101")
re_default_subject_id = re.compile(r"^Subject_\d+$")
assert re_default_subject_id.match("Subject_1")
assert not re_default_subject_id.match("Subject_")
assert not re_default_subject_id.match("Subject_12a")
assert not re_default_subject_id.match("TheSubject_1")
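
# Illustrative sketch, not part of the original script: the placeholder ID
# handling in convert() below, as two hypothetical stand-alone functions.
# When BLAST reports a generic ID ("Query_1", "Subject_1" matching its
# accession, or "gnl|BL_ORD_ID|2" matching accession "2"), the first word of
# the definition line is used as the identifier instead.
import re

_QUERY_PLACEHOLDER = re.compile(r"^Query_\d+$")
_SUBJECT_PLACEHOLDER = re.compile(r"^Subject_\d+$")


def resolve_query_id(query_id, query_def):
    """Return the query ID, replacing a Query_N placeholder."""
    if _QUERY_PLACEHOLDER.match(query_id):
        return query_def.split(None, 1)[0]
    return query_id


def resolve_subject_id(hit_id, hit_def, hit_accession):
    """Return the subject ID, replacing Subject_N or gnl|BL_ORD_ID|N placeholders."""
    sseqid = hit_id.split(None, 1)[0]
    if _SUBJECT_PLACEHOLDER.match(sseqid) and sseqid == hit_accession:
        sseqid = hit_def.split(None, 1)[0]
    if sseqid.startswith("gnl|BL_ORD_ID|") and sseqid == "gnl|BL_ORD_ID|" + hit_accession:
        sseqid = hit_def.split(None, 1)[0]
    return sseqid

# Example: resolve_subject_id("gnl|BL_ORD_ID|2", "chrIII gi|240255695|...", "2")
# gives "chrIII"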


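
# Illustrative sketch, not part of the original script: how convert() below
# derives the pident, mismatch and gapopen columns from one <Hsp>, written as
# a hypothetical helper taking the aligned query, subject and midline strings.
# Unlike convert(), it uses len(q_seq) as the alignment length instead of
# <Hsp_align-len>. Gap openings are the runs of "-" between aligned residues
# in each sequence; mismatches are midline positions shown as " " or "+",
# minus the gap columns (which the midline also shows as " ").
def alignment_stats(nident, q_seq, h_seq, m_seq):
    """Return (pident, mismatch, gapopen) as strings for one HSP."""
    length = len(q_seq)
    if not len(h_seq) == len(m_seq) == length:
        raise ValueError("Aligned strings differ in length")
    pident = "%0.3f" % (100.0 * nident / length)
    # split() drops empty pieces, so N internal gap runs leave N + 1 pieces
    gapopen = (len(q_seq.replace("-", " ").split()) - 1 +
               len(h_seq.replace("-", " ").split()) - 1)
    mismatch = (m_seq.count(" ") + m_seq.count("+") -
                q_seq.count("-") - h_seq.count("-"))
    return pident, str(mismatch), str(gapopen)

# Example (protein): alignment_stats(3, "MKV-LA", "MRVQLS", "M+V L+")
# gives ("50.000", "2", "1")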
def convert(blastxml_filename, output_handle):
    blast_program = None
    # get an iterable
    try:
        context = ElementTree.iterparse(blastxml_filename, events=("start", "end"))
    except Exception:
        sys.exit("Invalid data format.")
    # turn it into an iterator
    context = iter(context)
    # get the root element
    try:
        event, root = next(context)
    except Exception:
        sys.exit("Invalid data format.")
    for event, elem in context:
        if event == "end" and elem.tag == "BlastOutput_program":
            blast_program = elem.text
        # for every <Iteration> tag
        if event == "end" and elem.tag == "Iteration":
            # Expecting either this, from BLAST 2.2.25+ using FASTA vs FASTA
            # <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
            # <Iteration_query-def>Endoplasmic reticulum resident protein 44
            # OS=Homo sapiens GN=ERP44 PE=1 SV=1</Iteration_query-def>
            # <Iteration_query-len>406</Iteration_query-len>
            # <Iteration_hits></Iteration_hits>
            #
            # Or, from BLAST 2.2.24+ run online
            # <Iteration_query-ID>Query_1</Iteration_query-ID>
            # <Iteration_query-def>Sample</Iteration_query-def>
            # <Iteration_query-len>516</Iteration_query-len>
            # <Iteration_hits>...
            qseqid = elem.findtext("Iteration_query-ID")
            if re_default_query_id.match(qseqid):
                # Placeholder ID, take the first word of the query definition
                qseqid = elem.findtext("Iteration_query-def").split(None, 1)[0]
            qlen = int(elem.findtext("Iteration_query-len"))

            # for every <Hit> within <Iteration>
            for hit in elem.findall("Iteration_hits/Hit"):
                # Expecting either this,
                # <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id>
                # <Hit_def>RecName: Full=Rhodopsin</Hit_def>
                # <Hit_accession>P56514</Hit_accession>
                # or,
                # <Hit_id>Subject_1</Hit_id>
                # <Hit_def>gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]</Hit_def>
                # <Hit_accession>Subject_1</Hit_accession>
                #
                # apparently depending on the parse_deflines switch
                #
                # Or, with a local database built without -parse_seqids, this:
                # <Hit_id>gnl|BL_ORD_ID|2</Hit_id>
                # <Hit_def>chrIII gi|240255695|ref|NC_003074.8| Arabidopsis
                # thaliana chromosome 3, complete sequence</Hit_def>
                # <Hit_accession>2</Hit_accession>
                sseqid = hit.findtext("Hit_id").split(None, 1)[0]
                hit_def = sseqid + " " + hit.findtext("Hit_def")
                if re_default_subject_id.match(sseqid) and sseqid == hit.findtext("Hit_accession"):
                    # Placeholder ID, take the first word of the subject definition
                    hit_def = hit.findtext("Hit_def")
                    sseqid = hit_def.split(None, 1)[0]
                if sseqid.startswith("gnl|BL_ORD_ID|") and sseqid == "gnl|BL_ORD_ID|" + hit.findtext("Hit_accession"):
                    # Alternative placeholder ID, again take the first word of hit_def
                    hit_def = hit.findtext("Hit_def")
                    sseqid = hit_def.split(None, 1)[0]
                # for every <Hsp> within <Hit>
                for hsp in hit.findall("Hit_hsps/Hsp"):
                    nident = hsp.findtext("Hsp_identity")
                    length = hsp.findtext("Hsp_align-len")
                    # As of NCBI BLAST+ 2.4.0 this is given to 3dp (not 2dp)
                    pident = "%0.3f" % (100 * float(nident) / float(length))

                    q_seq = hsp.findtext("Hsp_qseq")
                    h_seq = hsp.findtext("Hsp_hseq")
                    m_seq = hsp.findtext("Hsp_midline")
                    assert len(q_seq) == len(h_seq) == len(m_seq) == int(length)
                    gapopen = str(len(q_seq.replace('-', ' ').split()) - 1 +
                                  len(h_seq.replace('-', ' ').split()) - 1)

                    mismatch = m_seq.count(' ') + m_seq.count('+') - q_seq.count('-') - h_seq.count('-')
                    # TODO - Remove this alternative mismatch calculation and test
                    # once satisfied there are no problems
                    expected_mismatch = len(q_seq) - sum(1 for q, h in zip(q_seq, h_seq)
                                                         if q == h or q == "-" or h == "-")
                    xx = sum(1 for q, h in zip(q_seq, h_seq) if q == "X" and h == "X")
                    if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= expected_mismatch + xx):
                        sys.exit("%s vs %s mismatches, expected %i <= %i <= %i"
                                 % (qseqid, sseqid, expected_mismatch - q_seq.count("X"),
                                    int(mismatch), expected_mismatch + xx))

                    # TODO - Remove this alternative identity calculation and test
                    # once satisfied there are no problems
                    expected_identity = sum(1 for q, h in zip(q_seq, h_seq) if q == h)
                    if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")):
                        sys.exit("%s vs %s identities, expected %i <= %i <= %i"
                                 % (qseqid, sseqid, expected_identity - xx, int(nident),
                                    expected_identity + q_seq.count("X")))

                    evalue = hsp.findtext("Hsp_evalue")
                    if evalue == "0":
                        evalue = "0.0"
                    else:
                        evalue = "%0.0e" % float(evalue)

                    bitscore = float(hsp.findtext("Hsp_bit-score"))
                    if bitscore < 100:
                        # Seems to show one decimal place for lower scores
                        bitscore = "%0.1f" % bitscore
                    else:
                        # Note BLAST does not round to nearest int, it truncates
                        bitscore = "%i" % bitscore

                    values = [qseqid,
                              sseqid,
                              pident,
                              length,  # hsp.findtext("Hsp_align-len")
                              str(mismatch),
                              gapopen,
                              hsp.findtext("Hsp_query-from"),  # qstart,
                              hsp.findtext("Hsp_query-to"),  # qend,
                              hsp.findtext("Hsp_hit-from"),  # sstart,
                              hsp.findtext("Hsp_hit-to"),  # send,
                              evalue,  # hsp.findtext("Hsp_evalue") in scientific notation
                              bitscore,  # hsp.findtext("Hsp_bit-score") to 1dp, or truncated if >= 100
                              ]

13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
296 if extended:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
297 try:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
298 sallseqid = ";".join(name.split(None, 1)[0] for name in hit_def.split(" >"))
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
299 salltitles = "<>".join(name.split(None, 1)[1] for name in hit_def.split(" >"))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
300 except IndexError as e:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
301 sys.exit("Problem splitting multuple hits?\n%r\n--> %s" % (hit_def, e))
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
302 # print(hit_def, "-->", sallseqid)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
303 positive = hsp.findtext("Hsp_positive")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
304 ppos = "%0.2f" % (100 * float(positive) / float(length))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
305 qframe = hsp.findtext("Hsp_query-frame")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
306 sframe = hsp.findtext("Hsp_hit-frame")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
307 if blast_program == "blastp":
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
308 # Probably a BLASTP bug: the frame is given as 0 or 1 depending on the output format
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
309 if qframe == "0":
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
310 qframe = "1"
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
311 if sframe == "0":
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
312 sframe = "1"
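For `blastp` the script rewrites a frame of `"0"` to `"1"`, following the comment that BLASTP reports 0 or 1 depending on the output format; other programs keep the XML frame unchanged. The same rule as a small, testable function (the name `normalise_frames` is hypothetical):

```python
def normalise_frames(blast_program, qframe, sframe):
    """Return (qframe, sframe), mapping "0" to "1" for blastp only."""
    if blast_program == "blastp":
        if qframe == "0":
            qframe = "1"
        if sframe == "0":
            sframe = "1"
    return qframe, sframe


print(normalise_frames("blastp", "0", "0"))   # ('1', '1')
print(normalise_frames("tblastx", "-2", "3"))  # ('-2', '3')
```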
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
313 slen = int(hit.findtext("Hit_len"))
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
314 values.extend([sallseqid,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
315 hsp.findtext("Hsp_score"), # score,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
316 nident,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
317 positive,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
318 hsp.findtext("Hsp_gaps"), # gaps,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
319 ppos,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
320 qframe,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
321 sframe,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
322 # NOTE - for blastp, XML shows original seq, tabular uses XXX masking
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
323 q_seq,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
324 h_seq,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
325 str(qlen),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
326 str(slen),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
327 salltitles,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
328 ])
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
329 if cols:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
330 # Only a subset of the columns are needed
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
331 values = [values[colnames.index(c)] for c in cols]
22
6f386c5dc4fb v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents: 21
diff changeset
332 # print("\t".join(values))
15
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
333 output_handle.write("\t".join(values) + "\n")
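Each output row is the 12 standard values, plus the 13 extended values when `extended` is set, optionally narrowed to a caller-chosen subset, then joined with tabs. The sketch below assumes `colnames` lists the standard BLAST+ tabular column names followed by the extended ones, in the order `values.extend` uses here; the constants and `format_row` are illustrative and not copied from the script.

```python
# Assumed column order: 12 standard BLAST+ names, then the 13 extended ones
STD_COLS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
            "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
EXT_COLS = ["sallseqid", "score", "nident", "positive", "gaps", "ppos",
            "qframe", "sframe", "qseq", "sseq", "qlen", "slen", "salltitles"]


def format_row(values, colnames, cols=None):
    """Return one tab-separated output line, optionally restricted to cols."""
    if cols:
        # Map each column name to its position once, then pick the subset
        index = {name: i for i, name in enumerate(colnames)}
        values = [values[index[c]] for c in cols]
    return "\t".join(values) + "\n"


values = [str(i) for i in range(25)]
print(repr(format_row(values, STD_COLS + EXT_COLS, ["qseqid", "evalue", "salltitles"])))
# '0\t10\t24\n'
```

The dict lookup avoids a `list.index` scan per requested column; with only 25 columns the script's `colnames.index` approach is equally fine in practice.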
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
334 # Prevents ElementTree from building up a large data structure in memory
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
335 root.clear()
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
336 elem.clear()
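The two `clear()` calls follow the usual streaming pattern for `ElementTree.iterparse`: take the root from the first `start` event, handle each record on its `end` event, then clear it so the tree does not retain every processed record. A minimal sketch, assuming the standard BLAST XML layout in which each query is an `<Iteration>` element; `iter_query_ids` is a hypothetical helper that only extracts `Iteration_query-ID`.

```python
import io
import xml.etree.ElementTree as ElementTree


def iter_query_ids(handle):
    """Yield the Iteration_query-ID of each <Iteration>, clearing as we go."""
    context = iter(ElementTree.iterparse(handle, events=("start", "end")))
    event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "Iteration":
            yield elem.findtext("Iteration_query-ID")
            # Free the processed record; the empty element shells left behind are small
            root.clear()
            elem.clear()


xml = (b"<BlastOutput><BlastOutput_iterations>"
       b"<Iteration><Iteration_query-ID>Query_1</Iteration_query-ID></Iteration>"
       b"<Iteration><Iteration_query-ID>Query_2</Iteration_query-ID></Iteration>"
       b"</BlastOutput_iterations></BlastOutput>")
print(list(iter_query_ids(io.BytesIO(xml))))  # ['Query_1', 'Query_2']
```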
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
337
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
338
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
339 if options.output:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
340 outfile = open(options.output, "w")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
341 else:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
342 outfile = sys.stdout
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
343
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
344 for in_file in args:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
345 blast_program = None
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
346 convert(in_file, outfile)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
347
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
348 if options.output:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
349 outfile.close()
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
350 else:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
351 # Using stdout
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
352 pass
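The `if options.output` / `else: pass` pair exists so that a named output file is closed while `sys.stdout` is left open. On Python 3.7 and later the same behaviour can be expressed with a context manager. This is a sketch only; `open_output` is a hypothetical helper rather than part of the script.

```python
import contextlib
import sys


def open_output(path=None):
    """Return a context manager for the output handle.

    A real file is opened (and closed on exit) when path is given; otherwise
    sys.stdout is wrapped in nullcontext so leaving the block does not close it.
    """
    if path:
        return open(path, "w")
    return contextlib.nullcontext(sys.stdout)


# Usage, mirroring the loop over input files (convert, args and options come from the script):
# with open_output(options.output) as outfile:
#     for in_file in args:
#         convert(in_file, outfile)
```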