annotate tools/ncbi_blast_plus/blastxml_to_tabular.py @ 20:3034ce97dd33 draft

Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
author peterjc
date Mon, 07 Nov 2016 11:31:37 -0500
parents c16c30e9ad5b
children 7538e2bfcd41
rev   line source
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
2 """Convert a BLAST XML file to tabular output.
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
3
15
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
4 Designed to convert BLAST XML files into tabular BLAST output (either
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
5 std for standard 12 columns, or ext for the extended 25 columns offered
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
6 in the Galaxy BLAST+ wrappers).
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
7
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
8 The 12 columns output are 'qseqid sseqid pident length mismatch gapopen qstart
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
9 qend sstart send evalue bitscore' or 'std' at the BLAST+ command line, which
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
10 mean:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
11
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
12 ====== ========= ============================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
13 Column NCBI name Description
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
14 ------ --------- --------------------------------------------
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
15      1 qseqid    Query Seq-id (ID of your sequence)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
16      2 sseqid    Subject Seq-id (ID of the database hit)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
17      3 pident    Percentage of identical matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
18      4 length    Alignment length
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
19      5 mismatch  Number of mismatches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
20      6 gapopen   Number of gap openings
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
21      7 qstart    Start of alignment in query
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
22      8 qend      End of alignment in query
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
23      9 sstart    Start of alignment in subject (database hit)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
24     10 send      End of alignment in subject (database hit)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
25     11 evalue    Expectation value (E-value)
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
26     12 bitscore  Bit score
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
27 ====== ========= ============================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
28
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
29 The additional columns offered in the Galaxy BLAST+ wrappers are:
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
30
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
31 ====== ============= ===========================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
32 Column NCBI name Description
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
33 ------ ------------- -------------------------------------------
11
4c4a0da938ff Uploaded v0.0.22, now wraps BLAST+ 2.2.28 allowing extended tabular output to include the hit descriptions as column 25.
peterjc
parents: 10
diff changeset
34     13 sallseqid     All subject Seq-id(s), separated by ';'
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
35     14 score         Raw score
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
36     15 nident        Number of identical matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
37     16 positive      Number of positive-scoring matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
38     17 gaps          Total number of gaps
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
39     18 ppos          Percentage of positive-scoring matches
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
40     19 qframe        Query frame
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
41     20 sframe        Subject frame
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
42     21 qseq          Aligned part of query sequence
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
43     22 sseq          Aligned part of subject sequence
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
44     23 qlen          Query sequence length
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
45     24 slen          Subject sequence length
11
4c4a0da938ff Uploaded v0.0.22, now wraps BLAST+ 2.2.28 allowing extended tabular output to include the hit descriptions as column 25.
peterjc
parents: 10
diff changeset
46     25 salltitles    All subject titles, separated by '<>'
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
47 ====== ============= ===========================================
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
48
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
49 Most of these fields are given explicitly in the XML file, but some, like
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
50 the percentage identity and the number of gap openings must be calculated.
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
51
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
52 Be aware that the sequence in the extended tabular output or XML direct from
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
53 BLAST+ may or may not use XXXX masking on regions of low complexity. This
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
54 can throw off the calculation of percentage identity and gap openings.
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
55 [In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard,
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
56 with these numbers changing depending on whether or not the low complexity
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
57 filter is used.]
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
58
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
59 This script attempts to produce identical output to what BLAST+ would have done.
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
60 However, check this with "diff -b ..." since BLAST+ sometimes includes an extra
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
61 space character (probably a bug).
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
62 """
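The docstring notes that percentage identity and the number of gap openings are not stored in the XML and have to be derived. The following is a minimal sketch of one way to derive them from the two aligned strings (the qseq and sseq columns). The function name `alignment_stats` is hypothetical and this is not necessarily how the script itself does it; it assumes '-' is the only gap character and makes no attempt to handle low-complexity masking.

```python
import re


def alignment_stats(qseq, sseq):
    """Return (pident, mismatch, gapopen) for two gapped, equal-length strings."""
    if not qseq or len(qseq) != len(sseq):
        raise ValueError("Aligned sequences must be non-empty and of equal length")
    length = len(qseq)
    # Identical columns: the same letter in both rows (a gap never counts).
    nident = sum(1 for q, s in zip(qseq, sseq) if q == s and q != "-")
    # Total gap characters across both rows.
    gaps = qseq.count("-") + sseq.count("-")
    # Each maximal run of '-' in either row is counted as one gap opening.
    gapopen = len(re.findall(r"-+", qseq)) + len(re.findall(r"-+", sseq))
    mismatch = length - nident - gaps
    pident = 100.0 * nident / length
    return pident, mismatch, gapopen


# Ten columns: six identical, one mismatch, three gap characters in two runs.
print(alignment_stats("ACGT--ACGA", "ACGTTTAC-T"))  # (60.0, 1, 2)
```

Because identities are counted letter by letter, a masked residue (X) in one row would be counted as a mismatch here, which is the kind of discrepancy the docstring warns about.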
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
63 import sys
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
64 import re
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
65 import os
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
66 from optparse import OptionParser
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
67
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
68 if "-v" in sys.argv or "--version" in sys.argv:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
69     print("v0.1.08")
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
70     sys.exit(0)
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
71
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
72 if sys.version_info[:2] >= (2, 5):
10
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
73     try:
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
74         from xml.etree import cElementTree as ElementTree
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
75     except ImportError:
70e7dcbf6573 Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents: 3
diff changeset
76         from xml.etree import ElementTree as ElementTree
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
77 else:
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
78     from galaxy import eggs
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
79     import pkg_resources
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
80     pkg_resources.require("elementtree")
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
81     from elementtree import ElementTree
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
82
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
83 if len(sys.argv) == 4 and sys.argv[3] in ["std", "x22", "ext"]:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
84     # False positive if user really has a BLAST XML file called 'std' or 'ext'...
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
85     sys.exit("""ERROR: The script API has changed, sorry.
14
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
86
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
87 Instead of the old style:
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
88
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
89 $ python blastxml_to_tabular.py input.xml output.tabular std
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
90
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
91 Please use:
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
92
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
93 $ python blastxml_to_tabular.py -o output.tabular -c std input.xml
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
94
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
95 For more information, use:
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
96
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
97 $ python blastxml_to_tabular.py -h
2fe07f50a41e Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents: 13
diff changeset
98 """)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
99
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
100 usage = """usage: %prog [options] blastxml[,...]
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
101
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
102 Convert one (or more) BLAST XML files into a single tabular file.
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
103
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
104 The columns option can be 'std' (standard 12 columns), 'ext'
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
105 (extended 25 columns), or a list of BLAST+ column names like
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
106 'qseqid,sseqid,pident' (space or comma separated).
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
107 """
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
108 parser = OptionParser(usage=usage)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
109 parser.add_option('-o', '--output', dest='output', default=None, help='output filename (defaults to stdout)', metavar="FILE")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
110 parser.add_option("-c", "--columns", dest="columns", default='std', help="[std|ext|col1,col2,...] standard 12 columns, extended 25 columns, or list of column names")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
111 (options, args) = parser.parse_args()
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
112
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
113 colnames = 'qseqid,sseqid,pident,length,mismatch,gapopen,qstart,qend,sstart,send,evalue,bitscore,sallseqid,score,nident,positive,gaps,ppos,qframe,sframe,qseq,sseq,qlen,slen,salltitles'.split(',')
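The first twelve names in this list are the 'std' columns, in output order. As an illustration of how that ordering can be used downstream, here is a small sketch that maps one tab-separated output line onto those names. The helper `parse_std_line` and the sample row are made up for this example and are not part of the script.

```python
STD_COLUMNS = ("qseqid sseqid pident length mismatch gapopen "
               "qstart qend sstart send evalue bitscore").split()


def parse_std_line(line):
    """Map one tab-separated 'std' output line to a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(STD_COLUMNS):
        raise ValueError("Expected %i columns, got %i"
                         % (len(STD_COLUMNS), len(fields)))
    return dict(zip(STD_COLUMNS, fields))


# Hypothetical row, for illustration only (values stay as strings).
row = parse_std_line("query1\tsubject1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t365\n")
print(row["sseqid"], row["pident"], row["evalue"])
```

Values are left as strings so that the caller decides which fields to convert to int or float.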
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
114
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
115 if len(args) < 1:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
116     sys.exit("ERROR: No BLASTXML input files given; run with --help to see options.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
117
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
118 out_fmt = options.columns
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
119 if out_fmt == "std":
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
120     extended = False
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
121     cols = None
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
122 elif out_fmt == "x22":
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
123     sys.exit("Format argument x22 has been replaced with ext (extended 25 columns)")
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
124 elif out_fmt == "ext":
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
125     extended = True
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
126     cols = None
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
127 else:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
128     cols = out_fmt.replace(" ", ",").split(",")  # Allow space or comma separated
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
129     # Remove any blank entries due to trailing comma,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
130     # or annoying "None" dummy value from Galaxy if no columns
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
131     cols = [c for c in cols if c and c != "None"]
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
132     extra = set(cols).difference(colnames)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
133     if extra:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
134         sys.exit("These are not recognised column names: %s" % ",".join(sorted(extra)))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
135     del extra
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
136     assert set(colnames).issuperset(cols), cols
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
137     if not cols:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
138         sys.exit("No columns selected!")
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
139     extended = max(colnames.index(c) for c in cols) >= 12  # Do we need any higher columns?
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
140 del out_fmt
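The column-selection logic above can be easier to test when wrapped in a function. The following is a rough sketch of such a refactoring, not the script's own structure: `parse_columns` is a hypothetical name, it raises ValueError where the script calls sys.exit, and it leaves out the retired 'x22' value.

```python
COLNAMES = ("qseqid,sseqid,pident,length,mismatch,gapopen,qstart,qend,"
            "sstart,send,evalue,bitscore,sallseqid,score,nident,positive,"
            "gaps,ppos,qframe,sframe,qseq,sseq,qlen,slen,salltitles").split(",")


def parse_columns(out_fmt):
    """Return (cols, extended) for a --columns value; raise ValueError if invalid."""
    if out_fmt == "std":
        return None, False
    if out_fmt == "ext":
        return None, True
    # Allow space or comma separators; drop blanks and Galaxy's "None" dummy.
    cols = [c for c in out_fmt.replace(" ", ",").split(",") if c and c != "None"]
    extra = set(cols).difference(COLNAMES)
    if extra:
        raise ValueError("These are not recognised column names: %s"
                         % ",".join(sorted(extra)))
    if not cols:
        raise ValueError("No columns selected!")
    # Columns 13 onwards (index 12 and above) need the extended fields.
    extended = max(COLNAMES.index(c) for c in cols) >= 12
    return cols, extended


print(parse_columns("qseqid sseqid,pident,"))  # (['qseqid', 'sseqid', 'pident'], False)
print(parse_columns("qseqid,qlen"))            # (['qseqid', 'qlen'], True)
```

Returning `cols = None` for 'std' and 'ext' mirrors the script, where None means "use the preset column list".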
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
141
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
142 for in_file in args:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
143 if not os.path.isfile(in_file):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
144 sys.exit("Input BLAST XML file not found: %s" % in_file)
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
145
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
146
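# The column checks above can be sketched as a standalone function. This is
# only an illustration: check_columns and EXAMPLE_COLNAMES are hypothetical
# names, and the list is shortened (the real script defines 25 columns).

```python
import sys

# Assumed, shortened column list: the 12 standard names plus three extended ones.
EXAMPLE_COLNAMES = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
                    "sallseqid", "score", "nident"]


def check_columns(cols, colnames=EXAMPLE_COLNAMES):
    """Validate cols; return True if any lies beyond the 12 standard columns."""
    extra = set(cols).difference(colnames)
    if extra:
        sys.exit("These are not recognised column names: %s" % ",".join(sorted(extra)))
    if not cols:
        sys.exit("No columns selected!")
    # Index 12 and above means an extended column was requested.
    return max(colnames.index(c) for c in cols) >= 12
```

# Under these assumptions, check_columns(["qseqid", "evalue"]) needs only the
# standard fields, while check_columns(["qseqid", "nident"]) needs the extended ones.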
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
147 re_default_query_id = re.compile(r"^Query_\d+$")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
148 assert re_default_query_id.match("Query_101")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
149 assert not re_default_query_id.match("Query_101a")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
150 assert not re_default_query_id.match("MyQuery_101")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
151 re_default_subject_id = re.compile(r"^Subject_\d+$")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
152 assert re_default_subject_id.match("Subject_1")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
153 assert not re_default_subject_id.match("Subject_")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
154 assert not re_default_subject_id.match("Subject_12a")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
155 assert not re_default_subject_id.match("TheSubject_1")
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
156
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
157
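# A minimal sketch of how the query pattern above is meant to be used: when
# BLAST has written a placeholder such as Query_1, fall back to the first word
# of the definition line. pick_query_id is a hypothetical helper, not part of
# the script, which does this inline.

```python
import re

re_default_query_id = re.compile(r"^Query_\d+$")


def pick_query_id(query_id, query_def):
    """Return the real ID, or the first word of query_def for a placeholder ID."""
    if re_default_query_id.match(query_id):
        # Placeholder ID, so take the first word of the query definition.
        return query_def.split(None, 1)[0]
    return query_id
```

# For example, pick_query_id("Query_1", "Sample") gives "Sample", while an ID
# such as "sp|Q9BS26|ERP44_HUMAN" is returned unchanged.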
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
158 def convert(blastxml_filename, output_handle):
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
159 blast_program = None
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
160 # get an iterable
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
161 try:
15
c16c30e9ad5b Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents: 14
diff changeset
162 context = ElementTree.iterparse(blastxml_filename, events=("start", "end"))
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
163 except Exception:
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
164 sys.exit("Invalid data format.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
165 # turn it into an iterator
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
166 context = iter(context)
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
167 # get the root element
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
168 try:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
169 event, root = next(context)
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
170 except Exception:
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
171 sys.exit("Invalid data format.")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
172 for event, elem in context:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
173 if event == "end" and elem.tag == "BlastOutput_program":
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
174 blast_program = elem.text
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
175 # for every <Iteration> tag
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
176 if event == "end" and elem.tag == "Iteration":
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
177 # Expecting either this, from BLAST 2.2.25+ using FASTA vs FASTA
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
178 # <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
179 # <Iteration_query-def>Endoplasmic reticulum resident protein 44 OS=Homo sapiens GN=ERP44 PE=1 SV=1</Iteration_query-def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
180 # <Iteration_query-len>406</Iteration_query-len>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
181 # <Iteration_hits></Iteration_hits>
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
182 #
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
183 # Or, from BLAST 2.2.24+ run online
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
184 # <Iteration_query-ID>Query_1</Iteration_query-ID>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
185 # <Iteration_query-def>Sample</Iteration_query-def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
186 # <Iteration_query-len>516</Iteration_query-len>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
187 # <Iteration_hits>...
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
188 qseqid = elem.findtext("Iteration_query-ID")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
189 if re_default_query_id.match(qseqid):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
190 # Placeholder ID, take the first word of the query definition
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
191 qseqid = elem.findtext("Iteration_query-def").split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
192 qlen = int(elem.findtext("Iteration_query-len"))
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
193
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
194 # for every <Hit> within <Iteration>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
195 for hit in elem.findall("Iteration_hits/Hit"):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
196 # Expecting either this,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
197 # <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
198 # <Hit_def>RecName: Full=Rhodopsin</Hit_def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
199 # <Hit_accession>P56514</Hit_accession>
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
200 # or,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
201 # <Hit_id>Subject_1</Hit_id>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
202 # <Hit_def>gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]</Hit_def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
203 # <Hit_accession>Subject_1</Hit_accession>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
204 #
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
205 # apparently depending on the parse_deflines switch
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
206 #
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
207 # Or, with a local database not using -parse_seqids can get this,
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
208 # <Hit_id>gnl|BL_ORD_ID|2</Hit_id>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
209 # <Hit_def>chrIII gi|240255695|ref|NC_003074.8| Arabidopsis thaliana chromosome 3, complete sequence</Hit_def>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
210 # <Hit_accession>2</Hit_accession>
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
211 sseqid = hit.findtext("Hit_id").split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
212 hit_def = sseqid + " " + hit.findtext("Hit_def")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
213 if re_default_subject_id.match(sseqid) \
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
214 and sseqid == hit.findtext("Hit_accession"):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
215 # Placeholder ID, take the first word of the subject definition
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
216 hit_def = hit.findtext("Hit_def")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
217 sseqid = hit_def.split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
218 if sseqid.startswith("gnl|BL_ORD_ID|") \
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
219 and sseqid == "gnl|BL_ORD_ID|" + hit.findtext("Hit_accession"):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
220 # Alternative placeholder ID, again take the first word of hit_def
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
221 hit_def = hit.findtext("Hit_def")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
222 sseqid = hit_def.split(None, 1)[0]
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
223 # for every <Hsp> within <Hit>
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
224 for hsp in hit.findall("Hit_hsps/Hsp"):
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
225 nident = hsp.findtext("Hsp_identity")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
226 length = hsp.findtext("Hsp_align-len")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
227 pident = "%0.2f" % (100 * float(nident) / float(length))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
228
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
229 q_seq = hsp.findtext("Hsp_qseq")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
230 h_seq = hsp.findtext("Hsp_hseq")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
231 m_seq = hsp.findtext("Hsp_midline")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
232 assert len(q_seq) == len(h_seq) == len(m_seq) == int(length)
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
233 gapopen = str(len(q_seq.replace('-', ' ').split()) - 1 +
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
234 len(h_seq.replace('-', ' ').split()) - 1)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
235
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
236 mismatch = m_seq.count(' ') + m_seq.count('+') \
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
237 - q_seq.count('-') - h_seq.count('-')
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
238 # TODO - Remove this alternative mismatch calculation and test
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
239 # once satisfied there are no problems
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
240 expected_mismatch = len(q_seq) \
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
241 - sum(1 for q, h in zip(q_seq, h_seq)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
242 if q == h or q == "-" or h == "-")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
243 xx = sum(1 for q, h in zip(q_seq, h_seq) if q == "X" and h == "X")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
244 if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= expected_mismatch + xx):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
245 sys.exit("%s vs %s mismatches, expected %i <= %i <= %i"
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
246 % (qseqid, sseqid, expected_mismatch - q_seq.count("X"),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
247 int(mismatch), expected_mismatch + xx))
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
248
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
249 # TODO - Remove this alternative identity calculation and test
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
250 # once satisfied there are no problems
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
251 expected_identity = sum(1 for q, h in zip(q_seq, h_seq) if q == h)
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
252 if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")):
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
253 sys.exit("%s vs %s identities, expected %i <= %i <= %i"
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
254 % (qseqid, sseqid, expected_identity - xx, int(nident),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
255 expected_identity + q_seq.count("X")))
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
256
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
257 evalue = hsp.findtext("Hsp_evalue")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
258 if evalue == "0":
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
259 evalue = "0.0"
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
260 else:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
261 evalue = "%0.0e" % float(evalue)
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
262
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
263 bitscore = float(hsp.findtext("Hsp_bit-score"))
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
264 if bitscore < 100:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
265 # Seems to show one decimal place for lower scores
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
266 bitscore = "%0.1f" % bitscore
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
267 else:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
268 # Note BLAST does not round to nearest int, it truncates
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
269 bitscore = "%i" % bitscore
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
270
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
271 values = [qseqid,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
272 sseqid,
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
273 pident,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
274 length, # hsp.findtext("Hsp_align-len")
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
275 str(mismatch),
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
276 gapopen,
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
277 hsp.findtext("Hsp_query-from"), # qstart,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
278 hsp.findtext("Hsp_query-to"), # qend,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
279 hsp.findtext("Hsp_hit-from"), # sstart,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
280 hsp.findtext("Hsp_hit-to"), # send,
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
281 evalue, # hsp.findtext("Hsp_evalue") in scientific notation
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
282 bitscore, # hsp.findtext("Hsp_bit-score") rounded
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
283 ]
3
643338ac83c0 Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff changeset
284
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
285 if extended:
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
286 try:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
287 sallseqid = ";".join(name.split(None, 1)[0] for name in hit_def.split(" >"))
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
288 salltitles = "<>".join(name.split(None, 1)[1] for name in hit_def.split(" >"))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
289 except IndexError as e:
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
290 sys.exit("Problem splitting multiple hits?\n%r\n--> %s" % (hit_def, e))
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
291 # print hit_def, "-->", sallseqid
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
292 positive = hsp.findtext("Hsp_positive")
20
3034ce97dd33 Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents: 15
diff changeset
293 ppos = "%0.2f" % (100 * float(positive) / float(length))
13
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
294 qframe = hsp.findtext("Hsp_query-frame")
623f727cdff1 Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents: 11
diff changeset
295 sframe = hsp.findtext("Hsp_hit-frame")
296 if blast_program == "blastp":
20
297 # Probably a bug in BLASTP that they use 0 or 1 depending on format
298 if qframe == "0":
299 qframe = "1"
300 if sframe == "0":
301 sframe = "1"
13
302 slen = int(hit.findtext("Hit_len"))
303 values.extend([sallseqid,
20
304 hsp.findtext("Hsp_score"), # score,
13
305 nident,
306 positive,
20
307 hsp.findtext("Hsp_gaps"), # gaps,
13
308 ppos,
309 qframe,
310 sframe,
20
311 # NOTE - for blastp, XML shows original seq, tabular uses XXX masking
13
312 q_seq,
313 h_seq,
314 str(qlen),
315 str(slen),
316 salltitles,
317 ])
318 if cols:
20
319 # Only a subset of the columns are needed
13
320 values = [values[colnames.index(c)] for c in cols]
20
321 # print "\t".join(values)
15
322 output_handle.write("\t".join(values) + "\n")
13
323 # prevents ElementTree from growing a large data structure
324 root.clear()
325 elem.clear()
3
326
13
327
328 if options.output:
329 outfile = open(options.output, "w")
330 else:
331 outfile = sys.stdout
332
333 for in_file in args:
334 blast_program = None
335 convert(in_file, outfile)
336
337 if options.output:
338 outfile.close()
339 else:
20
340 # Using stdout
13
341 pass
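The merged-hit handling on lines 287 to 290 of the listing can be sketched in isolation. This is a minimal illustration rather than the script's own code: the helper name `split_merged_hits` is hypothetical, and it assumes the combined string holds one `<id> <title>` entry per hit, with entries joined by `" >"`. Where the listing calls `sys.exit` on an `IndexError`, the sketch raises `ValueError` so that a caller can decide how to report the problem.

```python
def split_merged_hits(hit_def):
    """Return (sallseqid, salltitles) for a merged hit description.

    Hypothetical helper; assumes entries look like "<id> <title>" and
    are separated by " >", as in the listing above.
    """
    ids = []
    titles = []
    for name in hit_def.split(" >"):
        # Split on the first run of whitespace: identifier, then title.
        parts = name.split(None, 1)
        if len(parts) != 2:
            # The listing would hit an IndexError here when taking [1].
            raise ValueError("Problem splitting multiple hits? %r" % hit_def)
        ids.append(parts[0])
        titles.append(parts[1])
    # Same separators as the listing: ";" for IDs, "<>" for titles.
    return ";".join(ids), "<>".join(titles)


if __name__ == "__main__":
    example = "gi|1|ref|A| first protein >gi|2|ref|B| second protein"
    print("\t".join(split_merged_hits(example)))
```

Collecting both lists in one pass also avoids splitting `hit_def` twice, which the two generator expressions in the listing do.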