annotate tools/ncbi_blast_plus/blastxml_to_tabular.py @ 24:c877294f8025 draft
Fixed tool_dependencies.xml
author | peterjc |
---|---|
date | Mon, 09 Jul 2018 10:08:16 -0400 |
parents | 31e517610e1f |
children | e25d3acf6e68 |
rev | line source |
---|---|
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
1 #!/usr/bin/env python |
10
70e7dcbf6573
Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents:
3
diff
changeset
|
2 """Convert a BLAST XML file to tabular output. |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
3 |
15
c16c30e9ad5b
Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents:
14
diff
changeset
|
4 Designed to convert BLAST XML files into tabular BLAST output (either |
5 std for standard 12 columns, or ext for the extended 25 columns offered |
6 in the Galaxy BLAST+ wrappers). |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
7 |
8 The 12 columns output are 'qseqid sseqid pident length mismatch gapopen qstart |
9 qend sstart send evalue bitscore' or 'std' at the BLAST+ command line, which |
10 mean: |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
11 |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
12 ====== ========= ============================================ |
13 Column NCBI name Description |
14 ------ --------- -------------------------------------------- |
15      1 qseqid    Query Seq-id (ID of your sequence) |
16      2 sseqid    Subject Seq-id (ID of the database hit) |
17      3 pident    Percentage of identical matches |
18      4 length    Alignment length |
19      5 mismatch  Number of mismatches |
20      6 gapopen   Number of gap openings |
21      7 qstart    Start of alignment in query |
22      8 qend      End of alignment in query |
23      9 sstart    Start of alignment in subject (database hit) |
24     10 send      End of alignment in subject (database hit) |
25     11 evalue    Expectation value (E-value) |
26     12 bitscore  Bit score |
27 ====== ========= ============================================ |
28 |
29 The additional columns offered in the Galaxy BLAST+ wrappers are: |
30 |
31 ====== ============= =========================================== |
32 Column NCBI name     Description |
33 ------ ------------- ------------------------------------------- |
11
4c4a0da938ff
Uploaded v0.0.22, now wraps BLAST+ 2.2.28 allowing extended tabular output to include the hit descriptions as column 25.
peterjc
parents:
10
diff
changeset
|
34     13 sallseqid     All subject Seq-id(s), separated by ';' |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
35     14 score         Raw score |
36     15 nident        Number of identical matches |
37     16 positive      Number of positive-scoring matches |
38     17 gaps          Total number of gaps |
39     18 ppos          Percentage of positive-scoring matches |
40     19 qframe        Query frame |
41     20 sframe        Subject frame |
42     21 qseq          Aligned part of query sequence |
43     22 sseq          Aligned part of subject sequence |
44     23 qlen          Query sequence length |
45     24 slen          Subject sequence length |
11
4c4a0da938ff
Uploaded v0.0.22, now wraps BLAST+ 2.2.28 allowing extended tabular output to include the hit descriptions as column 25.
peterjc
parents:
10
diff
changeset
|
46     25 salltitles    All subject titles, separated by '<>' |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
47 ====== ============= =========================================== |
48 |
49 Most of these fields are given explicitly in the XML file, but some, like |
50 the percentage identity and the number of gap openings, must be calculated. |
51 |
52 Be aware that the sequence in the extended tabular output or XML direct from |
53 BLAST+ may or may not use XXXX masking on regions of low complexity. This |
54 can throw off the calculation of percentage identity and gap openings. |
55 [In fact, both BLAST 2.2.24+ and 2.2.25+ have a subtle bug in this regard, |
56 with these numbers changing depending on whether or not the low complexity |
57 filter is used.] |
58 |
59 This script attempts to produce identical output to what BLAST+ would have done. |
60 However, check this with "diff -b ..." since BLAST+ sometimes includes an extra |
61 space character (probably a bug). |
62 """ |
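The docstring above notes that percentage identity and the number of gap openings are not stored in the XML and have to be computed. A minimal sketch of one way to derive them from the aligned query and subject strings follows. The helper name pident_and_gapopen is hypothetical, '-' is assumed to be the gap character, and this is not necessarily how the script itself performs the calculation.

```python
import re


def pident_and_gapopen(qseq, sseq):
    """Return (percentage identity, gap openings) for two aligned strings.

    Hypothetical helper for illustration; assumes '-' marks a gap.
    """
    if len(qseq) != len(sseq):
        raise ValueError("Aligned sequences must have equal length")
    length = len(qseq)
    # Count alignment columns where both sequences carry the same residue.
    nident = sum(1 for q, s in zip(qseq, sseq) if q == s and q != "-")
    # Each maximal run of '-' in either sequence counts as one gap opening.
    gapopen = len(re.findall(r"-+", qseq)) + len(re.findall(r"-+", sseq))
    pident = 100.0 * nident / length if length else 0.0
    return pident, gapopen


if __name__ == "__main__":
    pident, gapopen = pident_and_gapopen("ACG--T", "ACGTTT")
    print("%.2f\t%i" % (pident, gapopen))
```

With this approach an X-masked residue aligned against an unmasked one simply counts as a mismatch, which is one way low-complexity masking can shift the reported identity.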
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
63 |
22
6f386c5dc4fb
v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents:
21
diff
changeset
|
64 from __future__ import print_function |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
65 |
66 import os |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
67 import re |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
68 import sys |
69 |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
70 from optparse import OptionParser |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
71 |
10
70e7dcbf6573
Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents:
3
diff
changeset
|
72 if "-v" in sys.argv or "--version" in sys.argv: |
22
6f386c5dc4fb
v0.2.01 add -max_hsps, -use_sw_tback; lists args; internal updates
peterjc
parents:
21
diff
changeset
|
73     print("v0.2.01") |
10
70e7dcbf6573
Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents:
3
diff
changeset
|
74     sys.exit(0) |
75 |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
76 if sys.version_info[:2] >= (2, 5): |
10
70e7dcbf6573
Uploaded v0.0.20, handles dependencies via package_blast_plus_2_2_26, development moved to GitHub, RST README, MIT licence, citation information, more tests, percentage identity option to BLASTN, cElementTree to ElementTree fallback.
peterjc
parents:
3
diff
changeset
|
77     try: |
78         from xml.etree import cElementTree as ElementTree |
79     except ImportError: |
80         from xml.etree import ElementTree as ElementTree |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
81 else: |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
82     from galaxy import eggs  # noqa - ignore flake8 F401 |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
83     import pkg_resources |
84     pkg_resources.require("elementtree") |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
85     from elementtree import ElementTree |
86 |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
87 if len(sys.argv) == 4 and sys.argv[3] in ["std", "x22", "ext"]: |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
88     # False positive if user really has a BLAST XML file called 'std' or 'ext'... |
89     sys.exit("""ERROR: The script API has changed, sorry. |
14
2fe07f50a41e
Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
peterjc
parents:
13
diff
changeset
|
90 |
91 Instead of the old style: |
92 |
93 $ python blastxml_to_tabular.py input.xml output.tabular std |
94 |
95 Please use: |
96 |
97 $ python blastxml_to_tabular.py -o output.tabular -c std input.xml |
98 |
99 For more information, use: |
100 |
101 $ python blastxml_to_tabular.py -h |
102 """) |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
103 |
104 usage = """usage: %prog [options] blastxml[,...] |
105 |
106 Convert one (or more) BLAST XML files into a single tabular file. |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
107 |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
108 The columns option can be 'std' (standard 12 columns), 'ext' |
109 (extended 25 columns), or a list of BLAST+ column names like |
110 'qseqid,sseqid,pident' (space or comma separated). |
23
111 |
112 Note if using a list of column names, currently ONLY the 25 |
113 extended column names are supported. |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
114 """ |
115 parser = OptionParser(usage=usage) |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
116 parser.add_option('-o', '--output', dest='output', default=None, |
117                   help='output filename (defaults to stdout)', |
118                   metavar="FILE") |
119 parser.add_option("-c", "--columns", dest="columns", default='std', |
120                   help="[std|ext|col1,col2,...] standard 12 columns, extended 25 columns, or list of column names") |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
121 (options, args) = parser.parse_args() |
122 |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
123 colnames = ('qseqid,sseqid,pident,length,mismatch,gapopen,qstart,qend,' |
124             'sstart,send,evalue,bitscore,sallseqid,score,nident,positive,' |
125             'gaps,ppos,qframe,sframe,qseq,sseq,qlen,slen,salltitles').split(',') |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
126 |
127 if len(args) < 1: |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
128     sys.exit("ERROR: No BLASTXML input files given; run with --help to see options.") |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
129 |
130 out_fmt = options.columns |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
131 if out_fmt == "std": |
132     extended = False |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
133     cols = None |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
134 elif out_fmt == "x22": |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
135     sys.exit("Format argument x22 has been replaced with ext (extended 25 columns)") |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
136 elif out_fmt == "ext": |
137     extended = True |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
138     cols = None |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
139 else: |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
140     cols = out_fmt.replace(" ", ",").split(",")  # Allow space or comma separated |
141     # Remove any blank entries due to trailing comma, |
142     # or annoying "None" dummy value from Galaxy if no columns |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
143     cols = [c for c in cols if c and c != "None"] |
144     extra = set(cols).difference(colnames) |
145     if extra: |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
146         sys.exit("These are not recognised column names: %s" % ",".join(sorted(extra))) |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
147 del extra |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
148 assert set(colnames).issuperset(cols), cols |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
149 if not cols: |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
150 sys.exit("No columns selected!") |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
151 extended = max(colnames.index(c) for c in cols) >= 12 # Do we need any higher columns? |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
152 del out_fmt |
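The column handling above can be read as a small standalone routine. The following is only a sketch under stated assumptions: the function name `parse_columns` and the constant `STD_COLUMNS` are hypothetical and do not appear in the script, and the sketch raises `ValueError` where the script calls `sys.exit`.

```python
# Hypothetical helper, not part of the original script: it mirrors the
# column-selection steps of the listing (split, filter, validate).
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_columns(out_fmt, colnames):
    """Split a space- or comma-separated column spec and validate it."""
    # Treat spaces like commas, then drop blanks (trailing comma) and the
    # literal "None" dummy value that Galaxy may pass when nothing is chosen.
    cols = [c for c in out_fmt.replace(" ", ",").split(",") if c and c != "None"]
    extra = set(cols).difference(colnames)
    if extra:
        raise ValueError(
            "These are not recognised column names: %s" % ",".join(sorted(extra))
        )
    if not cols:
        raise ValueError("No columns selected!")
    return cols


if __name__ == "__main__":
    print(parse_columns("qseqid, sseqid,None,", STD_COLUMNS))
```

Raising an exception rather than exiting keeps the sketch testable; the script itself reports these problems with `sys.exit`.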
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
153 |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
154 for in_file in args: |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
155 if not os.path.isfile(in_file): |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
156 sys.exit("Input BLAST XML file not found: %s" % in_file) |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
157 |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
158 |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
159 re_default_query_id = re.compile(r"^Query_\d+$") |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
160 assert re_default_query_id.match("Query_101") |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
161 assert not re_default_query_id.match("Query_101a") |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
162 assert not re_default_query_id.match("MyQuery_101") |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
163 re_default_subject_id = re.compile(r"^Subject_\d+$") |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
164 assert re_default_subject_id.match("Subject_1") |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
165 assert not re_default_subject_id.match("Subject_") |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
166 assert not re_default_subject_id.match("Subject_12a") |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
167 assert not re_default_subject_id.match("TheSubject_1") |
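The placeholder patterns above are later used to decide whether an ID from the XML should be replaced by the first word of the matching definition line. A minimal sketch of that decision for query IDs follows; the helper name `resolve_query_id` is hypothetical and not taken from the script.

```python
import re

# Same pattern as the listing, written as a raw string.
RE_DEFAULT_QUERY_ID = re.compile(r"^Query_\d+$")


def resolve_query_id(query_id, query_def):
    """Return a usable query ID (hypothetical helper, for illustration only).

    If BLAST wrote a placeholder such as "Query_1", fall back to the first
    whitespace-delimited word of the query definition line.
    """
    if RE_DEFAULT_QUERY_ID.match(query_id):
        return query_def.split(None, 1)[0]
    return query_id


if __name__ == "__main__":
    print(resolve_query_id("Query_1", "Sample description text"))
    print(resolve_query_id("sp|Q9BS26|ERP44_HUMAN", "Endoplasmic reticulum protein"))
```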
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
168 |
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
169 |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
170 def convert(blastxml_filename, output_handle): |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
171 blast_program = None |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
172 # get an iterable |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
173 try: |
15
c16c30e9ad5b
Uploaded v0.1.03 (internal changes); v0.1.02 (BLAST+ 2.2.30 etc)
peterjc
parents:
14
diff
changeset
|
174 context = ElementTree.iterparse(blastxml_filename, events=("start", "end")) |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
175 except Exception: |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
176 sys.exit("Invalid data format.") |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
177 # turn it into an iterator |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
178 context = iter(context) |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
179 # get the root element |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
180 try: |
24
|
181 event, root = next(context) |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
182 except Exception: |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
183 sys.exit("Invalid data format.") |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
184 for event, elem in context: |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
185 if event == "end" and elem.tag == "BlastOutput_program": |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
186 blast_program = elem.text |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
187 # for every <Iteration> tag |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
188 if event == "end" and elem.tag == "Iteration": |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
189 # Expecting either this, from BLAST 2.2.25+ using FASTA vs FASTA |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
190 # <Iteration_query-ID>sp|Q9BS26|ERP44_HUMAN</Iteration_query-ID> |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
191 # <Iteration_query-def>Endoplasmic reticulum resident protein 44 |
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
192 # OS=Homo sapiens GN=ERP44 PE=1 SV=1</Iteration_query-def> |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
193 # <Iteration_query-len>406</Iteration_query-len> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
194 # <Iteration_hits></Iteration_hits> |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
195 # |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
196 # Or, from BLAST 2.2.24+ run online |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
197 # <Iteration_query-ID>Query_1</Iteration_query-ID> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
198 # <Iteration_query-def>Sample</Iteration_query-def> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
199 # <Iteration_query-len>516</Iteration_query-len> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
200 # <Iteration_hits>... |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
201 qseqid = elem.findtext("Iteration_query-ID") |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
202 if re_default_query_id.match(qseqid): |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
203 # Place holder ID, take the first word of the query definition |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
204 qseqid = elem.findtext("Iteration_query-def").split(None, 1)[0] |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
205 qlen = int(elem.findtext("Iteration_query-len")) |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
206 |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
207 # for every <Hit> within <Iteration> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
208 for hit in elem.findall("Iteration_hits/Hit"): |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
209 # Expecting either this, |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
210 # <Hit_id>gi|3024260|sp|P56514.1|OPSD_BUFBU</Hit_id> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
211 # <Hit_def>RecName: Full=Rhodopsin</Hit_def> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
212 # <Hit_accession>P56514</Hit_accession> |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
213 # or, |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
214 # <Hit_id>Subject_1</Hit_id> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
215 # <Hit_def>gi|57163783|ref|NP_001009242.1| rhodopsin [Felis catus]</Hit_def> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
216 # <Hit_accession>Subject_1</Hit_accession> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
217 # |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
218 # apparently depending on the parse_deflines switch |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
219 # |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
220 # Or, with a local database not using -parse_seqids can get this, |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
221 # <Hit_id>gnl|BL_ORD_ID|2</Hit_id> |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
222 # <Hit_def>chrIII gi|240255695|ref|NC_003074.8| Arabidopsis |
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
223 # thaliana chromosome 3, complete sequence</Hit_def> |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
224 # <Hit_accession>2</Hit_accession> |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
225 sseqid = hit.findtext("Hit_id").split(None, 1)[0] |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
226 hit_def = sseqid + " " + hit.findtext("Hit_def") |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
227 if re_default_subject_id.match(sseqid) and sseqid == hit.findtext("Hit_accession"): |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
228 # Place holder ID, take the first word of the subject definition |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
229 hit_def = hit.findtext("Hit_def") |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
230 sseqid = hit_def.split(None, 1)[0] |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
231 if sseqid.startswith("gnl|BL_ORD_ID|") and sseqid == "gnl|BL_ORD_ID|" + hit.findtext("Hit_accession"): |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
232 # Alternative place holder ID, again take the first word of hit_def |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
233 hit_def = hit.findtext("Hit_def") |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
234 sseqid = hit_def.split(None, 1)[0] |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
235 # for every <Hsp> within <Hit> |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
236 for hsp in hit.findall("Hit_hsps/Hsp"): |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
237 nident = hsp.findtext("Hsp_identity") |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
238 length = hsp.findtext("Hsp_align-len") |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
239 # As of NCBI BLAST+ 2.4.0 this is given to 3dp (not 2dp) |
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
240 pident = "%0.3f" % (100 * float(nident) / float(length)) |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
241 |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
242 q_seq = hsp.findtext("Hsp_qseq") |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
243 h_seq = hsp.findtext("Hsp_hseq") |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
244 m_seq = hsp.findtext("Hsp_midline") |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
245 assert len(q_seq) == len(h_seq) == len(m_seq) == int(length) |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
246 gapopen = str(len(q_seq.replace('-', ' ').split()) - 1 + |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
247 len(h_seq.replace('-', ' ').split()) - 1) |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
248 |
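The gap-open count computed just above relies on a string trick: replacing each `-` with a space and splitting on whitespace yields the ungapped segments, and n segments imply n - 1 internal gap runs. A small sketch, with the hypothetical helper names `count_gap_opens` and `total_gap_opens`, and assuming the aligned strings neither start nor end with a gap:

```python
def count_gap_opens(aligned_seq):
    """Count runs of '-' between ungapped segments of one aligned sequence.

    Hypothetical helper: a run of any length counts once, because splitting
    on whitespace collapses consecutive separators.
    """
    return len(aligned_seq.replace("-", " ").split()) - 1


def total_gap_opens(q_seq, h_seq):
    """Sum the gap openings in the query and the hit, as the listing does."""
    return count_gap_opens(q_seq) + count_gap_opens(h_seq)


if __name__ == "__main__":
    print(total_gap_opens("AC--GT-A", "ACTTGTCA"))
```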
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
249 mismatch = m_seq.count(' ') + m_seq.count('+') - q_seq.count('-') - h_seq.count('-') |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
250 # TODO - Remove this alternative mismatch calculation and test |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
251 # once satisfied there are no problems |
21
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
252 expected_mismatch = len(q_seq) - sum(1 for q, h in zip(q_seq, h_seq) |
7538e2bfcd41
v0.2.00, for NCBI BLAST+ 2.5.0 via bioconda or tool_dependencies.xml
peterjc
parents:
20
diff
changeset
|
253 if q == h or q == "-" or h == "-") |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
254 xx = sum(1 for q, h in zip(q_seq, h_seq) if q == "X" and h == "X") |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
255 if not (expected_mismatch - q_seq.count("X") <= int(mismatch) <= expected_mismatch + xx): |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
256 sys.exit("%s vs %s mismatches, expected %i <= %i <= %i" |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
257 % (qseqid, sseqid, expected_mismatch - q_seq.count("X"), |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
258 int(mismatch), expected_mismatch)) |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
259 |
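The mismatch figure above is derived from the midline: a space or a `+` marks a column that is not an identity, and gap columns (which also show as spaces in the midline) are then subtracted. A minimal sketch of the same arithmetic, using a made-up alignment and a hypothetical function name `count_mismatches`:

```python
def count_mismatches(q_seq, h_seq, m_seq):
    """Mismatch count from the midline, following the listing's formula.

    Hypothetical helper: non-identical columns (space or '+') minus the
    columns that are gaps in either the query or the hit.
    """
    return m_seq.count(" ") + m_seq.count("+") - q_seq.count("-") - h_seq.count("-")


if __name__ == "__main__":
    # Invented example: K/R and A/S are conservative substitutions ('+'),
    # and the fourth column is a gap in the query.
    q = "MKV-LA"
    h = "MRVALS"
    m = "M+V L+"
    print(count_mismatches(q, h, m))
```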
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
260 # TODO - Remove this alternative identity calculation and test |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
261 # once satisfied there are no problems |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
262 expected_identity = sum(1 for q, h in zip(q_seq, h_seq) if q == h) |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
263 if not (expected_identity - xx <= int(nident) <= expected_identity + q_seq.count("X")): |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
264 sys.exit("%s vs %s identities, expected %i <= %i <= %i" |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
265 % (qseqid, sseqid, expected_identity, int(nident), |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
266 expected_identity + q_seq.count("X"))) |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
267 |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
268 evalue = hsp.findtext("Hsp_evalue") |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
269 if evalue == "0": |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
270 evalue = "0.0" |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
271 else: |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
272 evalue = "%0.0e" % float(evalue) |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
273 |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
274 bitscore = float(hsp.findtext("Hsp_bit-score")) |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
275 if bitscore < 100: |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
276 # Seems to show one decimal place for lower scores |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
277 bitscore = "%0.1f" % bitscore |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
278 else: |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
279 # Note BLAST does not round to nearest int, it truncates |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
280 bitscore = "%i" % bitscore |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
281 |
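The e-value and bit score formatting above can be isolated into two small functions. This is only a sketch; the names `format_evalue` and `format_bitscore` are hypothetical, and the behaviour shown is what the listing does rather than a statement about BLAST+ itself.

```python
def format_evalue(evalue):
    """Format an e-value string as in the listing (hypothetical helper)."""
    if evalue == "0":
        return "0.0"
    # One significant digit in scientific notation, e.g. "3e-20".
    return "%0.0e" % float(evalue)


def format_bitscore(bitscore):
    """Format a bit score as in the listing (hypothetical helper)."""
    bitscore = float(bitscore)
    if bitscore < 100:
        # One decimal place for scores below 100.
        return "%0.1f" % bitscore
    # "%i" truncates towards zero rather than rounding to nearest.
    return "%i" % bitscore


if __name__ == "__main__":
    print(format_evalue("3.2e-20"), format_bitscore("57.38"), format_bitscore("123.9"))
```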
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
282 values = [qseqid, |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
283 sseqid, |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
284 pident, |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
285 length, # hsp.findtext("Hsp_align-len") |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
286 str(mismatch), |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
287 gapopen, |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
288 hsp.findtext("Hsp_query-from"), # qstart, |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
289 hsp.findtext("Hsp_query-to"), # qend, |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
290 hsp.findtext("Hsp_hit-from"), # sstart, |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
291 hsp.findtext("Hsp_hit-to"), # send, |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
292 evalue, # hsp.findtext("Hsp_evalue") in scientific notation |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
293 bitscore, # hsp.findtext("Hsp_bit-score") rounded |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
294 ] |
3
643338ac83c0
Uploaded v0.0.12b, same code but moving folders around to match all my other tools and make future development simpler.
peterjc
parents:
diff
changeset
|
295 |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
296 if extended: |
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
297 try: |
20
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
298 sallseqid = ";".join(name.split(None, 1)[0] for name in hit_def.split(" >")) |
3034ce97dd33
Uploaded v0.1.08, can search multiple local databases, fixes a pipe problem in blastdbcmd, and minor internal changes.
peterjc
parents:
15
diff
changeset
|
299 salltitles = "<>".join(name.split(None, 1)[1] for name in hit_def.split(" >")) |
13
623f727cdff1
Uploaded v0.1.00, uses BLAST+ 2.2.29, allows custom column selection for tabular output - including taxonomy fields.
peterjc
parents:
11
diff
changeset
|
300 except IndexError as e: |
20 | 301 sys.exit("Problem splitting multiple hits?\n%r\n--> %s" % (hit_def, e)) |
22 | 302 # print(hit_def, "-->", sallseqid) |
13 | 303 positive = hsp.findtext("Hsp_positive") |
20 | 304 ppos = "%0.2f" % (100 * float(positive) / float(length)) |
13 | 305 qframe = hsp.findtext("Hsp_query-frame") |
13 | 306 sframe = hsp.findtext("Hsp_hit-frame") |
13 | 307 if blast_program == "blastp": |
20 | 308 # Probably a bug in BLASTP that they use 0 or 1 depending on format |
20 | 309 if qframe == "0": |
20 | 310 qframe = "1" |
20 | 311 if sframe == "0": |
20 | 312 sframe = "1" |
13 | 313 slen = int(hit.findtext("Hit_len")) |
13 | 314 values.extend([sallseqid, |
20 | 315 hsp.findtext("Hsp_score"), # score, |
13 | 316 nident, |
13 | 317 positive, |
20 | 318 hsp.findtext("Hsp_gaps"), # gaps, |
13 | 319 ppos, |
13 | 320 qframe, |
13 | 321 sframe, |
20 | 322 # NOTE - for blastp, XML shows original seq, tabular uses XXX masking |
13 | 323 q_seq, |
13 | 324 h_seq, |
13 | 325 str(qlen), |
13 | 326 str(slen), |
13 | 327 salltitles, |
13 | 328 ]) |
13 | 329 if cols: |
20 | 330 # Only a subset of the columns are needed |
13 | 331 values = [values[colnames.index(c)] for c in cols] |
22 | 332 # print("\t".join(values)) |
15 | 333 output_handle.write("\t".join(values) + "\n") |
13 | 334 # prevents ElementTree from growing large datastructure |
13 | 335 root.clear() |
13 | 336 elem.clear() |
3 | 337 |
13 | 338 |
13 | 339 if options.output: |
13 | 340 outfile = open(options.output, "w") |
13 | 341 else: |
13 | 342 outfile = sys.stdout |
13 | 343 |
13 | 344 for in_file in args: |
13 | 345 blast_program = None |
13 | 346 convert(in_file, outfile) |
13 | 347 |
13 | 348 if options.output: |
13 | 349 outfile.close() |
13 | 350 else: |
20 | 351 # Using stdout |
13 | 352 pass |
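
The extended-column logic at source lines 298-299 and the column subsetting at line 331 can be tried in isolation. The sketch below is not part of the script: the helper names `split_hit_def` and `select_columns` and the sample hit definition are hypothetical, and it assumes `hit_def` holds one or more entries separated by `" >"`, each of the form `identifier title`. It raises `ValueError` where the script calls `sys.exit`, so the failure case can be tested.

```python
def split_hit_def(hit_def):
    """Split a merged hit definition into (sallseqid, salltitles).

    Uses the same expressions as source lines 298-299: entries are
    separated by " >", and the first whitespace in each entry separates
    the identifier from the title.
    """
    entries = hit_def.split(" >")
    try:
        sallseqid = ";".join(name.split(None, 1)[0] for name in entries)
        salltitles = "<>".join(name.split(None, 1)[1] for name in entries)
    except IndexError as e:
        # The script exits here; an exception is used in this sketch instead.
        raise ValueError(
            "Problem splitting multiple hits?\n%r\n--> %s" % (hit_def, e)
        )
    return sallseqid, salltitles


def select_columns(values, colnames, cols):
    """Keep only the requested columns, in the requested order (line 331)."""
    return [values[colnames.index(c)] for c in cols]


# Hypothetical merged hit definition, invented for illustration only.
example = "gi|1|ref|A.1| first protein >gi|2|ref|B.1| second protein"
ids, titles = split_hit_def(example)
print(ids)
print(titles)

# Hypothetical three-column row, reordered and reduced to two columns.
row = select_columns(
    ["q1", "s1", "99.00"],
    ["qseqid", "sseqid", "pident"],
    ["pident", "qseqid"],
)
print("\t".join(row))
```

An entry with no title (an identifier with no following whitespace) triggers the `IndexError` path, which is the case the script's error message at line 301 reports.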