annotate tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml @ 9:9dabbfd73c8a draft
Uploaded v0.0.19, adds wrappers for rpsblast and rpstblastn with new blastdb_d.loc file for their protein domain database.
Also includes other minor improvements.
author | peterjc |
---|---|
date | Thu, 25 Apr 2013 09:38:37 -0400 |
parents | 393a7a35383c |
children | 70e7dcbf6573 |
<tool id="ncbi_blastdbcmd_wrapper" name="NCBI BLAST+ blastdbcmd entry(s)" version="0.0.5">
    <description>Extract sequence(s) from BLAST database</description>
    <requirements>
        <requirement type="binary">blastdbcmd</requirement>
        <requirement type="package" version="2.2.26+">blast+</requirement>
    </requirements>
    <version_command>blastdbcmd -version</version_command>
    <command>
## The command is a Cheetah template which allows some Python based syntax.
## Lines starting hash hash are comments. Galaxy will turn newlines into spaces
blastdbcmd -dbtype $db_opts.db_type -db "${db_opts.database.fields.path}"

##TODO: What about -ctrl_a and -target_only as advanced options?

#if $id_opts.id_type=="file":
-entry_batch "$id_opts.entries"
#else:
##Perform some simple search/replaces to remove whitespace
##and make it comma separated, and escape any pipe characters
-entry "$id_opts.entries.replace('\r',',').replace('\n',',').replace(' ','').replace(',,',',').replace(',,',',').strip(',').replace('|','\|')"
#end if

##When building a BLAST database, to ensure unique IDs makeblastdb will
##do things like turning a FASTA entry with ID of ERP44 into lcl|ERP44
##(if using -parse_seqids) or simply assign it an ID using the record
##number like gnl|BL_ORD_ID|123 (to cope with duplicate IDs in the FASTA
##file). In -parse_seqids mode, a duplicate FASTA ID gives an error.
##
##The BLAST plain text and XML output will contain these BLAST IDs, but
##the tabular output does not (at least, not in BLAST 2.2.25+).
##Therefore in general, Galaxy users won't care about the (internal)
##BLAST identifiers.
##
##The blastdbcmd FASTA output will also contain these IDs, but in the
##context of the BLAST tabular output they are not helpful. Therefore
##to recover the original ID as used in the FASTA file for makeblastdb
##we need a little post-processing.
##
##We remove NCBI's lcl|... or gnl|BL_ORD_ID|123 prefixes
##using sed; however, the exact syntax differs for Mac OS X's sed

#if str($outfmt)=="blastid":
-out "$seq"
#else if sys.platform == "darwin":
| sed -E 's/^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )/>/1' > "$seq"
#else:
| sed 's/>\(lcl|\|gnl|BL_ORD_ID|[0-9]* \)/>/1' > "$seq"
#end if
    </command>
    <stdio>
        <!-- Anything other than zero is an error -->
        <exit_code range="1:" />
        <exit_code range=":-1" />
        <!-- Suspect blastdbcmd sometimes fails to set error level -->
        <regex match="Error:" />
        <regex match="Exception:" />
    </stdio>
    <inputs>
        <conditional name="db_opts">
            <param name="db_type" type="select" label="Type of BLAST database">
                <option value="nucl" selected="True">Nucleotide</option>
                <option value="prot">Protein</option>
            </param>
            <when value="nucl">
                <param name="database" type="select" label="Nucleotide BLAST database">
                    <options from_file="blastdb.loc">
                        <column name="value" index="0"/>
                        <column name="name" index="1"/>
                        <column name="path" index="2"/>
                    </options>
                </param>
            </when>
            <when value="prot">
                <param name="database" type="select" label="Protein BLAST database">
                    <options from_file="blastdb_p.loc">
                        <column name="value" index="0"/>
                        <column name="name" index="1"/>
                        <column name="path" index="2"/>
                    </options>
                </param>
            </when>
        </conditional>
        <conditional name="id_opts">
            <param name="id_type" type="select" label="Type of identifier list">
                <option value="file">From file</option>
                <option value="prompt">User entered</option>
            </param>
            <when value="file">
                <param name="entries" type="data" format="txt,tabular" label="Sequence identifier(s)" help="Plain text file with one ID per line (i.e. single column tabular file)"/>
            </when>
            <when value="prompt">
                <param name="entries" type="text" label="Sequence identifier(s)" help="Comma or new line separated list." optional="False" area="True" size="10x30"/>
            </when>
        </conditional>
        <param name="outfmt" type="select" label="Output format">
            <option value="original">FASTA with original identifiers</option>
            <option value="blastid">FASTA with BLAST assigned identifiers</option>
        </param>
    </inputs>
    <outputs>
        <data name="seq" format="fasta" label="Sequences from ${db_opts.database.fields.name}" />
    </outputs>
    <help>

**What it does**

Extracts FASTA formatted sequences from a BLAST database
using the NCBI BLAST+ blastdbcmd command line tool.

.. class:: warningmark

**BLAST assigned identifiers**

When a BLAST database is constructed from a FASTA file, the
original identifiers can be replaced with BLAST assigned
identifiers, partly to ensure uniqueness. For example, sometimes
a prefix of 'lcl|' is added (lcl is short for local),
or an arbitrary name starting 'gnl|BL_ORD_ID|' is created.

If you are using the tabular output from BLAST, it will contain
the original identifiers, not the BLAST assigned identifiers
suitable for use with the blastdbcmd tool.

The XML and plain text output from BLAST do contain the
BLAST assigned identifiers, but extracting a list of them
from these formats isn't straightforward.

-------

**References**

Altschul et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. Nucleic Acids Res. 25:3389-3402.

Schaffer et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. 2001. Nucleic Acids Res. 29:2994-3005.

    </help>
</tool>
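
The two text transformations in the wrapper's command can be tried outside Galaxy. The following is a minimal Python sketch, not part of the wrapper itself: `normalize_entries` repeats the chained `replace`/`strip` calls that the Cheetah template applies to user-entered identifiers, and `strip_blast_prefix` uses Python's `re` module as a stand-in for the `sed` substitution (it follows the anchored pattern from the Mac OS X branch). Both function names are hypothetical and chosen only for illustration.

```python
import re

# Python equivalent of the anchored sed pattern: a leading '>' followed by
# either 'lcl|' or 'gnl|BL_ORD_ID|<digits> ' (note the trailing space).
_BLAST_PREFIX = re.compile(r"^>(lcl\||gnl\|BL_ORD_ID\|[0-9]* )")


def normalize_entries(raw):
    """Repeat the wrapper's clean-up of a user-entered identifier list.

    Line breaks become commas, spaces are dropped, doubled commas are
    collapsed (two passes, as in the template), leading/trailing commas
    are stripped, and pipe characters are backslash-escaped for the shell.
    """
    return (
        raw.replace("\r", ",")
        .replace("\n", ",")
        .replace(" ", "")
        .replace(",,", ",")
        .replace(",,", ",")
        .strip(",")
        .replace("|", "\\|")
    )


def strip_blast_prefix(line):
    """Remove a BLAST assigned ID prefix from a FASTA header line.

    Lines that do not start with a matching prefix (including sequence
    lines) are returned unchanged.
    """
    return _BLAST_PREFIX.sub(">", line, count=1)


if __name__ == "__main__":
    print(normalize_entries("ERP44\r\nlcl|ABC, XYZ\n"))
    print(strip_blast_prefix(">lcl|ERP44 some description"))
    print(strip_blast_prefix(">gnl|BL_ORD_ID|123 ERP44 description"))
```

Because the template collapses doubled commas only twice, a long run of blank lines in the input can still leave empty entries; the sketch keeps that behaviour rather than correcting it.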