<tool id="interproscan" name="Interproscan functional predictions of ORFs" version="1.1">
  <description>Interproscan functional predictions of ORFs</description>
  <command>
    ## The command is a Cheetah template which allows some Python based syntax.
    ## Lines starting with two hashes (##) are comments. Galaxy will turn newlines
    ## into spaces, so separate shell commands must end with a semicolon.

    ## create a temporary file that receives the input with spaces replaced
    #import tempfile, os
    #set $tfile = tempfile.mkstemp()[1]

    sed 's/ /_/g' $input > $tfile;

    ## Hack, because interproscan does not seem to produce gff output even if it is configured
    #if str($oformat)=="gff":
      #set $tfile2 = tempfile.mkstemp()[1]
      /home/katrien/iprscan/bin/iprscan -cli -nocrc -i $tfile -o $tfile2 -goterms -iprlookup -seqtype p -altjobs -format raw -appl $appl > /dev/null 2> /dev/null;
      /home/katrien/iprscan/bin/converter.pl -format gff3 -input $tfile2 -output $output;
      rm $tfile2;
    #else
      /home/katrien/iprscan/bin/iprscan -cli -nocrc -i $tfile -o $output -goterms -iprlookup -seqtype p -altjobs -format $oformat
      #if not isinstance( $appl.value, list ):
        #set $args = [ $appl.value ]
      #else:
        #set $args = $appl.value
      #end if
      ## loop through the applications which were checked by the user
      #for $application in $args:
        -appl $application
      #end for
      > /dev/null 2> /dev/null;
    #end if

    rm $tfile
  </command>
  <inputs>
    <param name="input" type="data" format="fasta" label="Protein Fasta File"/>
    <param name="appl" type="select" multiple="True" display="checkboxes" label="Applications to run ..." help="Select your program.">
      <option value="seg">seg</option>
      <option value="profilescan">profilescan</option>
      <option value="fprintscan">fprintscan</option>
      <option value="patternscan">patternscan</option>
      <option value="superfamily">superfamily</option>
      <option value="hmmpir">hmmpir</option>
      <option value="hmmpfam">hmmpfam</option>
      <option value="hmmsmart">hmmsmart</option>
      <option value="hmmtigr">hmmtigr</option>
      <option value="hmmpanther">hmmpanther</option>
      <option value="hamap">hamap</option>
      <option value="gene3d">gene3d</option>
      <option value="coils">coils</option>
      <option value="blastprodom">blastprodom</option>
    </param>
    <param name="oformat" type="select" label="Output format" help="Please select an output format.">
      <option value="gff">gff</option>
      <option value="raw">raw</option>
      <option value="txt">txt</option>
      <option value="html">html</option>
      <option value="xml" selected="true">xml</option>
      <option value="ebixml">EBI header on top of xml</option>
    </param>
  </inputs>
  <outputs>
    <data format="txt" name="output" label="Interproscan calculation on ${on_string}">
      <change_format>
        <when input="oformat" value="raw" format="raw"/>
        <when input="oformat" value="html" format="html"/>
        <when input="oformat" value="xml" format="xml"/>
        <when input="oformat" value="ebixml" format="xml"/>
        <when input="oformat" value="gff" format="gff"/>
      </change_format>
    </data>
  </outputs>
  <requirements>
  </requirements>
  <help>

**What it does**

InterProScan is a batch tool to query the InterPro database. It provides annotations based on multiple searches of profile and other functional databases.
These include SCOP, CATH, PFAM and SUPERFAMILY.

This Galaxy wrapper for InterProScan is based on the version of Bjoern Gruening (http://toolshed.g2.bx.psu.edu/repos/bjoern-gruening/iprscan), but is extended with the possibility to select several applications at once via checkboxes.

**Input**

A FASTA file containing ORF predictions is required. The FASTA headers must NOT contain any spaces; any spaces are converted to underscores (_) by this tool before submission to InterProScan.

**Output**

Example for the raw format.
This is a basic tab delimited format useful for uploading the data into a relational database or concatenation of different runs.
Each match is written on a single line.

====== ================================================================ ======================================================================
column example                                                          description
====== ================================================================ ======================================================================
c1     NF00181542                                                       the id of the input sequence.
c2     27A9BBAC0587AB84                                                 the crc64 (checksum) of the protein sequence (supposed to be unique).
c3     272                                                              the length of the sequence (in AA).
c4     HMMPIR                                                           the analysis method launched.
c5     PIRSF001424                                                      the database member entry for this match.
c6     Prephenate dehydratase                                           the database member description for the entry.
c7     1                                                                the start of the domain match.
c8     270                                                              the end of the domain match.
c9     6.5e-141                                                         the evalue of the match (reported by member database method).
c10    T                                                                the status of the match (T: true, ?: unknown).
c11    06-Aug-2005                                                      the date of the run.
c12    IPR008237                                                        the corresponding InterPro entry (if iprlookup requested by the user).
c13    Prephenate dehydratase with ACT region                           the description of the InterPro entry.
c14    Molecular Function:prephenate dehydratase activity (GO:0004664)  the GO (gene ontology) description for the InterPro entry.
====== ================================================================ ======================================================================
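
The column layout above can be read with a few lines of Python. The following is a minimal sketch and not part of this wrapper or of InterProScan: the field names and the ``parse_raw`` helper are illustrative assumptions, and the sketch assumes that the trailing InterPro and GO columns (c12 to c14) may be missing from a line, in which case they are filled with empty strings.

```python
import csv
import io

# Field names mirror columns c1..c14 of the table; the identifiers are
# illustrative assumptions, not names defined by InterProScan.
RAW_COLUMNS = [
    "seq_id", "crc64", "length", "method", "member_id", "member_desc",
    "start", "end", "evalue", "status", "date",
    "ipr_id", "ipr_desc", "go_terms",
]


def parse_raw(handle):
    """Yield one dict per match line of a raw-format result."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad rows that lack the optional trailing columns.
        row = row + [""] * (len(RAW_COLUMNS) - len(row))
        record = dict(zip(RAW_COLUMNS, row))
        for key in ("length", "start", "end"):
            record[key] = int(record[key])
        yield record


# The example row from the table, joined with tabs.
example = (
    "NF00181542\t27A9BBAC0587AB84\t272\tHMMPIR\tPIRSF001424\t"
    "Prephenate dehydratase\t1\t270\t6.5e-141\tT\t06-Aug-2005\t"
    "IPR008237\tPrephenate dehydratase with ACT region\t"
    "Molecular Function:prephenate dehydratase activity (GO:0004664)\n"
)
records = list(parse_raw(io.StringIO(example)))
print(records[0]["member_id"], records[0]["start"], records[0]["end"])
# prints: PIRSF001424 1 270
```

To read a result downloaded from Galaxy, pass an open file handle to ``parse_raw`` instead of the in-memory example. The evalue is left as a string so that its original formatting is preserved.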

**Database updates**

Typically these take place 2-3 times a year.

**Tools**

PROSITE patterns

::

  Some biologically significant amino acid patterns can be summarised in
  the form of regular expressions.
  ScanRegExp (by Wolfgang.Fleischmann@ebi.ac.uk).

PROSITE profiles

::

  There are a number of protein families as well as functional or
  structural domains that cannot be detected using patterns due to their extreme
  sequence divergence, so the use of techniques based on weight matrices
  (also known as profiles) allows the detection of such proteins or domains.
  A profile is a table of position-specific amino acid weights and gap costs.
  The profile structure used in PROSITE is similar to but slightly more general
  (Bucher P. et al., 1996 [7]) than the one introduced by M. Gribskov and
  co-workers.
  pfscan from the Pftools package (by Philipp.Bucher@isrec.unil.ch).

PRINTS

::

  The PRINTS database houses a collection of protein family fingerprints.
  These are groups of motifs that together are diagnostically more
  powerful than single motifs by making use of the biological context inherent in a
  multiple-motif method. The fingerprinting method arose from the need for
  a reliable technique for detecting members of large, highly divergent
  protein super-families.
  FingerPRINTScan (Scordis P. et al., 1999 [8]).

PFAM

::

  Pfam is a database of protein domain families. Pfam contains curated
  multiple sequence alignments for each family and corresponding hidden
  Markov models (HMMs) (Eddy S.R., 1998 [9]).
  Profile hidden Markov models are statistical models of the primary
  structure consensus of a sequence family. The construction and use
  of Pfam is tightly tied to the HMMER software package.
  hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
  eddy@genetics.wustl.edu, http://hmmer.wustl.edu).

PRODOM

::

  ProDom is a database of protein domain families obtained by automated
  analysis of the SWISS-PROT and TrEMBL protein sequences. It is useful
  for analysing the domain arrangements of complex protein families and the
  homology relationships in modular proteins. ProDom families are built by
  an automated process based on a recursive use of PSI-BLAST homology
  searches.
  ProDomBlast3i.pl (by Emmanuel Courcelle emmanuel.courcelle@toulouse.inra.fr
  and Yoann Beausse beausse@toulouse.inra.fr),
  a wrapper on top of the Blast package (Altschul S.F. et al., 1997 [10]).

SMART

::

  SMART (a Simple Modular Architecture Research Tool) allows the
  identification and annotation of genetically mobile domains and the
  analysis of domain architectures. These domains are extensively
  annotated with respect to phyletic distributions, functional class, tertiary
  structures and functionally important residues. SMART alignments are
  optimised manually, followed by construction of the corresponding hidden
  Markov models (HMMs).
  hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
  eddy@genetics.wustl.edu, http://hmmer.wustl.edu).

TIGRFAMs

::

  TIGRFAMs are a collection of protein families featuring curated multiple
  sequence alignments, Hidden Markov Models (HMMs) and associated
  information designed to support the automated functional identification
  of proteins by sequence homology. Classification by equivalog family
  (see below), where achievable, complements classification by orthologs,
  superfamily, domain or motif. It provides the information best suited
  for automatic assignment of specific functions to proteins from large
  scale genome sequencing projects.
  hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
  eddy@genetics.wustl.edu, http://hmmer.wustl.edu).

PIR SuperFamily

::

  PIR SuperFamily (PIRSF) is a classification system based on evolutionary
  relationship of whole proteins.
  hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
  eddy@genetics.wustl.edu, http://hmmer.wustl.edu).

SUPERFAMILY

::

  SUPERFAMILY is a library of profile hidden Markov models that represent
  all proteins of known structure, based on SCOP.
  hmmpfam/hmmsearch from the HMMER2.3.2 package (by Sean Eddy,
  eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
  Optionally, predictions for coiled-coil, signal peptide cleavage sites
  (SignalP v3) and TM helices (TMHMM v2) are supported (See the FAQs file
  for details).

GENE3D

::

  Gene3D is supplementary to the CATH database. This protein sequence database
  contains proteins from complete genomes which have been clustered into protein
  families and annotated with CATH domains, Pfam domains and functional
  information from KEGG, GO, COG, Affymetrix and STRINGS.
  hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
  eddy@genetics.wustl.edu, http://hmmer.wustl.edu).

PANTHER

::

  The PANTHER (Protein ANalysis THrough Evolutionary Relationships)
  Classification System was designed to classify proteins (and their genes)
  in order to facilitate high-throughput analysis.
  hmmsearch from the HMMER2.3.2 package (by Sean Eddy,
  eddy@genetics.wustl.edu, http://hmmer.wustl.edu)
  and blastall from the Blast package (Altschul S.F. et al., 1997 [10]).

**Author and affiliation**

Katrien Bernaerts and Domantas Motiejunas, 12/07/2012

Corresponding author: domantas dot motiejunas at cropdesign dot com

Affiliation: CropDesign N.V., a BASF Plant Science Company - Technologiepark 3, 9052 Zwijnaarde - Belgium

**Terms of use**

Galaxy wrapper for InterProScan – sequence annotation tool - Copyright (C) 2012 CropDesign N.V. - this software may be used, copied and redistributed, with or without modification freely, without advance permission, provided that the above Copyright statement is reproduced with each copy.
THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE (INCLUDING NEGLIGENCE OR OTHERWISE).

**References**

Quevillon E., Silventoinen V., Pillai S., Harte N., Mulder N., Apweiler R., Lopez R.
InterProScan: protein domains identifier (2005).
Nucleic Acids Res. 33 (Web Server issue) :W116-W120

Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C.
InterPro: the integrative protein signature database (2009).
Nucleic Acids Res. 37 (Database Issue) :D224-228

Previous Galaxy wrapper authors:

* Bjoern Gruening, Pharmaceutical Bioinformatics, University of Freiburg
* Konrad Paszkiewicz, Exeter Sequencing Service, University of Exeter

  </help>
</tool>