comparison iprscan/interproscan.xml @ 5:a3f43bb03458 draft default tip

Uploaded
author basfplant
date Tue, 05 Mar 2013 03:58:40 -0500
parents cbb57809b7b5
children
4:4e717784f03c 5:a3f43bb03458
1 <tool id="interproscan" name="Interproscan functional predictions of ORFs" version="1.1">
2 <description>Interproscan functional predictions of ORFs</description>
3 <command>
4 ## The command is a Cheetah template which allows some Python based syntax.
5 ## Lines starting hash hash are comments. Galaxy will turn newlines into spaces
6
7 ## create a temporary file to hold the sanitised input
8 #import tempfile, os
9 #set $tfile = tempfile.mkstemp()[1]
10
11 sed 's/ /_/g' $input > $tfile;
12
13 ## Hack, because interproscan does not seem to produce gff output even if it is configured
14 #if str($oformat)=="gff":
15 #set $tfile2 = tempfile.mkstemp()[1]
16 /home/katrien/iprscan/bin/iprscan -cli -nocrc -i $tfile -o $tfile2 -goterms -iprlookup -seqtype p -altjobs -format raw -appl $appl > /dev/null 2> /dev/null;
17 /home/katrien/iprscan/bin/converter.pl -format gff3 -input $tfile2 -output $output;
18 rm $tfile2;
19 #else
20 /home/katrien/iprscan/bin/iprscan -cli -nocrc -i $tfile -o $output -goterms -iprlookup -seqtype p -altjobs -format $oformat
21 #if not isinstance( $appl.value, list ):
22 #set $args = [ $appl.value ]
23 #else:
24 #set $args = $appl.value
25 #end if
26 ## loop through the applications which were checked by the user
27 #for $application in $args:
28 -appl $application
29 #end for
30 > /dev/null 2> /dev/null;
31 #end if
32
33
34 rm $tfile
35
36 </command>
37 <inputs>
38 <param name="input" type="data" format="fasta" label="Protein Fasta File"/>
39 <param name="appl" type="select" multiple="True" display="checkboxes" label="Applications to run ..." help="Select your program.">
40 <option value="seg">seg</option>
41 <option value="profilescan">profilescan</option>
42 <option value="fprintscan">fprintscan</option>
43 <option value="patternscan">patternscan</option>
44 <option value="superfamily">superfamily</option>
45 <option value="hmmpir">hmmpir</option>
46 <option value="hmmpfam">hmmpfam</option>
47 <option value="hmmsmart">hmmsmart</option>
48 <option value="hmmtigr">hmmtigr</option>
49 <option value="hmmpanther">hmmpanther</option>
50 <option value="hamap">hamap</option>
51 <option value="gene3d">gene3d</option>
52 <option value="coils">coils</option>
53 <option value="blastprodom">blastprodom</option>
54 </param>
55 <param name="oformat" type="select" label="Output format" help="Please select an output format.">
56 <option value="gff">gff</option>
57 <option value="raw">raw</option>
58 <option value="txt">txt</option>
59 <option value="html">html</option>
60 <option value="xml" selected="true">xml</option>
61 <option value="ebixml">EBI header on top of xml</option>
62 </param>
63
64 </inputs>
65 <outputs>
66 <data format="txt" name="output" label="Interproscan calculation on ${on_string}">
67 <change_format>
68 <when input="oformat" value="raw" format="raw"/>
69 <when input="oformat" value="html" format="html"/>
70 <when input="oformat" value="xml" format="xml"/>
71 <when input="oformat" value="ebixml" format="xml"/>
72 <when input="oformat" value="gff" format="gff"/>
73 </change_format>
74 </data>
75
76 </outputs>
77 <requirements>
78 </requirements>
79 <help>
80
81 **What it does**
82
83
84 InterProScan is a batch tool for querying the InterPro database. It provides annotations based on multiple searches of profile and other functional databases.
85 These include SCOP, CATH, Pfam and SUPERFAMILY.
86
87 This Galaxy wrapper for InterProScan is based on the version by Bjoern Gruening (http://toolshed.g2.bx.psu.edu/repos/bjoern-gruening/iprscan), extended so that several applications can be selected at once via checkboxes.
88
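As a rough illustration of the checkbox handling (a Python sketch rather than the Cheetah template itself; the function name ``build_appl_args`` is hypothetical), a single selection or a list of selections ends up as one ``-appl`` flag per application:

```python
def build_appl_args(selected):
    """Return one '-appl NAME' pair per selected application."""
    # A single selection may arrive as a bare string; wrap it in a list.
    if not isinstance(selected, list):
        selected = [selected]
    args = []
    for application in selected:
        args.extend(["-appl", application])
    return args


print(" ".join(build_appl_args(["hmmpfam", "coils"])))
```

Repeating the flag keeps the command line the same shape whether one or many applications are selected.
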
89 **Input**
90
91 A FASTA file containing ORF predictions is required. The FASTA headers should not contain spaces: any spaces are converted to underscores (_) by this tool before submission to InterProScan.
92
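The conversion can be pictured with the following Python sketch. It is an illustration only: the wrapper itself uses ``sed 's/ /_/g'``, which replaces spaces on every line of the file and not just in headers. The function name ``underscore_spaces`` and the sample record are made up for this example.

```python
def underscore_spaces(fasta_text):
    """Replace every space with an underscore, as sed 's/ /_/g' does."""
    return fasta_text.replace(" ", "_")


example = ">seq1 hypothetical protein\nMKVLAAGIVG\n"
print(underscore_spaces(example), end="")
```
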
93 **Output**
94
95 Example for the raw format.
96 This is a basic tab-delimited format, useful for uploading the data into a relational database or for concatenating different runs.
97 Each match is reported on a single line.
98
99  ====== ================================================================ ======================================================================
100 column example                                                          description
101 ====== ================================================================ ======================================================================
102 c1     NF00181542                                                       the id of the input sequence.
103 c2     27A9BBAC0587AB84                                                 the crc64 (checksum) of the protein sequence (supposed to be unique).
104 c3     272                                                              the length of the sequence (in AA).
105 c4     HMMPIR                                                           the analysis method launched.
106 c5     PIRSF001424                                                      the database member's entry for this match.
107 c6     Prephenate dehydratase                                           the database member description for the entry.
108 c7     1                                                                the start of the domain match.
109 c8     270                                                              the end of the domain match.
110 c9     6.5e-141                                                         the evalue of the match (reported by member database method).
111 c10    T                                                                the status of the match (T: true, ?: unknown).
112 c11    06-Aug-2005                                                      the date of the run.
113 c12    IPR008237                                                        the corresponding InterPro entry (if iprlookup requested by the user).
114 c13    Prephenate dehydratase with ACT region                           the description of the InterPro entry.
115 c14    Molecular Function:prephenate dehydratase activity (GO:0004664)  the GO (gene ontology) description for the InterPro entry.
116 ====== ================================================================ ======================================================================
117
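For loading raw output into a database or script, a minimal Python sketch such as the one below may help. The column labels in ``RAW_COLUMNS`` and the function name ``parse_raw`` are hypothetical names chosen for this example, and padding short rows assumes that the trailing InterPro and GO columns can be absent from a line.

```python
import csv
import io

# Illustrative labels for columns c1..c14 of the raw format.
RAW_COLUMNS = [
    "sequence_id", "crc64", "length", "method", "member_id",
    "member_description", "start", "end", "evalue", "status",
    "run_date", "interpro_id", "interpro_description", "go_terms",
]


def parse_raw(handle):
    """Yield one dict per non-empty line of tab-delimited raw output."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue
        # Assumption: rows may lack trailing columns; pad them with "".
        fields = fields + [""] * (len(RAW_COLUMNS) - len(fields))
        yield dict(zip(RAW_COLUMNS, fields))


# One line built from the example values in the table.
example = "\t".join([
    "NF00181542", "27A9BBAC0587AB84", "272", "HMMPIR", "PIRSF001424",
    "Prephenate dehydratase", "1", "270", "6.5e-141", "T", "06-Aug-2005",
    "IPR008237", "Prephenate dehydratase with ACT region",
    "Molecular Function:prephenate dehydratase activity (GO:0004664)",
]) + "\n"

for row in parse_raw(io.StringIO(example)):
    print(row["sequence_id"], row["method"], row["interpro_id"])
```

All values are kept as strings; converting ``start``, ``end`` and ``evalue`` to numbers is left to the caller.
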
118
119 **Database updates**
120
121 Typically these take place 2-3 times a year.
122
123
124 **Tools**
125
126 PROSITE patterns
127
128 ::
129
130 Some biologically significant amino acid patterns can be summarised in
131 the form of regular expressions.
132 ScanRegExp (by Wolfgang.Fleischmann@ebi.ac.uk).
133
134 PROSITE profiles
135
136 ::
137
138 There are a number of protein families as well as functional or
139 structural domains that cannot be detected using patterns due to their extreme
140 sequence divergence, so the use of techniques based on weight matrices
141 (also known as profiles) allows the detection of such proteins or domains.
142 A profile is a table of position-specific amino acid weights and gap costs.
143 The profile structure used in PROSITE is similar to but slightly more general
144 (Bucher P. et al., 1996 [7]) than the one introduced by M. Gribskov and
145 co-workers.
146 pfscan from the Pftools package (by Philipp.Bucher@isrec.unil.ch).
147
148 PRINTS
149
150 ::
151
152 The PRINTS database houses a collection of protein family fingerprints.
153 These are groups of motifs that together are diagnostically more
154 powerful than single motifs by making use of the biological context inherent in a
155 multiple-motif method. The fingerprinting method arose from the need for
156 a reliable technique for detecting members of large, highly divergent
157 protein super-families.
158 FingerPRINTScan (Scordis P. et al., 1999 [8]).
159
160 PFAM
161
162 ::
163
164 Pfam is a database of protein domain families. Pfam contains curated
165 multiple sequence alignments for each family and corresponding hidden
166 Markov models (HMMs) (Eddy S.R., 1998 [9]).
167 Profile hidden Markov models are statistical models of the primary
168 structure consensus of a sequence family. The construction and use
169 of Pfam is tightly tied to the HMMER software package.
170 hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
171 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
172
173 PRODOM
174
175 ::
176
177
178 ProDom is a database of protein domain families obtained by automated
179 analysis of the SWISS-PROT and TrEMBL protein sequences. It is useful
180 for analysing the domain arrangements of complex protein families and the
181 homology relationships in modular proteins. ProDom families are built by
182 an automated process based on a recursive use of PSI-BLAST homology
183 searches.
184 ProDomBlast3i.pl (by Emmanuel Courcelle emmanuel.courcelle@toulouse.inra.fr
185 and Yoann Beausse beausse@toulouse.inra.fr)
186 a wrapper on top of the Blast package (Altschul S.F. et al., 1997 [10]).
187
188
189 SMART
190
191 ::
192
193 SMART (a Simple Modular Architecture Research Tool) allows the
194 identification and annotation of genetically mobile domains and the
195 analysis of domain architectures. These domains are extensively
196 annotated with respect to phyletic distributions, functional class, tertiary
197 structures and functionally important residues. SMART alignments are
198 optimised manually, followed by construction of the corresponding hidden Markov models (HMMs).
199 hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
200 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
201
202
203 TIGRFAMs
204
205 ::
206
207 TIGRFAMs are a collection of protein families featuring curated multiple
208 sequence alignments, Hidden Markov Models (HMMs) and associated
209 information designed to support the automated functional identification
210 of proteins by sequence homology. Classification by equivalog family
211 (see below), where achievable, complements classification by orthologs,
212 superfamily, domain or motif. It provides the information best suited
213 for automatic assignment of specific functions to proteins from large
214 scale genome sequencing projects.
215 hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
216 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
217
218 PIR SuperFamily
219
220 ::
221
222 PIR SuperFamily (PIRSF) is a classification system based on evolutionary
223 relationship of whole proteins.
224 hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
225 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
226
227 SUPERFAMILY
228
229 ::
230
231 SUPERFAMILY is a library of profile hidden Markov models that represent
232 all proteins of known structure, based on SCOP.
233 hmmpfam/hmmsearch from the HMMER2.3.2 package (by Sean Eddy,
234 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
235 Optionally, predictions for coiled-coil, signal peptide cleavage sites
236 (SignalP v3) and TM helices (TMHMM v2) are supported (See the FAQs file
237 for details).
238
239
240 GENE3D
241
242 ::
243
244 Gene3D is supplementary to the CATH database. This protein sequence database
245 contains proteins from complete genomes which have been clustered into protein
246 families and annotated with CATH domains, Pfam domains and functional
247 information from KEGG, GO, COG, Affymetrix and STRINGS.
248 hmmpfam from the HMMER2.3.2 package (by Sean Eddy,
249 eddy@genetics.wustl.edu, http://hmmer.wustl.edu).
250
251
252 PANTHER
253
254 ::
255
256 The PANTHER (Protein ANalysis THrough Evolutionary Relationships)
257 Classification System was designed to classify proteins (and their genes)
258 in order to facilitate high-throughput analysis.
259 hmmsearch from the HMMER2.3.2 package (by Sean Eddy,
260 eddy@genetics.wustl.edu, http://hmmer.wustl.edu)
261 and blastall from the Blast package (Altschul S.F. et al., 1997 [10]).
262
263
264 **Author and affiliation**
265
266 Katrien Bernaerts and Domantas Motiejunas, 12/07/2012
267
268 Corresponding author: domantas dot motiejunas at cropdesign dot com
269
270
271
272 Affiliation: CropDesign N.V., a BASF Plant Science Company - Technologiepark 3, 9052 Zwijnaarde - Belgium
273
274
275 **Terms of use**
276
277 Galaxy wrapper for InterProScan – sequence annotation tool - Copyright (C) 2012 CropDesign N.V. - this software may be used, copied and redistributed, with or without modification freely, without advance permission, provided that the above Copyright statement is reproduced with each copy.
278 THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE (INCLUDING NEGLIGENCE OR OTHERWISE).
279
280
281 **References**
282
283 Quevillon E., Silventoinen V., Pillai S., Harte N., Mulder N., Apweiler R., Lopez R.
284 InterProScan: protein domains identifier (2005).
285 Nucleic Acids Res. 33 (Web Server issue) :W116-W120
286
287 Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C.
288 InterPro: the integrative protein signature database (2009).
289 Nucleic Acids Res. 37 (Database Issue) :D224-228
290
291 Previous Galaxy wrapper authors:
292
293 * Bjoern Gruening, Pharmaceutical Bioinformatics, University of Freiburg
294 * Konrad Paszkiewicz, Exeter Sequencing Service, University of Exeter
295
296
297
298 </help>
299 </tool>