comparison interpro/paso3.xml @ 0:c342ebb50f0b draft default tip

Uploaded
author fernando
date Thu, 22 May 2014 05:09:07 -0400
parents
children
comparison
-1:000000000000 0:c342ebb50f0b
1 <tool id="CLaGiFer_3" name="Sequences attributes" version="1.0.0">
2 <description>Download gff file from InterPro</description>
3 <command interpreter="bash">
4 ./paso3.sh "$infile" "$outfile"
5 </command>
6
7 <inputs>
8 <param name="infile" type="data" format="fasta" label="Fasta file"/>
9 </inputs>
10 <outputs>
11 <data format="gff" name="outfile"/>
12 </outputs>
13
14 <stdio><exit_code range="1:" level="fatal" description="Error" /></stdio>
15 <help>
16
17
18 **What it does**
19
20 InterProScan is a batch tool for querying the InterPro database. It annotates protein sequences by searching them against multiple profile and other functional databases.
21
22
23 **Dependencies**
24
25 The InterProScan package must be installed (http://code.google.com/p/interproscan/wiki/HowToDownload).
26
27
28
29 #####
30 Input
31 #####
32
33 A FASTA file containing protein sequences is required.
34
35
36 ######
37 Output
38 ######
39
40 Generic Feature Format Version 3 (GFF3)
41
42 The GFF3 format is a flat, tab-delimited file that is much richer than the TSV output format. It allows you to trace back from matches to predicted proteins and to nucleic acid sequences. It also contains a FASTA representation of the predicted protein sequences and their matches. All columns and attributes are documented at http://www.sequenceontology.org/gff3.shtml.
43
44 Example Output
45 --------------
46
47 ::
48
49 ##gff-version 3
50 ##feature-ontology http://song.cvs.sourceforge.net/viewvc/song/ontology/sofa.obo?revision=1.269
51 ##sequence-region AACH01000027 1 1347
52 ##seqid|source|type|start|end|score|strand|phase|attributes
53 AACH01000027 provided_by_user nucleic_acid 1 1347 . + . Name=AACH01000027;md5=b2a7416cb92565c004becb7510f46840;ID=AACH01000027
54 AACH01000027 getorf ORF 1 1347 . + . Name=AACH01000027.2_21;Target=pep_AACH01000027_1_1347 1 449;md5=b2a7416cb92565c004becb7510f46840;ID=orf_AACH01000027_1_1347
55 AACH01000027 getorf polypeptide 1 449 . + . md5=fd0743a673ac69fb6e5c67a48f264dd5;ID=pep_AACH01000027_1_1347
56 AACH01000027 Pfam protein_match 84 314 1.2E-45 + . Name=PF00696;signature_desc=Amino acid kinase family;Target=null 84 314;status=T;ID=match$8_84_314;Ontology_term="GO:0008652";date=15-04-2013;Dbxref="InterPro:IPR001048","Reactome:REACT_13"
57 ##sequence-region 2
58 ...
59 >pep_AACH01000027_1_1347
60 LVLLAAFDCIDDTKLVKQIIISEIINSLPNIVNDKYGRKVLLYLLSPRDPAHTVREIIEV
61 LQKGDGNAHSKKDTEIRRREMKYKRIVFKVGTSSLTNEDGSLSRSKVKDITQQLAMLHEA
62 GHELILVSSGAIAAGFGALGFKKRPTKIADKQASAAVGQGLLLEEYTTNLLLRQIVSAQI
63 LLTQDDFVDKRRYKNAHQALSVLLNRGAIPIINENDSVVIDELKVGDNDTLSAQVAAMVQ
64 ADLLVFLTDVDGLYTGNPNSDPRAKRLERIETINREIIDMAGGAGSSNGTGGMLTKIKAA
65 TIATESGVPVYICSSLKSDSMIEAAEETEDGSYFVAQEKGLRTQKQWLAFYAQSQGSIWV
66 DKGAAEALSQYGKSLLLSGIVEAEGVFSYGDIVTVFDKESGKSLGKGRVQFGASALEDML
67 RSQKAKGVLIYRDDWISITPEIQLLFTEF
68 ...
69 >match$8_84_314
70 KRIVFKVGTSSLTNEDGSLSRSKVKDITQQLAMLHEAGHELILVSSGAIAAGFGALGFKK
71 RPTKIADKQASAAVGQGLLLEEYTTNLLLRQIVSAQILLTQDDFVDKRRYKNAHQALSVL
72 LNRGAIPIINENDSVVIDELKVGDNDTLSAQVAAMVQADLLVFLTDVDGLYTGNPNSDPR
73 AKRLERIETINREIIDMAGGAGSSNGTGGMLTKIKAATIATESGVPVYICS
74
75
76
77 ----------
78 References
79 ----------
80
81
82 If you use this Galaxy tool in work leading to a scientific publication, please
83 cite the following papers:
84
85 Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
86 Galaxy tools and workflows for sequence analysis with applications
87 in molecular plant pathology. PeerJ 1:e167
88 http://dx.doi.org/10.7717/peerj.167
89
90 Zdobnov EM, Apweiler R (2001)
91 InterProScan - an integration platform for the signature-recognition methods in InterPro.
92 Bioinformatics 17, 847-848.
93 http://dx.doi.org/10.1093/bioinformatics/17.9.847
94
95 Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R (2005)
96 InterProScan: protein domains identifier.
97 Nucleic Acids Research 33 (Web Server issue), W116-W120.
98 http://dx.doi.org/10.1093/nar/gki442
99
100 Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. (2009)
101 InterPro: the integrative protein signature database.
102 Nucleic Acids Research 37 (Database Issue), D211-D215.
103 http://dx.doi.org/10.1093/nar/gkn785
104
105
106 This wrapper is available to install into other Galaxy Instances via the Galaxy Tool Shed at
107 http://toolshed.g2.bx.psu.edu/view/bgruening/interproscan5
108
109
110 **Galaxy Wrapper Authors**:
111
112 * Fernando Pérez
113 * Ginés Almagro
114 * Laura Entrambasaguas
115 </help>
116 </tool>
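
The help text describes the GFF3 output as a tab-delimited file with nine columns (seqid, source, type, start, end, score, strand, phase, attributes). As a rough illustration of how that output might be consumed downstream, here is a minimal Python sketch that parses a single feature line. It is not part of the wrapper: the function name `parse_gff3_line` is hypothetical, and the sample line is an abbreviated copy of the Pfam match from the example output, rebuilt with explicit tabs on the assumption that the real file uses them.

```python
from urllib.parse import unquote

# Column names as listed in the example output header line.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one tab-delimited GFF3 feature line into a dict.

    Returns None for blank lines, comments and ## directives.
    (Hypothetical helper, for illustration only.)
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(
            "expected 9 tab-separated columns, got %d" % len(fields))
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated key=value pairs;
    # values may contain spaces (e.g. Target) and percent-escapes.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


# Abbreviated Pfam match from the example output, joined with tabs.
example = "\t".join([
    "AACH01000027", "Pfam", "protein_match", "84", "314", "1.2E-45", "+", ".",
    "Name=PF00696;signature_desc=Amino acid kinase family;"
    "Target=null 84 314;status=T;ID=match$8_84_314",
])
match = parse_gff3_line(example)
print(match["type"], match["start"], match["end"], match["attributes"]["Name"])
```

Splitting on tabs rather than on arbitrary whitespace matters here, because attribute values such as `Target=null 84 314` contain spaces. The sketch only handles feature lines; the FASTA section at the end of the file would need to be read separately.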