--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.txt	Wed Jun 05 13:40:56 2013 -0400
@@ -0,0 +1,23 @@
+#Created 07/01/2011 - Konrad Paszkiewicz, Exeter Sequencing Service, University of Exeter, UK
+Revisions 2013 by Peter Cock, The James Hutton Institute, UK
+
+The attached is a crude wrapper script for Interproscan. Typically this is useful when one wants to produce an annotation which is not based on sequence
+similarity. For example, after a de novo transcriptome assembly, each transcript could be translated and run through this tool.
+
+Prerequisites:
+
+1. A working installation of Interproscan on your Galaxy server/cluster.
+
+Limitations:
+
+Currently it is set up to work with PFAM only, due to the heavy computational demands Interproscan makes.
+
+Input formats:
+
+The standard Interproscan input is either genomic or protein sequences. In the case of genomic sequences Interproscan will run an ORF
+prediction tool. However, this tends to lose the ORF information (e.g. start/end co-ordinates) from the header. As such the requirement here is to input ORF
+sequences (e.g. from EMBOSS getorf) and to then replace any spaces in the FASTA header with underscores. This workaround generally preserves the relevant
+positional information.
+
+
+
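Note on the header workaround described in the README above: a minimal sketch of the space-to-underscore replacement, assuming plain-text FASTA input. This is not taken from interproscan.py, which is not part of this changeset; the function name `sanitize_fasta_headers` and the example header are made up for illustration.

```python
def sanitize_fasta_headers(lines):
    """Yield FASTA lines with spaces in header lines replaced by underscores.

    Hypothetical helper for illustration; not the wrapper's own code.
    """
    for line in lines:
        if line.startswith(">"):
            # Only header lines are changed, so positional information such
            # as "[12 - 341]" stays attached to the identifier.
            yield line.rstrip("\n").replace(" ", "_") + "\n"
        else:
            # Sequence lines pass through untouched.
            yield line


# Made-up ORF record in the general style of an ORF finder's output.
example = [
    ">contig1_1 [12 - 341] example ORF\n",
    "MKVLAAGIVGLLLAQ\n",
]
cleaned = list(sanitize_fasta_headers(example))
```

Because the whole header becomes a single whitespace-free token, the co-ordinates should come through in the identifier column of the results rather than being dropped.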
--- a/README_INTERPROSCAN	Tue Jun 07 17:27:16 2011 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,22 +0,0 @@
-#Created 07/01/2011 - Konrad Paszkiewicz, Exeter Sequencing Service, University of Exeter
-
-The attached is a crude wrapper script for Interproscan. Typically this is useful when one wants to produce an annotation which is not based on sequence
-similarity. E.g after a denovo transcriptome assembly, each transcript could be translated and run through this tool.
-
-Prerequisites:
-
-1. A working installation of Interproscan on your Galaxy server/cluster.
-
-Limitations:
-
-Currently it is setup to work with PFAM only due to the heavy computational demands Interproscan makes.
-
-Input formats:
-
-The standard interproscan input is either genomic or protein sequences. In the case of genomic sequences Interproscan will of run an ORF
-prediction tool. However this tends to lose the ORF information (e.g. start/end co-ordinates) from the header. As such the requirement here is to input ORF
-sequences (e.g. from EMBOSS getorf) and to then replace any spaces in the FASTA header with underscores. This workaround generally preserves the relevant
-positional information.
-
-
-
--- a/interproscan.xml	Tue Jun 07 17:27:16 2011 -0400
+++ b/interproscan.xml	Wed Jun 05 13:40:56 2013 -0400
@@ -1,4 +1,4 @@
-<tool id="interproscan" name="Interproscan functional predictions of ORFs"  version="1.0.0">
+<tool id="interproscan" name="Interproscan functional predictions of ORFs"  version="1.0.1">
 	<description>Interproscan functional predictions of ORFs</description>
 	<command interpreter="python">
 	  interproscan.py
@@ -18,57 +18,61 @@
 	<requirements>
 	</requirements>
 	<help>
-**Interproscan **
+
+**Interproscan**

 Interproscan is a batch tool to query the Interpro database. It provides annotations based on multiple searches of profile and other functional databases. These include SCOP, CATH, PFAM and SUPERFAMILY. Currently due to resource limitations, only the PFAM database is searched however.

 **Input**
-A FASTA file containing ORF predictions is required. This file must NOT contain any spaces in the FASTA headers - any spaces will be convereted to underscores (_) by this tool before submission to Interproscan.
+A FASTA file containing ORF predictions is required. This file must NOT contain any spaces in the FASTA headers - any spaces will be converted to underscores by this tool before submission to Interproscan.

 **Output**
-The output will consist of a file in Interproscan raw format@

-This is a basic tab delimited format useful for uploading the data into a relational database or concatenation of different runs.
-is all on one line.
-
-Example here (with descriptions):
-NF00181542      0A5FDCE74AB7C3AD        272     HMMPIR  PIRSF001424     Prephenate dehydratase  1       270     6.5e-141        T       06-Aug-2005\
-        IPR008237       Prephenate dehydratase with ACT region  Molecular Function:prephenate dehydratase activity (GO:0004664), Biological Process\
-        :L-phenylalanine biosynthesis (GO:0009094)
-
-Key:
+The output will consist of a file in Interproscan raw format, a tabular file in Galaxy with 14 columns.
+This can be used to upload the data into a relational database or to concatenate different runs.

-NF00181542 is the id of the input sequence.
-27A9BBAC0587AB84 is the crc64 (checksum) of the protein sequence (supposed to be unique).
-272 is the length of the sequence (in AA).
-HMMPIR is the anaysis method launched.
-PIRSF001424 is the database members entry for this match.
-Prephenate dehydratase is the database member description for the entry.
-1 is the start of the domain match.
-270 is the end of the domain match.
-6.5e-141 is the evalue of the match (reported by member database method).
-T is the status of the match (T: true, ?: unknown).
-06-Aug-2005 is the date of the run.
-IPR008237 is the corresponding InterPro entry (if iprlookup requested by the user).
-Prephenate dehydratase with ACT region is the description of the InterPro entry.
-Molecular Function:prephenate dehydratase activity (GO:0004664) is the GO (gene ontology) description for the InterPro entry.
-
+====== ============================================================================================================================= ===========================================
+Column Example                                                                                                                       Description
+------ ----------------------------------------------------------------------------------------------------------------------------- -------------------------------------------
+    c1 NF00181542                                                                                                                    Identifier of the input sequence
+    c2 0A5FDCE74AB7C3AD                                                                                                              crc64 checksum of the protein sequence
+    c3 272                                                                                                                           Length of sequence (in amino acids)
+    c4 HMMPIR                                                                                                                        Analysis method launched
+    c5 PIRSF001424                                                                                                                   Database members entry for match
+    c6 Prephenate dehydratase                                                                                                        Description from the database
+    c7 1                                                                                                                             Start of the domain match
+    c8 270                                                                                                                           End of the domain match
+    c9 6.5e-141                                                                                                                      e-value (reported by the database method)
+   c10 T                                                                                                                             Status of match (T for true, ? for unknown)
+   c11 06-Aug-2005                                                                                                                   Date of the run
+   c12 IPR008237                                                                                                                     InterPro entry (if iprlookup requested)
+   c13 Prephenate dehydratase with ACT region                                                                                        Description of the InterPro entry
+   c14 Molecular Function:prephenate dehydratase activity (GO:0004664), Biological Process:L-phenylalanine biosynthesis (GO:0009094) GO (gene ontology) description
+====== ============================================================================================================================= ===========================================
+
 **Database updates**

 Typically these take place 2-3 times a year.

 **References**

-Quevillon E., Silventoinen V., Pillai S., Harte N., Mulder N., Apweiler R., Lopez R.
-InterProScan: protein domains identifier (2005).
-Nucleic Acids Res. 33 (Web Server issue) :W116-W120
-
+Zdobnov EM, Apweiler R (2001)
+InterProScan an integration platform for the signature-recognition methods in InterPro.
+Bioinformatics 17, 847-848.
+http://dx.doi.org/10.1093/bioinformatics/17.9.847

-Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C.
-InterPro: the integrative protein signature database (2009).
-Nucleic Acids Res. 37 (Database Issue) :D224-228
+Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R (2005)
+InterProScan: protein domains identifier.
+Nucleic Acids Research 33 (Web Server issue), W116-W120.
+http://dx.doi.org/10.1093/nar/gki442

+Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. (2009)
+InterPro: the integrative protein signature database.
+Nucleic Acids Research 37 (Database Issue), D224-228.
+http://dx.doi.org/10.1093/nar/gkn785

+This wrapper is available to install into other Galaxy Instances via the Galaxy Tool Shed at
+http://toolshed.g2.bx.psu.edu/view/konradpaszkiewicz/interproscan

 	</help>
 </tool>
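
Note on the 14-column output documented in the help text above: a minimal sketch of reading the raw tabular file into named records, for example before loading it into a database. The field names are illustrative labels for c1 to c14 and are not defined by Interproscan; padding short rows with empty strings is an assumption for runs where the InterPro and GO columns are absent.

```python
import io
from collections import namedtuple

# Illustrative labels for columns c1..c14 of the raw format.
RawHit = namedtuple("RawHit", [
    "seq_id", "crc64", "length", "method", "db_entry", "db_description",
    "start", "end", "evalue", "status", "date",
    "ipr_id", "ipr_description", "go_terms",
])


def parse_raw(handle):
    """Yield one RawHit per non-empty line of a tab-delimited raw file."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # Assumption: rows without InterPro/GO columns are padded with "".
        fields = (fields + [""] * 14)[:14]
        hit = RawHit(*fields)
        # Length and match co-ordinates are integers; the e-value is kept as
        # text in case a member database reports a non-numeric value.
        yield hit._replace(
            length=int(hit.length), start=int(hit.start), end=int(hit.end)
        )


# The example row from the help table, joined with tabs.
example = "\t".join([
    "NF00181542", "0A5FDCE74AB7C3AD", "272", "HMMPIR", "PIRSF001424",
    "Prephenate dehydratase", "1", "270", "6.5e-141", "T", "06-Aug-2005",
    "IPR008237", "Prephenate dehydratase with ACT region",
    "Molecular Function:prephenate dehydratase activity (GO:0004664), "
    "Biological Process:L-phenylalanine biosynthesis (GO:0009094)",
]) + "\n"
hits = list(parse_raw(io.StringIO(example)))
```

Since each hit is a single line, output from separate runs can be concatenated and parsed the same way.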