view interpro/paso3.xml @ 0:c342ebb50f0b draft default tip

Uploaded
author fernando
date Thu, 22 May 2014 05:09:07 -0400

<tool id="CLaGiFer_3" name="Sequences attributes" version="1.0.0">
    <description>Download GFF file from InterPro</description>
    <command interpreter="bash">
    ./paso3.sh "$infile" "$outfile"
    </command>

    <inputs>
        <param name="infile" type="data" format="fasta" label="Fasta file"/>
    </inputs>
    <outputs>
        <data format="gff" name="outfile"/>
    </outputs>

    <stdio>
        <exit_code range="1:" level="fatal" description="Error" />
    </stdio>
    <help>


**What it does**

InterProScan is a batch tool for querying the InterPro database. It provides annotations based on searches against multiple profile and other functional databases.


**Dependencies**

The InterProScan package must be installed (http://code.google.com/p/interproscan/wiki/HowToDownload).



#####
Input
#####

A FASTA file containing protein sequences is required.


######
Output
######

Generic Feature Format Version 3 (GFF3)

GFF3 is a flat, tab-delimited file format that is much richer than the TSV output format. It lets you trace back from matches to predicted proteins and to nucleic acid sequences. The file also contains a FASTA representation of the predicted protein sequences and their matches. All columns and attributes are documented at http://www.sequenceontology.org/gff3.shtml.
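
As a rough illustration of the column layout described above, the following Python sketch splits one GFF3 feature line into its nine columns and turns the attributes column into a dictionary. The function name ``parse_gff3_line`` and the sample line are hypothetical and not part of this tool. The sketch assumes real tab characters between columns and does not handle the FASTA section at the end of the file.

```python
from urllib.parse import unquote

COLUMN_NAMES = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase")


def parse_gff3_line(line):
    """Parse one GFF3 feature line (hypothetical helper, not part of the tool).

    Returns None for blank lines and for '#' comments or '##' directives.
    """
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    columns = line.split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(columns))
    record = dict(zip(COLUMN_NAMES, columns[:8]))
    # Coordinates are 1-based and inclusive in GFF3.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds ';'-separated key=value pairs with URL-style escaping.
    attributes = {}
    for pair in columns[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = unquote(value)
    record["attributes"] = attributes
    return record


# Sample line built from the Pfam match in the example output, with tabs.
sample = "\t".join([
    "AACH01000027", "Pfam", "protein_match", "84", "314", "1.2E-45", "+", ".",
    "Name=PF00696;signature_desc=Amino acid kinase family;status=T",
])
match = parse_gff3_line(sample)
print(match["attributes"]["Name"], match["start"], match["end"])
```

The score column is kept as a string here because GFF3 uses ``.`` for a missing value. Convert it to a number only after checking for that placeholder.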

Example Output
--------------

::

  ##gff-version 3
  ##feature-ontology http://song.cvs.sourceforge.net/viewvc/song/ontology/sofa.obo?revision=1.269
  ##sequence-region AACH01000027 1 1347
  ##seqid|source|type|start|end|score|strand|phase|attributes
  AACH01000027 provided_by_user nucleic_acid 1 1347 . + . Name=AACH01000027;md5=b2a7416cb92565c004becb7510f46840;ID=AACH01000027
  AACH01000027 getorf ORF 1 1347 . + . Name=AACH01000027.2_21;Target=pep_AACH01000027_1_1347 1 449;md5=b2a7416cb92565c004becb7510f46840;ID=orf_AACH01000027_1_1347
  AACH01000027 getorf polypeptide 1 449 . + . md5=fd0743a673ac69fb6e5c67a48f264dd5;ID=pep_AACH01000027_1_1347
  AACH01000027 Pfam protein_match 84 314 1.2E-45 + . Name=PF00696;signature_desc=Amino acid kinase family;Target=null 84 314;status=T;ID=match$8_84_314;Ontology_term="GO:0008652";date=15-04-2013;Dbxref="InterPro:IPR001048","Reactome:REACT_13"
  ##sequence-region 2
  ...
  >pep_AACH01000027_1_1347
  LVLLAAFDCIDDTKLVKQIIISEIINSLPNIVNDKYGRKVLLYLLSPRDPAHTVREIIEV
  LQKGDGNAHSKKDTEIRRREMKYKRIVFKVGTSSLTNEDGSLSRSKVKDITQQLAMLHEA
  GHELILVSSGAIAAGFGALGFKKRPTKIADKQASAAVGQGLLLEEYTTNLLLRQIVSAQI
  LLTQDDFVDKRRYKNAHQALSVLLNRGAIPIINENDSVVIDELKVGDNDTLSAQVAAMVQ
  ADLLVFLTDVDGLYTGNPNSDPRAKRLERIETINREIIDMAGGAGSSNGTGGMLTKIKAA
  TIATESGVPVYICSSLKSDSMIEAAEETEDGSYFVAQEKGLRTQKQWLAFYAQSQGSIWV
  DKGAAEALSQYGKSLLLSGIVEAEGVFSYGDIVTVFDKESGKSLGKGRVQFGASALEDML
  RSQKAKGVLIYRDDWISITPEIQLLFTEF
  ...
  >match$8_84_314
  KRIVFKVGTSSLTNEDGSLSRSKVKDITQQLAMLHEAGHELILVSSGAIAAGFGALGFKK
  RPTKIADKQASAAVGQGLLLEEYTTNLLLRQIVSAQILLTQDDFVDKRRYKNAHQALSVL
  LNRGAIPIINENDSVVIDELKVGDNDTLSAQVAAMVQADLLVFLTDVDGLYTGNPNSDPR
  AKRLERIETINREIIDMAGGAGSSNGTGGMLTKIKAATIATESGVPVYICS
  


----------
References
----------


If you use this Galaxy tool in work leading to a scientific publication, please
cite the following papers:

Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
Galaxy tools and workflows for sequence analysis with applications
in molecular plant pathology. PeerJ 1:e167
http://dx.doi.org/10.7717/peerj.167

Zdobnov EM, Apweiler R (2001)
InterProScan - an integration platform for the signature-recognition methods in InterPro.
Bioinformatics 17, 847-848.
http://dx.doi.org/10.1093/bioinformatics/17.9.847

Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R (2005)
InterProScan: protein domains identifier.
Nucleic Acids Research 33 (Web Server issue), W116-W120.
http://dx.doi.org/10.1093/nar/gki442

Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. (2009)
InterPro: the integrative protein signature database.
Nucleic Acids Research 37 (Database Issue), D224-228.
http://dx.doi.org/10.1093/nar/gkn785


This wrapper can be installed into other Galaxy instances via the Galaxy Tool Shed at
http://toolshed.g2.bx.psu.edu/view/bgruening/interproscan5


**Galaxy Wrapper Authors**::

    * Fernando Pérez
    * Ginés Almagro
    * Laura Entrambasaguas
    </help>
</tool>