<!-- tools/protein_analysis/predictnls.xml @ 0:6e26c5a48e9a (draft)
     Uploaded v0.0.4, first public release.
     Author: peterjc
     Date: Wed, 20 Feb 2013 11:39:06 -0500 -->

<tool id="predictnls" name="PredictNLS" version="0.0.4">
    <description>Find nuclear localization signals (NLSs) in protein sequences</description>
    <command interpreter="python">
      predictnls.py $fasta_file $tabular_file
    </command>
    <inputs>
        <param name="fasta_file" type="data" format="fasta" label="FASTA file of protein sequences"/> 
    </inputs>
    <outputs>
        <data name="tabular_file" format="tabular" label="predictNLS results" />
    </outputs>
    <tests>
        <test>
            <param name="fasta_file" value="four_human_proteins.fasta"/>
            <output name="tabular_file" file="four_human_proteins.predictnls.tabular"/>
        </test>
    </tests>
    <requirements>
        <requirement type="binary">predictnls</requirement>
    </requirements>
    <help>
    
**What it does**

This tool calls a Python re-implementation of the PredictNLS tool for predicting
nuclear localization signals (NLSs). PredictNLS works by searching for matches
to a known set of NLS patterns, each described as a regular expression.
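
As a rough illustration of this pattern-matching approach, the sketch below
scans a sequence with a list of regular expressions and reports every hit. The
two patterns in ``EXAMPLE_PATTERNS`` are hypothetical placeholders (the first is
the well-known SV40 large T antigen NLS, PKKKRKV), not the actual PredictNLS
pattern database. The ``find_nls`` name and the 1-based start coordinate are
assumptions of this sketch, not details taken from ``predictnls.py``.

```python
import re

# Hypothetical example patterns, for illustration only. The real PredictNLS
# database holds its own, much larger, set of NLS regular expressions.
EXAMPLE_PATTERNS = ("PKKKRKV", "[KR]{4}")


def find_nls(seq_id, sequence, patterns=EXAMPLE_PATTERNS):
    """Return (id, start, matched NLS, pattern) tuples for every regex hit.

    Start positions are 1-based here, which is an assumption of this sketch.
    re.finditer only reports non-overlapping matches for each pattern.
    """
    hits = []
    for pattern in patterns:
        for match in re.finditer(pattern, sequence.upper()):
            hits.append((seq_id, match.start() + 1, match.group(), pattern))
    return hits


if __name__ == "__main__":
    # Made-up toy sequence containing the SV40 large T antigen NLS PKKKRKV.
    for row in find_nls("demo", "MASPKKKRKVEDA"):
        print("\t".join(str(field) for field in row))
```

Running it prints two tab-separated rows: ``PKKKRKV`` starting at position 4, and
``KKKR`` (matched by ``[KR]{4}``) starting at position 5.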

The input is a FASTA file of protein sequences. The output is a tabular file
with the following columns, one row per predicted NLS (so a protein may have
several rows):

====== ==========================================================================
Column Description
------ --------------------------------------------------------------------------
     1 Sequence identifier
     2 Start of NLS
     3 NLS sequence
     4 NLS pattern (regular expression)
     5 Number of reference proteins with this NLS
     6 Percentage of reference proteins with this NLS which are nuclear localized
     7 Comma-separated list of reference proteins
     8 Comma-separated list of the reference proteins' localizations
====== ==========================================================================

If a sequence has no predicted NLS, it does not appear in the output file at
all. This is a simplification of the text-rich output of the command-line
tool, giving a tabular file suitable for use within Galaxy.
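
The tabular output can be loaded with a few lines of Python. This is a minimal
sketch, assuming tab-separated fields in the column order above and possibly a
``#`` comment line at the top; the column names and the ``read_predictnls`` and
``ids_without_nls`` helpers are hypothetical. Because proteins without an NLS
are simply absent from the output, finding them means comparing the output IDs
against the input IDs.

```python
import io

# Names for the eight output columns described above (the names are made up).
COLUMNS = ("seq_id", "start", "nls_seq", "pattern",
           "n_refs", "pct_nuclear", "ref_proteins", "ref_localizations")


def read_predictnls(handle):
    """Yield one dict per row of predictNLS tabular output."""
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and any comment/header line
        fields = line.rstrip("\r\n").split("\t")
        row = dict(zip(COLUMNS, fields))
        row["start"] = int(row["start"])
        row["n_refs"] = int(row["n_refs"])
        # Tolerate a trailing "%" in case the percentage is written that way.
        row["pct_nuclear"] = float(row["pct_nuclear"].rstrip("%"))
        # Columns 7 and 8 are comma-separated lists.
        row["ref_proteins"] = row["ref_proteins"].split(",")
        row["ref_localizations"] = row["ref_localizations"].split(",")
        yield row


def ids_without_nls(all_ids, rows):
    """Return input IDs that have no output row, i.e. no predicted NLS."""
    hit_ids = {row["seq_id"] for row in rows}
    return [seq_id for seq_id in all_ids if seq_id not in hit_ids]


if __name__ == "__main__":
    # A single made-up row, purely to show the layout.
    sample = "protA\t4\tPKKKRKV\tPKKKRKV\t2\t100\tREF1,REF2\tnuc,nuc\n"
    rows = list(read_predictnls(io.StringIO(sample)))
    print(rows[0]["ref_localizations"])
    print(ids_without_nls(["protA", "protB"], rows))
```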

Information about potential DNA binding (reported by the original PredictNLS
tool) is not included.

**Localizations**

The following abbreviations are used (derived from SWISS-PROT):

==== =======================
Abbr Localization
---- -----------------------
cyt  Cytoplasm
pla  Chloroplast
ret  Endoplasmic reticulum
ext  Extracellular
gol  Golgi
lys  Lysosomal
mit  Mitochondria
nuc  Nuclear
oxi  Peroxisome
vac  Vacuolar
rip  Periplasmic
==== =======================
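
For post-processing column 8, the table above can be expressed as a lookup
dictionary. This is a minimal sketch; the ``decode_localizations`` and
``count_localizations`` helpers are hypothetical, and it assumes column 8 holds
exactly these abbreviations separated by commas.

```python
from collections import Counter

# Localization abbreviations from the table above (derived from SWISS-PROT).
LOCALIZATIONS = {
    "cyt": "Cytoplasm",
    "pla": "Chloroplast",
    "ret": "Endoplasmic reticulum",
    "ext": "Extracellular",
    "gol": "Golgi",
    "lys": "Lysosomal",
    "mit": "Mitochondria",
    "nuc": "Nuclear",
    "oxi": "Peroxisome",
    "vac": "Vacuolar",
    "rip": "Periplasmic",
}


def decode_localizations(field):
    """Expand a comma-separated column 8 value into full names.

    Unknown abbreviations are returned unchanged.
    """
    abbrs = [abbr.strip() for abbr in field.split(",")]
    return [LOCALIZATIONS.get(abbr, abbr) for abbr in abbrs]


def count_localizations(field):
    """Count how often each abbreviation occurs in a column 8 value."""
    return Counter(abbr.strip() for abbr in field.split(","))


if __name__ == "__main__":
    print(decode_localizations("nuc,cyt,nuc"))
    print(count_localizations("nuc,cyt,nuc"))
```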

**References**

Murat Cokol, Rajesh Nair, and Burkhard Rost.
Finding nuclear localization signals.
EMBO reports 1(5), 411–415, 2000.
http://dx.doi.org/10.1093/embo-reports/kvd092

http://rostlab.org

    </help>
</tool>