view tools/protein_analysis/promoter2.xml @ 15:6abd809cefdd draft

Uploaded v0.2.4, added unit tests for Promoter 2
author peterjc
date Thu, 25 Apr 2013 12:25:52 -0400
parents e52220a9ddad
children 7de64c8b258d

<tool id="promoter2" name="Promoter 2.0" version="0.0.4">
    <description>Find eukaryotic PolII promoters in DNA sequences</description>
    <!-- If job splitting is enabled, break up the query file into parts -->
    <!-- Using 2000 sequences per chunk, so 4 threads each processing 500 is ideal -->
    <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism>
    <command interpreter="python">
        promoter2.py "\$NSLOTS" $fasta_file $tabular_file
        ##Set the number of threads in the runner entry in universe_wsgi.ini
        ##which (on SGE at least) will set the $NSLOTS environment variable.
        ##If the environment variable isn't set, the script gets "" and defaults to one thread.
    </command>
    <stdio>
        <!-- Anything other than zero is an error -->
        <exit_code range="1:" />
        <exit_code range=":-1" />
    </stdio>
    <inputs>
        <param name="fasta_file" type="data" format="fasta" label="FASTA file of DNA sequences"/> 
    </inputs>
    <outputs>
        <data name="tabular_file" format="tabular" label="Promoter2 on ${fasta_file.name}" />
    </outputs>
    <requirements>
        <requirement type="binary">promoter</requirement>
    </requirements>
    <tests>
        <test>
            <param name="fasta_file" value="Adenovirus.fasta" ftype="fasta"/>
            <output name="tabular_file" file="Adenovirus.promoter2.tabular" ftype="tabular"/>
        </test>
        <test>
            <param name="fasta_file" value="empty.fasta" ftype="fasta"/>
            <output name="tabular_file" file="empty_promoter2.tabular" ftype="tabular"/>
        </test>
    </tests>
    <help>
    
**What it does**

This calls the Promoter 2.0 tool for prediction of eukaryotic PolII promoter sequences using a Neural Network (NN) model.

The input is a FASTA file of nucleotide sequences (e.g. upstream regions of your genes), and the output is tabular with four columns (one row per promoter):

 1. Sequence identifier (first word of FASTA header)
 2. Promoter position, e.g. 600
 3. Promoter score, e.g. 1.063
 4. Promoter likelihood, e.g. Highly likely prediction

The scores are classified very simply as follows:

========= ========================
Score     Description
--------- ------------------------
below 0.5 ignored
0.5 - 0.8 Marginal prediction
0.8 - 1.0 Medium likely prediction
above 1.0 Highly likely prediction 
========= ========================
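
As an illustration only, the banding in the table above can be sketched in Python as follows. The function name ``classify_promoter_score`` is hypothetical and is not part of Promoter 2.0 or of this wrapper, and the handling of scores that fall exactly on a boundary (0.5, 0.8 or 1.0) is an assumption, since the table does not specify it.

```python
def classify_promoter_score(score):
    """Map a Promoter 2.0 score to the description used in the table.

    Assumption: each band includes its lower bound and excludes its
    upper bound. The table does not say which band owns 0.5, 0.8 or 1.0.
    """
    if score >= 1.0:
        return "Highly likely prediction"
    if score >= 0.8:
        return "Medium likely prediction"
    if score >= 0.5:
        return "Marginal prediction"
    # Scores below 0.5 are not reported as promoters.
    return "ignored"


# Print the band for a few example scores, one per row of the table.
for example in (0.3, 0.65, 0.9, 1.063):
    print(example, classify_promoter_score(example))
```

Rows classed as ignored would not appear in the tabular output, so in practice only the other three descriptions should be seen in column four.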

Internally the input FASTA file is divided into parts (to allow multiple processors to be used), and the raw output is reformatted into this tabular layout suitable for downstream analysis within Galaxy.

**References**

Knudsen.
Promoter2.0: for the recognition of PolII promoter sequences.
Bioinformatics, 15:356-61, 1999.
http://dx.doi.org/10.1093/bioinformatics/15.5.356

http://www.cbs.dtu.dk/services/Promoter/output.php

    </help>
</tool>