Mercurial > repos > bgruening > protease_prediction

<tool id="eden_protease_prediction" name="Protease prediction" version="@VERSION@">
    <description>based on cleavage sites</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <expand macro="stdio"/>
    <version_command>echo "@VERSION@"</version_command>
    <command><![CDATA[
    python $__tool_directory__/protease.py
    #if $selected_tasks.selected_task == 'fit':
        fit
        -i $selected_tasks.infile_train
        --negative-ratio $selected_tasks.options.negative_ratio
        --shuffle-order $selected_tasks.options.shuffle_order
        -r  $selected_tasks.options.random_state
    #else:
        predict
        -m $selected_tasks.infile_model
        -i $selected_tasks.infile_data
    #end if
]]>
    </command>
    <inputs>
        <expand macro="loadConditional">
            <section name="options" title="Advanced Options" expanded="False">
                <param name="negative_ratio" type="integer" optional="true" value="2" label="Negative to positive instance ratio"
                    help="Relative size ratio for the randomly permuted negative instances w.r.t. the positive instances." />
                <param name="shuffle_order" type="integer" optional="true" value="2" label="Order of k-mer shuffling"
                    help="Order of the k-mer for the random shuffling procedure." />
                <param name="random_state" type="integer" value="1" label="Random seed" />
            </section>
        </expand>
    </inputs>
    <outputs>
        <data format="tabular" name="outfile_predict" from_work_dir="out/predictions.txt">
            <filter>selected_tasks['selected_task'] == 'predict'</filter>
        </data>
        <data format="eden_model" name="outfile_fit" from_work_dir="out/model">
            <filter>selected_tasks['selected_task'] == 'fit'</filter>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="infile_train" value="CTSL_train.fasta" ftype="fasta"/>
            <param name="selected_task" value="fit"/>
            <param name="shuffle_order" value="3"/>
            <output name="outfile_fit" file="model" ftpye="eden_model" compare="sim_size" delta="100000"/>
        </test>
        <test>
            <param name="infile_model" value="model" ftype="eden_model"/>
            <param name="infile_data" value="CTSL_test.fasta" ftype="fasta"/>
            <param name="selected_task" value="predict"/>
            <output name="outfile_predict" file="predictions.txt" ftpye="tabular"/>
        </test>
    </tests>
    <help><![CDATA[
**What it does**

This tool can learn the cleavage specificity of a given class of protease. In a second step this can be used to predict proteases given a cleavage site.
The method assumes that the candidate cleavage point is between the two amino acids adjacent to the central position.
The method is based on an efficient string kernel implemented in the Explicit Decomposition with Neighbourhood (EDeN) library.
This approach uses the notion of k-mers with gaps to enumerate all possible substrings of increasing order which are used as features in an efficient linear binary classification estimator.

**Example Input**

::

  >CTSL1
  SSFVSNWD
  >CTSL1
  SSIQATTA
  >CTSL1
  SSLAGCQI
  >CTSL1
  SSLGGTVV


    ]]></help>
    <expand macro="eden_citation"/>
</tool>
author	bgruening
date	Sat, 12 Mar 2016 19:28:41 -0500
parents
children