view phagepromoter.xml @ 6:30b5e33eca40 draft
Uploaded
author | martasampaio |
---|---|
date | Sat, 20 Apr 2019 10:58:17 -0400 |
parents | 09a05b1e1379 |
children |
<tool id="get_proms" name="PhagePromoter" version="0.1.0">
    <description>
        Get promoters of phage genomes
    </description>
    <requirements>
        <requirement type="package">biopython</requirement>
        <requirement type="package">scikit-learn</requirement>
        <requirement type="package">numpy</requirement>
        <requirement type="package">pandas</requirement>
    </requirements>
    <command detect_errors="exit_code" interpreter="python3"><![CDATA[
        phagepromoter.py "$input_type.genome_format" "$genome" "$both" "$threshold" "$family" "$bacteria" "$lifecycle" "$adv.model"
    ]]>
    </command>
    <inputs>
        <conditional name="input_type">
            <param type="select" name="genome_format" label="file format">
                <option value="genbank" selected="yes">genbank</option>
                <option value="fasta">fasta</option>
            </param>
            <when value="genbank">
                <param type="data" name="genome" format="genbank" label="genome"/>
            </when>
            <when value="fasta">
                <param type="data" name="genome" format="fasta" label="genome"/>
            </when>
        </conditional>
        <param type="boolean" name="both" label="Search both strands" checked="false" truevalue="-both" falsevalue="" />
        <param name="threshold" type="float" value="0.50" label="Threshold" help="Probability of being a promoter (float between 0 and 1)" />
        <param type="select" name="family" label="Phage family">
            <option value="Podoviridae" selected="yes">Podoviridae</option>
            <option value="Siphoviridae">Siphoviridae</option>
            <option value="Myoviridae">Myoviridae</option>
        </param>
        <param type="select" name="bacteria" label="Host bacteria Genus">
            <option value="Escherichia coli" selected="yes">Escherichia coli</option>
            <option value="Salmonella">Salmonella</option>
            <option value="Pseudomonas">Pseudomonas</option>
            <option value="Yersinia">Yersinia</option>
            <option value="Morganella">Morganella</option>
            <option value="Cronobacter">Cronobacter</option>
            <option value="Staphylococcus">Staphylococcus</option>
            <option value="Streptococcus">Streptococcus</option>
            <option value="Lactococcus">Lactococcus</option>
            <option value="Streptomyces">Streptomyces</option>
            <option value="Klebsiella">Klebsiella</option>
            <option value="Bacillus">Bacillus</option>
            <option value="Pectobacterium">Pectobacterium</option>
            <option value="other">other</option>
        </param>
        <param type="select" name="lifecycle" label="Phage type">
            <option value="virulent" selected="yes">virulent</option>
            <option value="temperate">temperate</option>
        </param>
        <section name="adv" title="Advanced Options" expanded="False">
            <param type="select" name="model" label="Model">
                <option value="SVM2400" selected="yes">SVM2400</option>
                <option value="ANN1600">ANN1600</option>
            </param>
        </section>
    </inputs>
    <outputs>
        <data name="output1" format="html" from_work_dir="output.html" />
        <data name="output2" format="fasta" from_work_dir="output.fasta" />
        <data name="output3" format="genbank" from_work_dir="output.gb" />
    </outputs>
    <tests>
        <test>
            <param name="genome_format" value="genbank"/>
            <param name="genome" value="NC_015264.gb"/>
            <param name="both" value="False"/>
            <param name="threshold" value="0.50"/>
            <param name="family" value="Podoviridae"/>
            <param name="bacteria" value="Pseudomonas"/>
            <param name="lifecycle" value="virulent"/>
            <param name="model" value="SVM2400"/>
            <output name="output1" file="output.html"/>
            <output name="output2" file="output.fasta"/>
            <output name="output3" file="output.gb"/>
        </test>
    </tests>
    <help><![CDATA[

===============
PhagePromoter
===============

Get promoters of phage genomes

PhagePromoter is a Python script that predicts promoter sequences in phage genomes using machine learning models. Two different datasets were used to develop two models: the ANN model was built using a dataset with 26 features and 2400 examples (800 positives and 1600 negatives), and the SVM model was created using a dataset with 19 features and 3200 examples (800 positives and 2400 negatives). Each example represents a sequence of 65 base pairs of a phage genome. The positive examples correspond to phage sequences already identified as promoters.

**Inputs:**

* genome format: FASTA or GenBank (default);
* genome file: accepts both GenBank and FASTA formats;
* both strands: yes or no (default). Restricts the search to the direct strand or extends it to both DNA strands;
* threshold: the minimum probability of the test sequence being a promoter (a float between 0 and 1, default=0.50). For example, if threshold=0.90, the model will only return predicted sequences with more than 90% probability of being a promoter. The larger the genome, the higher the threshold should be;
* Family: the family of the testing phage - Podoviridae (default), Siphoviridae or Myoviridae;
* Host: the host of the phage. The training dataset includes the following hosts: Bacillus, Escherichia coli (default), Salmonella, Pseudomonas, Yersinia, Klebsiella, Pectobacterium, Morganella, Cronobacter, Staphylococcus, Streptococcus, Streptomyces, Lactococcus. If the testing phage has a different host, select the option 'other';
* Phage type: the type of the phage, according to its lifecycle: virulent (default) or temperate.

**Advanced options:**

* Model: the user can choose which model to run: the SVM model (default) or the ANN model. The SVM model was trained with more negative data, so it returns fewer promoters, each with a higher probability of being a real promoter. However, it can fail to detect some of the real promoters. The ANN model, on the other hand, predicts more promoters, so it can identify more of the real promoters, but it is also expected to predict more false positives.

**Outputs:**

This tool outputs three files: a FASTA file and a table in HTML, both with the locations, sequence, score and type (recognized by host or phage RNAP) of the predicted promoters, and a GenBank file with the predicted promoters as features.

**Requirements:**

* Biopython
* Sklearn
* Numpy
* Pandas

    ]]>
    </help>
</tool>
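<!--
The help text describes scanning the genome in 65 bp sequences, optionally on both strands, and keeping only predictions above the threshold. The sketch below illustrates that flow under stated assumptions; it is not the code in phagepromoter.py. Every name in it (windows, reverse_complement, at_fraction, candidates) is hypothetical, the A/T-fraction score is only a stand-in for the trained SVM/ANN models, and the strict comparison follows the "more than 90%" wording of the help text.

```python
from typing import Callable, Iterator, List, Tuple

WINDOW = 65  # window length in base pairs, as stated in the help text

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def windows(seq: str, size: int = WINDOW) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, window) for every full-length window."""
    for start in range(len(seq) - size + 1):
        yield start, seq[start:start + size]


def at_fraction(window: str) -> float:
    """Placeholder score (fraction of A/T bases); NOT the tool's model."""
    upper = window.upper()
    return (upper.count("A") + upper.count("T")) / len(upper)


def candidates(
    genome: str,
    threshold: float = 0.50,
    both: bool = False,
    score: Callable[[str], float] = at_fraction,
) -> List[Tuple[str, int, float]]:
    """Return (strand, start, score) for windows scoring above threshold.

    Starts on the "-" strand are relative to the reverse-complemented
    sequence, which is an assumption of this sketch.
    """
    strands = [("+", genome)]
    if both:
        strands.append(("-", reverse_complement(genome)))
    hits = []
    for strand, seq in strands:
        for start, win in windows(seq):
            probability = score(win)
            if probability > threshold:  # strictly "more than" the threshold
                hits.append((strand, start, probability))
    return hits
```

Raising the threshold shrinks the list of hits, which is why the help text recommends a higher threshold for larger genomes: more windows are scored, so more of them pass a low cut-off by chance.
-->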