annotate phagepromoter.xml @ 4:09a05b1e1379 draft

Uploaded
author martasampaio
date Sat, 20 Apr 2019 10:57:46 -0400
<tool id="get_proms" name="PhagePromoter" version="0.1.0">
    <description>Get promoters of phage genomes</description>
    <requirements>
        <requirement type="package">biopython</requirement>
        <requirement type="package">scikit-learn</requirement>
        <requirement type="package">numpy</requirement>
        <requirement type="package">pandas</requirement>
    </requirements>
    <command detect_errors="exit_code" interpreter="python3"><![CDATA[
        phagepromoter.py "$input_type.genome_format" "$genome" "$both" "$threshold" "$family" "$bacteria" "$lifecycle"
        "$adv.model" ]]>
    </command>
    <inputs>
        <conditional name="input_type">
            <param type="select" name="genome_format" label='file format'>
                <option value="genbank" selected="yes">genbank</option>
                <option value="fasta">fasta</option>
            </param>
            <when value="genbank">
                <param type="data" name="genome" format="genbank" label='genome'/>
            </when>
            <when value="fasta">
                <param type="data" name="genome" format="fasta" label='genome'/>
            </when>
        </conditional>
        <param type="boolean" name="both" label='Search both strands' checked="false" truevalue="-both" falsevalue="" />
        <param name="threshold" type="float" value="0.50" label="Threshold" help="Probability of being a promoter (float between 0 and 1)" />
        <param type="select" name="family" label='Phage family'>
            <option value="Podoviridae" selected="yes">Podoviridae</option>
            <option value="Siphoviridae">Siphoviridae</option>
            <option value="Myoviridae">Myoviridae</option>
        </param>
        <param type="select" name="bacteria" label='Host bacteria Genus'>
            <option value="Escherichia coli" selected="yes">Escherichia coli</option>
            <option value="Salmonella">Salmonella</option>
            <option value="Pseudomonas">Pseudomonas</option>
            <option value="Yersinia">Yersinia</option>
            <option value="Morganella">Morganella</option>
            <option value="Cronobacter">Cronobacter</option>
            <option value="Staphylococcus">Staphylococcus</option>
            <option value="Streptococcus">Streptococcus</option>
            <option value="Lactococcus">Lactococcus</option>
            <option value="Streptomyces">Streptomyces</option>
            <option value="Klebsiella">Klebsiella</option>
            <option value="Bacillus">Bacillus</option>
            <option value="Pectobacterium">Pectobacterium</option>
            <option value="other">other</option>
        </param>
        <param type="select" name="lifecycle" label='Phage type'>
            <option value="virulent" selected="yes">virulent</option>
            <option value="temperate">temperate</option>
        </param>
        <section name='adv' title='Advanced Options' expanded='False'>
            <param type="select" name="model" label="Model">
                <option value="SVM2400" selected="yes">SVM2400</option>
                <option value="ANN1600">ANN1600</option>
            </param>
        </section>
    </inputs>
    <outputs>
        <data name="output1" format="html" from_work_dir="output.html" />
        <data name="output2" format="fasta" from_work_dir="output.fasta" />
        <data name="output3" format="genbank" from_work_dir="output.gb" />
    </outputs>
    <tests>
        <test>
            <param name="genome_format" value="genbank"/>
            <param name="genome" value="NC_015264.gb"/>
            <param name="both" value="False"/>
            <param name="threshold" value="0.50"/>
            <param name="family" value="Podoviridae"/>
            <param name="bacteria" value="Pseudomonas"/>
            <param name="lifecycle" value="virulent"/>
            <param name="model" value="SVM2400"/>
            <output name="output1" file="output.html"/>
            <output name="output2" file="output.fasta"/>
            <output name="output3" file="output.gb"/>
        </test>
    </tests>
    <help><![CDATA[

===============
PhagePromoter
===============

Get promoters of phage genomes

PhagePromoter is a Python script that predicts promoter sequences in phage genomes using machine learning models. Two different datasets were used to develop two models: the ANN model was built using a dataset with 26 features and 2400 examples (800 positives and 1600 negatives), and the SVM model was created using a dataset with 19 features and 3200 examples (800 positives and 2400 negatives).
Each example represents a sequence of 65 base pairs of a phage genome. The positive examples correspond to phage sequences already identified as promoters.

**Inputs:**

* genome format: FASTA or GenBank (default);
* genome file: accepts both GenBank and FASTA formats;
* both strands: yes or no (default). Searches either the direct strand only or both DNA strands;
* threshold: the minimum predicted probability of being a promoter that a sequence needs in order to be reported (a float between 0 and 1, default=0.50). For example, if threshold=0.90, the model will only return predicted sequences with more than 90% probability of being a promoter. The larger the genome, the higher the threshold should be;
* Family: the family of the phage being tested - Podoviridae (default), Siphoviridae or Myoviridae;
* Host: the host of the phage. The training dataset includes the following hosts: Bacillus, Escherichia coli (default), Salmonella, Pseudomonas, Yersinia, Klebsiella, Pectobacterium, Morganella, Cronobacter, Staphylococcus, Streptococcus, Streptomyces, Lactococcus. If the phage being tested has a different host, select the option 'other';
* Phage type: the type of the phage, according to its lifecycle: virulent (default) or temperate.

**Advanced options:**

* Model: the user can choose which model to run: the SVM model (default) or the ANN model. The SVM model was trained with more negative examples, so it returns fewer promoters, but these have a higher probability of being real promoters. However, it can fail to detect some real promoters. The ANN model, on the other hand, predicts more promoters, so it can identify more real promoters, but it is also expected to predict more false positives.

**Outputs:**

This tool outputs two files: a FASTA file and a table in HTML, with the locations, sequences, scores and types (recognized by host or phage RNAP) of the predicted promoters.
In addition, the tool outputs a GenBank file with the predicted promoters as features.

**Requirements:**

* Biopython
* scikit-learn
* NumPy
* pandas

    ]]></help>
</tool>
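The `<command>` template above passes eight quoted positional values to `phagepromoter.py`, and the help text describes the threshold as a cut-off on the predicted probability. The script itself is not part of this file, so the following is only a minimal sketch of how a script on the receiving end could read those values and apply the threshold. The names `Options`, `parse_args` and `filter_by_threshold` are hypothetical and are not taken from PhagePromoter; the strict "greater than" comparison follows the help text's wording ("more than 90% probability").

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Options:
    """Hypothetical container for the eight values sent by the wrapper."""
    genome_format: str   # "genbank" or "fasta"
    genome_path: str     # path to the input dataset
    both_strands: bool   # True when the wrapper sent "-both"
    threshold: float     # probability cut-off between 0 and 1
    family: str
    bacteria: str
    lifecycle: str
    model: str           # "SVM2400" or "ANN1600"


def parse_args(argv: Sequence[str]) -> Options:
    """Map the positional values, in the order used by the <command> template."""
    if len(argv) != 8:
        raise ValueError(f"expected 8 arguments, got {len(argv)}")
    genome_format, genome, both, threshold, family, bacteria, lifecycle, model = argv
    value = float(threshold)
    if not 0.0 <= value <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # The boolean parameter is sent as "-both" when checked and as an
    # empty string otherwise (truevalue / falsevalue in the wrapper).
    return Options(genome_format, genome, both == "-both", value,
                   family, bacteria, lifecycle, model)


def filter_by_threshold(scored: Sequence[Tuple[int, float]],
                        threshold: float) -> List[Tuple[int, float]]:
    """Keep (position, probability) pairs scoring strictly above the threshold."""
    return [(pos, prob) for pos, prob in scored if prob > threshold]
```

Because `"$both"` is quoted in the template, an unchecked box still arrives as an empty argument, so the argument count stays at eight in both cases. A real script would call `parse_args(sys.argv[1:])`.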