view README.rst @ 27:636e332434ff draft

Uploaded
author martasampaio
date Mon, 28 Oct 2019 11:57:47 -0400
parents b575af79e250

===============
PhagePromoter
===============

Predict promoters in phage genomes

PhagePromoter is a Python script that predicts promoter sequences in phage genomes using a support vector machine (SVM) model. The model was trained on a dataset of 3200 examples (800 positive and 2400 negative) described by 19 features, each example representing a 65 bp sequence of a phage genome. The positive examples are phage sequences already identified as promoters.
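
Because the model classifies fixed-length 65 bp sequences, a genome has to be cut into candidate windows before it can be scored. The sketch below shows one way to do this, assuming a simple sliding window with a 1 bp step; the step size PhagePromoter actually uses is not stated here, and the function name ``sliding_windows`` is hypothetical.

```python
from typing import Iterator, Tuple

WINDOW = 65  # length of each candidate sequence, matching the training examples


def sliding_windows(genome: str, size: int = WINDOW, step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, subsequence) for every full-length window.

    Hypothetical helper: the default step of 1 bp is an assumption,
    not taken from the PhagePromoter source.
    """
    genome = genome.upper()
    # Stop early enough that every yielded window has exactly `size` bases.
    for start in range(0, len(genome) - size + 1, step):
        yield start, genome[start:start + size]


if __name__ == "__main__":
    toy = "ACGT" * 20  # 80 bp toy sequence
    print(len(list(sliding_windows(toy))))
```

Each window would then be converted into the model's 19-feature vector (the features are not listed in this README) and scored by the SVM.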

**Inputs:**

* genome format: FASTA or GenBank;
* genome file: the input genome, in either GenBank or FASTA format;
* both strands (yes or no): whether to search both DNA strands;
* threshold: the minimum probability (a float between 0 and 1) for a test sequence to be predicted as a promoter;
* family: the family of the test phage (Podoviridae, Siphoviridae or Myoviridae);
* bacteria: the host of the phage. The training dataset includes the following hosts: Bacillus, EColi, Salmonella, Pseudomonas, Yersinia, Klebsiella, Pectobacterium, Morganella, Cronobacter, Staphylococcus, Streptococcus, Streptomyces, Lactococcus. If the test phage has a different host, select 'other'; in that case a higher threshold value is recommended for more accurate results.
* phage type: the type of the phage according to its life cycle (virulent or temperate).
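
The "both strands" and "threshold" inputs can be pictured as a filter over window scores. The following is a minimal sketch, not the tool's actual code: ``predict_promoters`` and the ``score`` callable are hypothetical stand-ins for the trained SVM, and the coordinate convention (0-based, end-exclusive, reported on the forward strand) is an assumption. Raising ``threshold``, as recommended for hosts marked 'other', simply keeps fewer, higher-confidence windows.

```python
from typing import Callable, List, Tuple

WINDOW = 65  # candidate sequence length used by the model

# Complement table; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predict_promoters(
    genome: str,
    score: Callable[[str], float],  # hypothetical: returns P(promoter) for a window
    threshold: float = 0.5,
    both_strands: bool = True,
) -> List[Tuple[str, int, int, str, float]]:
    """Return (strand, start, end, sequence, probability) for every window
    whose probability is >= threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    genome = genome.upper()
    n = len(genome)
    strands = [("+", genome)]
    if both_strands:
        strands.append(("-", reverse_complement(genome)))
    hits = []
    for strand, seq in strands:
        for i in range(n - WINDOW + 1):
            window = seq[i:i + WINDOW]
            p = score(window)
            if p >= threshold:
                # A window at offset i on the reverse complement covers
                # forward positions [n - i - WINDOW, n - i).
                start = i if strand == "+" else n - i - WINDOW
                hits.append((strand, start, start + WINDOW, window, p))
    return hits


if __name__ == "__main__":
    # Toy scorer standing in for the SVM: AT content of the window.
    def at_content(window: str) -> float:
        return (window.count("A") + window.count("T")) / len(window)

    toy_genome = "A" * 65 + "G" * 35  # 100 bp toy genome
    for strand, start, end, _, p in predict_promoters(toy_genome, at_content, threshold=0.9):
        print(strand, start, end, round(p, 3))
```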

**Outputs:**

This tool outputs two files, a FASTA file and an HTML table, reporting the location, sequence, score and type (recognized by the host or the phage RNA polymerase, RNAP) of each predicted promoter.
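
To illustrate what the two output files contain, here is a standard-library sketch that renders predictions as FASTA records and as an HTML table. The record layout (name, start, end, strand, sequence, score, type), the header format and the function names are hypothetical; since pandas is listed as a requirement, the real tool may well build its table with ``DataFrame.to_html`` instead.

```python
import html
from typing import Iterable, Tuple

# Hypothetical record layout: (name, start, end, strand, sequence, score, rnap_type)
Prediction = Tuple[str, int, int, str, str, float, str]


def to_fasta(preds: Iterable[Prediction], width: int = 60) -> str:
    """Render predictions as FASTA, wrapping sequences at `width` characters."""
    lines = []
    for name, start, end, strand, seq, score, kind in preds:
        lines.append(f">{name} {start}-{end}({strand}) score={score:.3f} type={kind}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def to_html_table(preds: Iterable[Prediction]) -> str:
    """Render predictions as a plain HTML table, escaping every cell."""
    header = ["Name", "Start", "End", "Strand", "Sequence", "Score", "Type"]
    rows = ["<tr>" + "".join(f"<th>{h}</th>" for h in header) + "</tr>"]
    for name, start, end, strand, seq, score, kind in preds:
        cells = [name, str(start), str(end), strand, seq, f"{score:.3f}", kind]
        rows.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>\n"


if __name__ == "__main__":
    # Toy record for illustration only, not real PhagePromoter output.
    toy = [("promoter_1", 120, 185, "+", "A" * 65, 0.93, "host")]
    print(to_fasta(toy))
    print(to_html_table(toy))
```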

**Requirements:**

* Biopython
* scikit-learn
* NumPy
* pandas