author | mmonot |
---|---|
date | Tue, 17 Sep 2024 13:35:16 +0000 |
PROGRAM
=======

PhageTerm.py - run as command line in a shell


UPDATES
=======

Bug fix: When the repeat region (of DTR phages) wraps around the reference contig ends, the first and last bases of the repeat region were missing from the reported sequence. Fixed thanks to Matthew Lueder (Naval Medical Research Center-Frederick).

Bug fix: The cohesive sequence of COS 3' phages is now correctly determined. Thanks to Dr Wei Shen.


VERSION
=======

Version 1.0.12


INTRODUCTION
============

PhageTerm is a tool to determine phage termini and packaging mode from high-throughput sequencing data produced by methods that rely on the random fragmentation of DNA (e.g. Illumina TruSeq).

Phage sequencing reads from a fastq file are aligned to the phage reference genome in order to calculate two types of coverage values: the whole genome coverage and the starting position coverage. The starting position coverage is used to perform a detailed termini analysis. If the user provides the host sequence, reads that do not match the phage genome are tested against the host using the same mapping function.

The PhageTerm program and information are available at https://sourceforge.net/projects/phageterm/

A Galaxy wrapper version is also available at https://galaxy.pasteur.fr


PREREQUISITES
=============

Unix/Linux

- Python 2.7
- matplotlib 2.0.2
- numpy 1.11
- pandas 0.19.1
- scikit-learn 0.18.1
- scipy 0.19.0
- statsmodels 0.8.0
- reportlab 3.4.0


COMMAND LINE
============

    ./PhageTerm.py -f reads.fastq -r phage_sequence.fasta [-n phage_name -p reads_paired -s seed_length -d surrounding -t installation_test -c nbr_core -g host.fasta (warning: increases processing time)]

Help:

    ./PhageTerm.py -h
    ./PhageTerm.py --help

Options:

Raw reads file in fastq format:
    -f INPUT_FILE, --fastq=INPUT_FILE
        Fastq reads (NGS sequences from random fragmentation DNA only, e.g. Illumina TruSeq)

Paired reads file in fastq format:
    -p INPUT_FILE, --paired=INPUT_FILE
        Paired fastq reads (NGS sequences from random fragmentation DNA only, e.g. Illumina TruSeq)

Phage genome in fasta format:
    -r INPUT_FILE, --ref=INPUT_FILE
        Reference phage genome as a single contig in fasta format

Name of the phage being analyzed by the user:
    -n PHAGE_NAME, --phagename=PHAGE_NAME
        Manually enter the name of the phage being analyzed. Used as prefix for output files.

Length of the seed used for reads in the mapping process:
    -s SEED_LENGTH, --seed=SEED_LENGTH
        Manually enter the length of the seed used for reads in the mapping process (Default: 20).

Length of the surrounding region used to merge close peaks in the analysis process:
    -d SURROUNDING_LENGTH, --surrounding=SURROUNDING_LENGTH
        Manually enter the length of the surrounding region used to merge close peaks in the analysis process (Default: 20).

Host genome in fasta format:
    -g INPUT_FILE, --host=INPUT_FILE
        Reference host genome as a single contig in fasta format
        Warning: drastically increases processing time

Number of processor cores to use:
    -c CORE_NBR, --core=CORE_NBR
        Number of processor cores to use (Default: 1).

Define phage mean coverage:
    -m MEAN_NBR, --mean=MEAN_NBR
        Phage mean coverage to use (Default: 250).

Software run test:
    -t TEST_VALUE, --test=TEST_VALUE
        TEST_VALUE=C5 : Test run for a 5' cohesive end (e.g. Lambda)
        TEST_VALUE=C3 : Test run for a 3' cohesive end (e.g. HK97)
        TEST_VALUE=DS : Test run for a short Direct Terminal Repeats end (e.g. T7)
        TEST_VALUE=DL : Test run for a long Direct Terminal Repeats end (e.g. T5)
        TEST_VALUE=H  : Test run for a Headful packaging (e.g. P1)
        TEST_VALUE=M  : Test run for a Mu-like packaging (e.g. Mu)


OUTPUT FILES
============

(i) Report (.pdf)
(ii) Statistical table (.csv)
(iii) Sequence files (.fasta)


CONTACT
=======

Julian Garneau <julian.garneau@usherbrooke.ca>
Marc Monot <marc.monot@pasteur.fr>
David Bikard <david.bikard@pasteur.fr>
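

EXAMPLE
=======

When many samples have to be processed, the options listed under COMMAND LINE can be assembled from a script instead of being typed by hand. The block below is a minimal sketch of that idea, not part of PhageTerm itself: the helper name build_phageterm_command is hypothetical, the file names are the placeholders used in the usage line, and only flags documented in this README are emitted. It builds the argument list without running the program.

```python
def build_phageterm_command(fastq, ref, paired=None, phagename=None,
                            seed=None, surrounding=None, host=None,
                            core=None, mean=None,
                            script="./PhageTerm.py"):
    """Return the argument list for one PhageTerm run.

    Hypothetical helper written for this README; PhageTerm does not
    ship a function with this name.
    """
    # -f (reads) and -r (phage reference) are the two required inputs.
    command = [script, "-f", fastq, "-r", ref]

    # Optional flags as documented in the Options list. A value of None
    # means the flag is left out, so PhageTerm applies its own default.
    optional = [
        ("-p", paired),       # paired fastq reads
        ("-n", phagename),    # prefix for output files
        ("-s", seed),         # seed length
        ("-d", surrounding),  # surrounding length used to merge peaks
        ("-g", host),         # host genome (increases processing time)
        ("-c", core),         # number of processor cores
        ("-m", mean),         # phage mean coverage
    ]
    for flag, value in optional:
        if value is not None:
            command.extend([flag, str(value)])
    return command


# Example: name the phage and use 4 cores; all other options keep
# their defaults.
command = build_phageterm_command("reads.fastq", "phage_sequence.fasta",
                                  phagename="my_phage", core=4)
print(" ".join(command))
# prints: ./PhageTerm.py -f reads.fastq -r phage_sequence.fasta -n my_phage -c 4
```

The returned list can be handed to subprocess.call from the Python standard library to launch the run. Keeping the arguments as a list, rather than one string, avoids shell quoting problems when file names contain spaces.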