annotate README.md @ 0:231e4c669675 draft

Initial commit - v0.10.3 git commit deeded0
author vimalkumarvelayudhan
date Tue, 27 Feb 2018 14:16:54 -0500
parents
children

# VIGA
De novo Viral Genome Annotator

VIGA is a script written in Python 2.7 that annotates viral genomes automatically (using a de novo algorithm) and predicts the function of their proteins using BLAST and HMMER.

## REQUIREMENTS:

Before using this script, the following Python modules and programs should be installed:

* Python modules:
  - BCBio (https://github.com/chapmanb/bcbio-nextgen)
  - Biopython (Bio module; Cock et al. 2009)
  - NumPy (https://github.com/numpy/numpy)
  - SciPy (https://github.com/scipy/scipy)

* Programs:
  - GNU Parallel (Tange 2011): used to parallelize HMMER. The program is publicly available at https://www.gnu.org/software/parallel/ under the GPLv3 licence.
  - LASTZ (Harris 2007): used to predict the circularity of the contigs. The program is publicly available at https://github.com/lastz/lastz under the MIT licence.
  - Prodigal (Hyatt et al. 2010): used to predict the ORFs. When the contig is smaller than 20,000 bp, MetaProdigal (Hyatt et al. 2012) is automatically activated instead of normal Prodigal. This program is publicly available at https://github.com/hyattpd/prodigal/releases/ under the GPLv3 licence.
  - BLAST+ (Camacho et al. 2008): used to predict the function of the predicted proteins according to homology. This suite is publicly available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ under the GPLv2 licence. Databases are available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/
  - DIAMOND (Buchfink et al. 2015): used to predict the function of proteins according to homology when the "--noblast" parameter is used. This program is publicly available at https://github.com/bbuchfink/diamond under the GPLv3 licence. Databases must be created from FASTA files according to the DIAMOND instructions before running.
  - HMMER (Finn et al. 2011): used to predict the function of the predicted proteins according to Hidden Markov Models. This suite is publicly available at http://hmmer.org/ under the GPLv3 licence. Databases must be in FASTA format; examples of suitable databases are UniProtKB (ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz) or PFAM (http://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.fasta.gz).
  - INFERNAL (Nawrocki and Eddy 2013): used to predict ribosomal RNA in the contigs when using the RFAM database (Nawrocki et al. 2015). This program is publicly available at http://eddylab.org/infernal/ under the BSD licence and the RFAM database is available at ftp://ftp.ebi.ac.uk/pub/databases/Rfam/
  - ARAGORN (Laslett and Canback 2004): used to predict tRNA sequences in the contig. This program is publicly available at http://mbio-serv2.mbioekol.lu.se/ARAGORN/ under the GPLv2 licence.
  - PILERCR (Edgar 2007): used to predict CRISPR repeats in your contig. This program is freely available at http://drive5.com/pilercr/ under a public licence.
  - Tandem Repeats Finder (TRF; Benson 1999): used to predict the tandem repeats in your contig. This program is freely available at https://tandem.bu.edu/trf/trf.html under a custom licence.
  - Inverted Repeats Finder (IRF; Warburton et al. 2004): used to predict the inverted repeats in your contig. This program is freely available at https://tandem.bu.edu/irf/irf.download.html under a custom licence.
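
The Prodigal entry above describes a length-based switch between normal Prodigal and MetaProdigal. The following is a minimal sketch of that decision, assuming the switch maps onto Prodigal's `-p single` / `-p meta` procedure option. The function name `prodigal_command`, the output file names, and the handling of contigs of exactly 20,000 bp are illustrative assumptions and are not taken from VIGA's source.

```python
META_THRESHOLD_BP = 20000  # cut-off quoted in the requirements list


def prodigal_command(contig_fasta, contig_length, gcode=11, out_prefix="orfs"):
    """Build a Prodigal command line for one contig (hypothetical helper)."""
    # Contigs below the cut-off use the metagenomic procedure (MetaProdigal).
    procedure = "meta" if contig_length < META_THRESHOLD_BP else "single"
    return [
        "prodigal",
        "-i", contig_fasta,          # input nucleotide FASTA
        "-a", out_prefix + ".faa",   # predicted protein translations
        "-o", out_prefix + ".gbk",   # gene coordinates
        "-g", str(gcode),            # translation table
        "-p", procedure,             # "single" or "meta" procedure
    ]
```

The returned list can be passed to `subprocess.call` once Prodigal is installed and on the `PATH`.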

Although you can install the programs manually, we strongly recommend using the Docker image to create an environment for VIGA. The Docker image is available at https://hub.docker.com/r/vimalkvn/viga/

However, you will still need to download the databases for BLAST, HMMER, and INFERNAL:

* BLAST DBs: https://ftp.ncbi.nlm.nih.gov/blast/db/
* BLAST FASTA (DIAMOND): https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA
* RFAM (INFERNAL): http://ftp.ebi.ac.uk/pub/databases/Rfam/
* UniProtKB (HMMER): ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz
* PFAM (HMMER): http://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.fasta.gz

Note that this bioinformatic pipeline only takes protein databases (e.g. "nr", "swissprot")!
Additionally, before running the "ultrafast" mode, you need to convert the FASTA file to the DIAMOND DB format using the following command:

    diamond makedb --in nr -d nr

When using this program, you must cite it:

VIGA v. 0.10.3 (https://github.com/EGTortuero/viga)

## PARAMETERS:

The program has the following two types of arguments:

### Mandatory parameters:

<table>
<tr><td>--input FASTAFILE</td><td>Input file as a nucleotide FASTA file. It can contain multiple sequences (e.g. metagenomic contigs).</td></tr>
<tr><td>--rfamdb RFAMDB</td><td>RFAM database that will be used for the ribosomal RNA prediction.</td></tr>
<tr><td>--modifiers TEXTFILE</td><td>Input file as a plain text file with the modifiers for every FASTA header according to Sequin (https://www.ncbi.nlm.nih.gov/Sequin/modifiers.html). All modifiers must be written on a single line, separated by a single space character. No space should be placed around the = sign. For example: [organism=Serratia marcescens subsp. marcescens] [sub-species=marcescens] [strain=AH0650_Sm1] [moltype=DNA] [tech=wgs] [gcode=11] [country=Australia] [isolation-source=sputum]. This line will be copied and printed along with the record name as the definition line of every contig sequence.</td></tr>
</table>
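
The layout rules for the modifiers line (bracketed `key=value` pairs, a single space between them, no space around the `=` sign) can be checked before a run. The following is a minimal sketch written for Python 3; `parse_modifiers` is a hypothetical helper and is not part of VIGA, and it only checks the layout, not whether the modifier names are valid Sequin modifiers.

```python
import re

# One modifier: [key=value], with no spaces around "=" and no brackets inside.
MODIFIER_RE = re.compile(r"\[([^\[\]=\s]+)=([^\[\]\s](?:[^\[\]]*[^\[\]\s])?)\]")


def parse_modifiers(line):
    """Parse a modifiers line into a list of (key, value) pairs.

    Raises ValueError when the line breaks the layout rules described above.
    """
    line = line.strip()
    pairs = []
    pos = 0
    while pos < len(line):
        match = MODIFIER_RE.match(line, pos)
        if match is None:
            raise ValueError("malformed modifier at column %d" % (pos + 1))
        pairs.append((match.group(1), match.group(2)))
        pos = match.end()
        if pos < len(line):
            # Exactly one space must separate consecutive modifiers.
            if line[pos] != " " or line[pos + 1:pos + 2] != "[":
                raise ValueError("modifiers must be separated by a single space")
            pos += 1
    return pairs
```

Running it over each line of the modifiers file catches problems such as `[organism = X]` or a double space between two modifiers before the annotation starts.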

### Advanced parameters:

<table>
<tr><td>--readlength INT</td><td>Read length for the circularity prediction (default: 101 bp)</td></tr>
<tr><td>--windowsize INT</td><td>Window length used to determine the origin of replication in circular contigs according to the cumulative GC skew (default: 100 bp)</td></tr>
<tr><td>--slidingsize INT</td><td>Sliding window length used to determine the origin of replication in circular contigs according to the cumulative GC skew (default: 10 bp)</td></tr>
<tr><td>--out OUTPUTNAME</td><td>Name of the output files without extensions, as the program will add them automatically. By default, the program will use the input name as the output name.</td></tr>
<tr><td>--locus STRING</td><td>Name of the contigs. If the input is a multiFASTA file, please provide a general name, as the program will append the contig number to it. By default, the name of the contigs will be "LOC".</td></tr>
<tr><td>--threads INT</td><td>Number of threads/CPUs. By default, the program will use 1 CPU.</td></tr>
<tr><td>--gff</td><td>Print the output in General Feature Format (GFF) version 3, a flat table file that contains 9 columns of data (see http://www.ensembl.org/info/website/upload/gff3.html for more information). By default, the program will not print the GFF3 file (--gff False).</td></tr>
<tr><td>--blastdb BLASTDB</td><td>BLAST database that will be used for the protein function prediction. The database MUST be for amino acids. This is only mandatory if the "ultrafast" mode is not active.</td></tr>
<tr><td>--diamonddb DIAMONDDB</td><td>DIAMOND database that will be used for the protein function prediction. The database MUST be for amino acids. This is only mandatory when the "ultrafast" mode is active.</td></tr>
<tr><td>--blastevalue FLOAT</td><td>BLAST e-value threshold. By default, the threshold will be 1e-05.</td></tr>
<tr><td>--nohmmer</td><td>Run the program without using PHMMER to predict protein function. In this case, the program will be as fast as Prokka (Seemann 2014) but the annotations will be less accurate. By default, this flag is disabled.</td></tr>
<tr><td>--noblast</td><td>Run the program replacing BLAST with DIAMOND. In this case, the program will be fast but the annotations will be less accurate. By default, this flag is disabled.</td></tr>
<tr><td>--hmmdb HMMDB</td><td>PHMMER database that will be used for the protein function prediction according to Hidden Markov Models. In this case, HMMDB must be in FASTA format (e.g. UniProt). This parameter is mandatory unless the "fast" mode ("--nohmmer") is used.</td></tr>
<tr><td>--hmmerevalue FLOAT</td><td>PHMMER e-value threshold. By default, the threshold is 1e-03.</td></tr>
<tr><td>--typedata BCT|CON|VRL|PHG</td><td>GenBank Division: One of the following codes:
<table>
<tr><td>BCT</td><td>Prokaryotic chromosome</td></tr>
<tr><td>VRL</td><td>Eukaryotic/Archaeal virus</td></tr>
<tr><td>PHG</td><td>Phages</td></tr>
<tr><td>CON</td><td>Contig</td></tr>
</table>
By default, the program will consider every sequence as a contig (CON)</td></tr>
<tr><td>--gcode NUMBER</td><td>Number of the GenBank translation table. At this moment, the available options are:
<table>
<tr><td>1</td><td>Standard genetic code [Eukaryotic]</td></tr>
<tr><td>2</td><td>Vertebrate mitochondrial code</td></tr>
<tr><td>3</td><td>Yeast mitochondrial code</td></tr>
<tr><td>4</td><td>Mycoplasma/Spiroplasma and Protozoan/mold/coelenterate mitochondrial code</td></tr>
<tr><td>5</td><td>Invertebrate mitochondrial code</td></tr>
<tr><td>6</td><td>Ciliate/dasycladacean/Hexamita nuclear code</td></tr>
<tr><td>9</td><td>Echinoderm/flatworm mitochondrial code</td></tr>
<tr><td>10</td><td>Euplotid nuclear code</td></tr>
<tr><td>11</td><td>Bacteria/Archaea/Phages/Plant plastid</td></tr>
<tr><td>12</td><td>Alternative yeast nuclear code</td></tr>
<tr><td>13</td><td>Ascidian mitochondrial code</td></tr>
<tr><td>14</td><td>Alternative flatworm mitochondrial code</td></tr>
<tr><td>16</td><td>Chlorophycean mitochondrial code</td></tr>
<tr><td>21</td><td>Trematode mitochondrial code</td></tr>
<tr><td>22</td><td>Scenedesmus obliquus mitochondrial code</td></tr>
<tr><td>23</td><td>Thraustochytrium mitochondrial code</td></tr>
<tr><td>24</td><td>Pterobranchia mitochondrial code</td></tr>
<tr><td>25</td><td>Gracilibacteria and Candidate division SR1</td></tr>
<tr><td>26</td><td>Pachysolen tannophilus nuclear code</td></tr>
<tr><td>27</td><td>Karyorelict nuclear code</td></tr>
<tr><td>28</td><td>Condylostoma nuclear code</td></tr>
<tr><td>29</td><td>Mesodinium nuclear code</td></tr>
<tr><td>30</td><td>Peritrich nuclear code</td></tr>
<tr><td>31</td><td>Blastocrithidia nuclear code</td></tr>
</table>
By default, the program will use the translation table no. 11</td></tr>
<tr><td>--mincontigsize INT</td><td>Minimum contig length to be considered in the final files. By default, the program only considers contigs of at least 200 bp.</td></tr>
<tr><td>--idthr FLOAT</td><td>Identity threshold to consider that a protein belongs to a specific hit. By default, the threshold is 50.0 %</td></tr>
<tr><td>--coverthr FLOAT</td><td>Coverage threshold to consider that a protein belongs to a specific hit. By default, the threshold is 50.0 %</td></tr>
<tr><td>--diffid FLOAT (&gt;0.01)</td><td>Maximum allowed difference between the ID percentages of BLAST and HMMER. By default, the allowed difference is 5.00 % and we do not recommend changing this value.</td></tr>
<tr><td>--minrepeat INT</td><td>Minimum repeat length for CRISPR detection (Default: 16)</td></tr>
<tr><td>--maxrepeat INT</td><td>Maximum repeat length for CRISPR detection (Default: 64)</td></tr>
<tr><td>--minspacer INT</td><td>Minimum spacer length for CRISPR detection (Default: 8)</td></tr>
<tr><td>--maxspacer INT</td><td>Maximum spacer length for CRISPR detection (Default: 64)</td></tr>
<tr><td>--blastexh</td><td>Use exhaustive BLAST to predict the proteins by homology according to Fozo et al. (2010). In this case, the search will be done using a word size of 2, a gap open penalty of 8, a gap extension penalty of 2, the PAM70 matrix instead of BLOSUM62, and no composition-based statistics. This method predicts the functions of the proteins more accurately but is slower than BLAST with default parameters. By default, exhaustive BLAST is disabled.</td></tr>
</table>
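
To make the roles of `--windowsize` and `--slidingsize` more concrete, the following is a minimal sketch of a cumulative GC skew scan. It assumes that the sliding size is the step between consecutive windows and that the origin is placed at the minimum of the cumulative skew, which is a common convention. Whether VIGA uses exactly this rule is an assumption, and both function names are hypothetical.

```python
def cumulative_gc_skew(sequence, window=100, step=10):
    """Return (window_start, cumulative_skew) pairs along a sequence."""
    sequence = sequence.upper()
    total = 0.0
    points = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # GC skew of one window: (G - C) / (G + C); 0 when it has no G or C.
        skew = (g - c) / float(g + c) if (g + c) else 0.0
        total += skew
        points.append((start, total))
    return points


def putative_origin(sequence, window=100, step=10):
    """Start of the window where the cumulative GC skew is lowest."""
    points = cumulative_gc_skew(sequence, window, step)
    if not points:
        raise ValueError("sequence is shorter than the window")
    return min(points, key=lambda point: point[1])[0]
```

A smaller step gives a finer position estimate at the cost of more windows to evaluate. This sketch treats the sequence as linear and does not wrap windows around the end of a circular contig.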
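
The `--idthr`, `--coverthr` and `--diffid` options can be read as simple numeric filters on each hit. The following is a minimal sketch under the assumption that identities and coverages are percentages and that the comparisons are inclusive. The function names are hypothetical, and how VIGA combines the BLAST and HMMER results after these checks is not covered here.

```python
def passes_thresholds(identity, coverage, idthr=50.0, coverthr=50.0):
    """True when a hit reaches both the identity and coverage cut-offs (in %)."""
    return identity >= idthr and coverage >= coverthr


def ids_agree(blast_identity, hmmer_identity, diffid=5.0):
    """True when the two identity percentages differ by at most diffid points."""
    return abs(blast_identity - hmmer_identity) <= diffid
```

With the defaults, a hit with 62 % identity and 48 % coverage is rejected because it misses the coverage cut-off, even though its identity is above the threshold.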

## Examples
An example of execution (using BLAST and HMMER) is:

    python VIGA.py --input eukarya.fasta --blastdb databases/blast/nr/nr --hmmdb databases/UniProt/uniprot_trembl.fasta --rfamdb databases/rfam/Rfam.cm --gcode 1 --out eukarya_BENCHMARK --modifiers ../modifiers.txt --threads 10

Another example, this time using BLAST but not HMMER ("fast mode"), is:

    python VIGA.py --input bacteria.fasta --blastdb databases/blast/nr/nr --nohmmer --rfamdb databases/rfam/Rfam.cm --out bacteria_BENCHMARK --modifiers ../modifiers.txt --threads 10

Finally, an example using DIAMOND and not HMMER ("ultrafast mode") is:

    python VIGA.py --input archaea.fasta --noblast --diamonddb databases/diamond/nr --nohmmer --rfamdb databases/rfam/Rfam.cm --out archaea_BENCHMARK --modifiers ../modifiers.txt --threads 10
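
The three command lines above differ only in which databases are supplied. The following is a minimal sketch of a helper that assembles the argument list and checks that the database required by each mode is present. `viga_command` is a hypothetical name, and the rule of adding `--nohmmer` whenever no HMM database is given is an illustrative choice; only flags documented in this README are used.

```python
def viga_command(input_fasta, rfamdb, modifiers, blastdb=None, diamonddb=None,
                 hmmdb=None, threads=1):
    """Assemble a VIGA command line, checking the database each mode needs."""
    cmd = ["python", "VIGA.py", "--input", input_fasta,
           "--rfamdb", rfamdb, "--modifiers", modifiers,
           "--threads", str(threads)]
    if diamonddb is not None:
        # DIAMOND replaces BLAST for the homology search.
        cmd += ["--noblast", "--diamonddb", diamonddb]
    elif blastdb is not None:
        cmd += ["--blastdb", blastdb]
    else:
        raise ValueError("either blastdb or diamonddb is required")
    if hmmdb is not None:
        cmd += ["--hmmdb", hmmdb]
    else:
        # No HMM database supplied: skip the PHMMER step.
        cmd.append("--nohmmer")
    return cmd
```

The resulting list can be handed to `subprocess.call`, which avoids shell quoting problems when paths contain spaces.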
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
136
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
137 ## Galaxy wrapper
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
138 VIGA can be integrated into [Galaxy](https://galaxyproject.org) using the wrapper included in this repository.
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
139
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
140 ### Requirements
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
141
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
142 [Docker](https://www.docker.com) should first be installed and working on the
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
143 server where this Galaxy instance is setup. The user running Galaxy should be
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
144 part of the **docker** user group.
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
145
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
146 #### Manual installation of the wrapper from Github
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
147
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
148 1. Download or clone this repository (as a submodule) in the **tools**
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
149 directory of the Galaxy installation.
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
150
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
151 2. Update **config/tool_conf.xml** to add the VIGA wrapper in a relevant section of the tool panel. For example, "Annotation".
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
152
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
153 <section id="annotation" name="Annotation">
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
154 <tool file="viga/wrapper.xml" />
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
155 </section>
231e4c669675 Initial commit - v0.10.3 git commit deeded0
vimalkumarvelayudhan
parents:
diff changeset
156
3. Copy the included **tool_data_table_conf.xml.sample** file to
   **config/tool_data_table_conf.xml**, or update that file if it is
   already present:

       <!-- VIGA databases -->
       <tables>
           <table name="viga_blastdb" comment_char="#">
               <columns>value, dbkey, name, path</columns>
               <file path="tool-data/viga_blastdb.loc" />
           </table>
           <table name="viga_diamonddb" comment_char="#">
               <columns>value, dbkey, name, path</columns>
               <file path="tool-data/viga_diamonddb.loc" />
           </table>
           <table name="viga_rfamdb" comment_char="#">
               <columns>value, dbkey, name, path</columns>
               <file path="tool-data/viga_rfamdb.loc" />
           </table>
           <table name="viga_hmmdb" comment_char="#">
               <columns>value, dbkey, name, path</columns>
               <file path="tool-data/viga_hmmdb.loc" />
           </table>
       </tables>

4. Copy the **.loc.sample** files from **viga/tool-data** to
   **galaxy/tool-data** and rename them to **.loc**. For example:

       viga_blastdb.loc.sample -> viga_blastdb.loc

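
Step 4 can also be scripted. The following is a minimal Python 3 sketch, assuming the directory layout described above; the function name `install_loc_files` is hypothetical and not part of VIGA or Galaxy. It copies each `*.loc.sample` file into the target directory under its `.loc` name and skips files that already exist, so local edits are not overwritten. A typical call might be `install_loc_files("viga/tool-data", "galaxy/tool-data")`, with both paths adjusted to your installation.

```python
import shutil
from pathlib import Path


def install_loc_files(sample_dir, target_dir):
    """Copy every *.loc.sample file to target_dir, dropping the .sample suffix.

    Existing .loc files are left untouched. Returns the list of files created.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for sample in sorted(Path(sample_dir).glob("*.loc.sample")):
        # "viga_blastdb.loc.sample" -> "viga_blastdb.loc"
        destination = target / sample.name[: -len(".sample")]
        if not destination.exists():
            shutil.copyfile(sample, destination)
            created.append(destination)
    return created
```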
#### Update database paths in .loc files

Edit the following files in the **tool-data** directory and add the paths to
the corresponding databases:

* viga_blastdb.loc
* viga_diamonddb.loc
* viga_rfamdb.loc
* viga_hmmdb.loc

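
Each `.loc` file is a plain-text table whose columns follow the `<columns>` declaration in **tool_data_table_conf.xml** (`value, dbkey, name, path`), with `#` marking comment lines. The Python 3 sketch below assumes the usual Galaxy convention of one tab-separated entry per line. The helper names are hypothetical, so check the comments in each `.loc.sample` file for the exact format expected.

```python
def format_loc_entry(value, dbkey, name, path):
    """Build one .loc line in the column order value, dbkey, name, path."""
    fields = [value, dbkey, name, path]
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines: %r" % field)
    return "\t".join(fields)


def append_loc_entry(loc_file, value, dbkey, name, path):
    """Append an entry to loc_file unless an identical line is already there.

    Returns True if a line was written, False if the entry already existed.
    """
    entry = format_loc_entry(value, dbkey, name, path)
    try:
        with open(loc_file) as handle:
            text = handle.read()
    except FileNotFoundError:
        text = ""
    if entry in text.splitlines():
        return False
    with open(loc_file, "a") as handle:
        # Make sure the new entry starts on its own line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(entry + "\n")
    return True
```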
#### Create or update the Galaxy job configuration file

If the file **config/job_conf.xml** does not exist, create it by copying the
template **config/job_conf.xml.sample_basic** in the Galaxy directory. Then
add a Docker destination for viga. Change ``/data/databases`` under
``docker_volumes`` to the location where your databases are stored. Here is
an example:

    <?xml version="1.0"?>
    <!-- A sample job config that explicitly configures job running the way it is configured by default (if there is no explicit config). -->
    <job_conf>
        <plugins>
            <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        </plugins>
        <handlers>
            <handler id="main"/>
        </handlers>
        <destinations default="local">
            <destination id="local" runner="local"/>
            <destination id="docker" runner="local">
                <param id="docker_enabled">true</param>
                <param id="docker_sudo">false</param>
                <param id="docker_auto_rm">true</param>
                <param id="docker_volumes">$defaults,/data/databases:ro</param>
            </destination>
        </destinations>
        <tools>
            <tool id="viga" destination="docker"/>
        </tools>
    </job_conf>


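
The edited file can be sanity-checked with Python's standard `xml.etree.ElementTree` module. The Python 3 sketch below is an assumption about what is worth checking, not part of Galaxy or VIGA: it confirms that the `viga` tool is routed to a destination with `docker_enabled` set to `true` and returns that destination's `docker_volumes` value. The function name `viga_docker_volumes` is hypothetical; pass it the contents of **config/job_conf.xml** as a string.

```python
import xml.etree.ElementTree as ET


def viga_docker_volumes(job_conf_xml, tool_id="viga"):
    """Return the docker_volumes string of the destination used by tool_id.

    Raises ValueError if the tool is not mapped to a Docker-enabled destination.
    """
    root = ET.fromstring(job_conf_xml)
    tool = root.find("./tools/tool[@id='%s']" % tool_id)
    if tool is None:
        raise ValueError("no <tool id=%r> entry found" % tool_id)
    dest_id = tool.get("destination")
    destination = root.find("./destinations/destination[@id='%s']" % dest_id)
    if destination is None:
        raise ValueError("destination %r is not defined" % dest_id)
    # Collect <param id="...">value</param> children into a dictionary.
    params = {p.get("id"): (p.text or "").strip() for p in destination.findall("param")}
    if params.get("docker_enabled", "").lower() != "true":
        raise ValueError("destination %r does not enable Docker" % dest_id)
    return params.get("docker_volumes", "")
```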
**Restart Galaxy**. The tool will now be ready to use.


## HISTORY OF THE SOURCE CODE:

* v 0.10.3 - New output: all protein sequences per contig.
* v 0.10.1 - Fixed an error that occurred when the start coordinate of a gene was equal to one. In these cases, genes were annotated as if they started at position zero, which makes no biological sense. The program should now handle these genes and annotate them from position 1. Moreover, added new terms to reduce all non-informative protein descriptions before running the decision tree.
* v 0.10.0 - Added the prediction of the origin and terminus of replication for circular contigs based on the cumulative GC skew (following the iRep software; Brown et al. 2016). After detecting the origin coordinate, the chromosome is realigned from the origin. As a consequence, two new parameters ("--windowsize" and "--slidingsize") were added to set the window size and the sliding window size, respectively. Moreover, fixed an error with the start position of the genes in the GenBank files, which did not match the amino acid sequences and resulted in sequence lengths that were not a multiple of three. Finally, added "/locus_tag" to the putative genes in the GenBank files.
* v 0.9.1 - Fixed a bug in the creation of the logfile.
* v 0.9.0 - Improved the BLAST/DIAMOND and HMMER parsers to reduce all non-informative protein descriptions (e.g. "hypothetical protein", "ORF") before running the decision tree algorithm. Additionally, a new output file (logfile.txt) is generated that records the old contig names and the new ones generated by the program.
* v 0.8.2 - Fixed an issue with DIAMOND when there is no protein sequence as input.
* v 0.8.1 - The program now handles tmRNA sequences properly. In some cases there was an error due to the "(Permuted)" flag in ARAGORN files. Additionally, the "--fast" and "--ultrafast" parameters were renamed to "--nohmmer" and "--noblast", as these names are more accurate.
* v 0.8.0 - Added the "--ultrafast" parameter. In this case, DIAMOND (Buchfink et al. 2015) is launched instead of BLAST to predict protein function by homology. It is faster than the "fast" mode, but the sensitivity of the annotations will not be the highest.
* v 0.7.1 - Fixed an error with the "--fast" parameter. Proteins that had no hits in the BLAST analyses were not parsed properly. These are now identified as "Hypothetical proteins" in all files.
* v 0.7.0 - Added the "--fast" parameter. In this case, the program launches BLAST (but not PHMMER) to annotate protein function. The program will be as fast as Prokka (Seemann 2014), but the annotations will not be as accurate. As a consequence of this new parameter, the "--hmmdb" parameter is mandatory only when this flag is NOT used (the default).
* v 0.6.2 - Removed the "--noparallel" parameter. After time benchmarks comparing BLAST and HMMER run with their multithreading option and run as parallel jobs, we found that BLAST tends to be faster with multithreading, while HMMER showed the opposite behavior. We therefore decided to parallelize only HMMER and to run BLAST using multiple threads.
* v 0.6.1 - Fixed issues with parallel HMMER (the program tended to take all available CPUs regardless of the parsed arguments) and with the BLAST/HMMER decision trees (typos).
* v 0.6.0 - Replaced HHSUITE with HMMER 3.1 to predict protein function using Hidden Markov Models. In a recent benchmark (as well as in internal ones), we found that HHPred tends to be the slowest program for predicting protein function (compared with PHMMER and BLASTP). Additionally, HMMER had a high accuracy when annotating proteins (Saripella et al. 2016). Moreover, it has the advantage that the databases must be in FASTA format (such as UniProt and even PFAM), which is a standard format. For all these reasons, we replaced HHSUITE with HMMER 3.1. Additionally, fixed small issues related to the GenBank file (omission of the contig topology as well as the name of the locus).
* v 0.5.0 - Implemented PILER-CR to predict CRISPR repeat regions. Additionally, fixed errors in the prediction of rRNA and of inverted and tandem repeats.
* v 0.4.0 - Replaced RNAmmer v 1.2 with INFERNAL 1.1 + RFAM to predict rRNA in the contigs. In this case, you must specify where you have downloaded the RFAM database using the "--rfamdb" option.
* v 0.3.0 - Implemented RNAmmer v 1.2 to predict rRNA in the contigs. If the program predicts ribosomal genes, a warning is printed (as viral sequences do not have ribosomal genes).
* v 0.2.0 - Added parallelization of BLAST and HHSUITE. GNU Parallel (Tange 2011) is required for this. To disable this option, run the program with the "--noparallel" option.
* v 0.1.0 - Original version of the program.

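
The v 0.10.0 entry above refers to locating the origin and terminus of replication from the cumulative GC skew. The Python 3 sketch below illustrates the general idea only and is not VIGA's or iRep's actual code: the skew (G - C) / (G + C) is computed in sliding windows and summed cumulatively, and the minimum and maximum of the cumulative curve are taken as the putative origin and terminus. The `window_size` and `step` arguments stand in for the roles of "--windowsize" and "--slidingsize"; the function names and the way the sequence is rotated are assumptions.

```python
def cumulative_gc_skew(sequence, window_size, step):
    """Return (window_start, cumulative_skew) pairs along the sequence."""
    sequence = sequence.upper()
    points = []
    total = 0.0
    for start in range(0, len(sequence) - window_size + 1, step):
        window = sequence[start:start + window_size]
        g = window.count("G")
        c = window.count("C")
        # Windows without any G or C contribute zero skew.
        total += (g - c) / (g + c) if (g + c) else 0.0
        points.append((start, total))
    return points


def predict_origin_terminus(sequence, window_size, step):
    """Origin = window with minimum cumulative skew, terminus = maximum."""
    points = cumulative_gc_skew(sequence, window_size, step)
    if not points:
        raise ValueError("sequence is shorter than the window size")
    origin = min(points, key=lambda point: point[1])[0]
    terminus = max(points, key=lambda point: point[1])[0]
    return origin, terminus


def rotate_to_origin(sequence, origin):
    """Rewrite a circular sequence so that it starts at the origin."""
    return sequence[origin:] + sequence[:origin]
```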
## REFERENCES:

- Benson G (2008) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27: 573-80.
- Brown CT, Olm MR, Thomas BC, Banfield JF (2016) Measurement of bacterial replication rates in microbial communities. Nature Biotechnology 34: 1256-63.
- Buchfink B, Xie C, Huson DH (2015) Fast and sensitive protein alignment using DIAMOND. Nature Methods 12: 59-60.
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2008) BLAST+: architecture and applications. BMC Bioinformatics 10: 421.
- Edgar RC (2007) PILER-CR: fast and accurate identification of CRISPR repeats. BMC Bioinformatics 8: 18.
- Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Research 39: W29-37.
- Fozo EM, Makarova KS, Shabalina SA, Yutin N, Koonin EV, Storz G (2010) Abundance of type I toxin-antitoxin systems in bacteria: searches for new candidates and discovery of novel families. Nucleic Acids Research 38: 3743-59.
- Harris RS (2007) Improved pairwise alignment of genomic DNA. Ph.D. Thesis, The Pennsylvania State University.
- Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.
- Hyatt D, Locascio PF, Hauser LJ, Uberbacher EC (2012) Gene and translation initiation site prediction in metagenomic sequences. Bioinformatics 28: 2223-30.
- Laslett D, Canback B (2004) ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Research 32: 11-16.
- Nawrocki EP, Eddy SR (2013) Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29: 2933-35.
- Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, Floden EW, Gardner PP, Jones TA, Tate J, Finn RD (2013) Rfam 12.0: updates to the RNA families database. Nucleic Acids Research 43: D130-7.
- Saripella GV, Sonnhammer EL, Forslund K (2016) Benchmarking the next generation of homology inference tools. Bioinformatics 32: 2636-41.
- Seemann T (2014) Prokka: rapid prokaryote genome annotation. Bioinformatics 30: 2068-9.
- Tange O (2011) GNU Parallel - The Command-Line Power Tool. ;login: The USENIX Magazine 36: 42-7.
- Warburton PE, Giordano J, Cheung F, Gelfand Y, Benson G (2004) Inverted repeat structure of the human genome: The X-chromosome contains a preponderance of large, highly homologous inverted repeats that contain testes genes. Genome Research 14: 1861-9.