comparison Roary/README.md @ 0:c47a5f61bc9f draft

Uploaded
author dereeper
date Fri, 14 May 2021 20:27:06 +0000
-1:000000000000 0:c47a5f61bc9f
1 # Roary - The pan genome pipeline
2 Takes annotated assemblies in GFF3 format and calculates the pan genome.
3
4 PLEASE NOTE: we currently do not have the resources to provide support for Roary, so please do not expect a reply if you flag any issue.
5
6 [![Unmaintained](http://unmaintained.tech/badge.svg)](http://unmaintained.tech/)
7 [![Build Status](https://travis-ci.org/sanger-pathogens/Roary.svg?branch=master)](https://travis-ci.org/sanger-pathogens/Roary)
8 [![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-brightgreen.svg)](https://github.com/sanger-pathogens/roary/blob/master/GPL-LICENSE)
9 [![status](https://img.shields.io/badge/Bioinformatics-10.1093-brightgreen.svg)](https://academic.oup.com/bioinformatics/article/31/22/3691/240757)
10 [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/recipes/roary/README.html)
11 [![Container ready](https://img.shields.io/badge/container-ready-brightgreen.svg)](https://quay.io/repository/biocontainers/roary)
12 [![Docker Build Status](https://img.shields.io/docker/build/sangerpathogens/roary.svg)](https://hub.docker.com/r/sangerpathogens/roary)
13 [![Docker Pulls](https://img.shields.io/docker/pulls/sangerpathogens/roary.svg)](https://hub.docker.com/r/sangerpathogens/roary)
14 [![codecov](https://codecov.io/gh/sanger-pathogens/roary/branch/master/graph/badge.svg)](https://codecov.io/gh/sanger-pathogens/roary)
15
16 ## Contents
17 * [Introduction](#introduction)
18 * [Installation](#installation)
19 * [Required dependencies](#required-dependencies)
20 * [Optional dependencies](#optional-dependencies)
21 * [Ubuntu/Debian](#ubuntudebian)
22 * [Debian Testing](#debian-testing)
23 * [Ubuntu 14\.04/16\.04](#ubuntu-14041604)
24 * [Ubuntu 12\.04](#ubuntu-1204)
25 * [Bioconda \- OSX/Linux](#bioconda---osxlinux)
26 * [Galaxy](#galaxy)
27 * [GNU Guix](#gnu-guix)
28 * [Virtual Machine \- OSX/Linux/Windows](#virtual-machine---osxlinuxwindows)
29 * [Docker \- OSX/Linux/Windows/Cloud](#docker---osxlinuxwindowscloud)
30 * [Installing from source (advanced Linux users only)](#installing-from-source-advanced-linux-users-only)
31 * [Ancient systems and versions of perl](#ancient-systems-and-versions-of-perl)
32 * [Running the tests](#running-the-tests)
33 * [Versions of software we test against](#versions-of-software-we-test-against)
34 * [Usage](#usage)
35 * [License](#license)
36 * [Feedback/Issues](#feedbackissues)
37 * [Citation](#citation)
38 * [Further Information](#further-information)
39
40 ## Introduction
41 Roary is a high-speed, standalone pan genome pipeline that takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome. On a standard desktop PC it can analyse datasets with thousands of samples, which is computationally infeasible with existing methods, without compromising the quality of the results. 128 samples can be analysed in under 1 hour using 1 GB of RAM and a single processor. Performing the same analysis with existing methods would take weeks and hundreds of GB of RAM.
42
43 ## Installation
44 Roary has the following dependencies:
45
46 ### Required dependencies
47 * [bedtools](https://bedtools.readthedocs.io/en/latest/)
48 * [cd-hit](http://weizhongli-lab.org/cd-hit/)
49 * [ncbi-blast+](https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
50 * [mcl](https://micans.org/mcl/)
51 * [parallel](https://www.gnu.org/software/parallel/)
52 * [prank](http://wasabiapp.org/software/prank/)
53 * [mafft](https://mafft.cbrc.jp/alignment/software/)
54 * [fasttree](http://www.microbesonline.org/fasttree/)
55
56 ### Optional dependencies
57 * [kraken](http://ccb.jhu.edu/software/kraken/MANUAL.html)
58
59 There are a number of ways to install Roary and details are provided below. If you encounter an issue when installing Roary please contact your local system administrator.
60
61 ### Ubuntu/Debian
62 #### Debian Testing
63 ```
64 sudo apt-get install roary
65 ```
66
67 #### Ubuntu 14.04/16.04
68 All the dependencies can be installed using apt and cpanm. Root permissions are required. Ubuntu 16.04 contains a package for Roary but it is frozen at v3.6.0.
69
70 ```
71 sudo apt-get install bedtools cd-hit ncbi-blast+ mcl parallel cpanminus prank mafft fasttree
72 sudo cpanm -f Bio::Roary
73 ```
74
75 #### Ubuntu 12.04
76 Some of the software versions in apt are quite old so follow the instructions for Bioconda below.
77
78 ### Bioconda - OSX/Linux
79 Install conda. Then install bioconda and roary:
80
81 ```
82 conda config --add channels r
83 conda config --add channels defaults
84 conda config --add channels conda-forge
85 conda config --add channels bioconda
86 conda install roary
87 ```
88
89 ### Galaxy
90 Roary is available from the Galaxy toolshed (as is Prokka).
91
92 ### GNU Guix
93 Roary is included in [Guix](https://www.gnu.org/software/guix) and can be installed in the usual way:
94 ```
95 guix package --install roary
96 ```
97
98 ### Virtual Machine - OSX/Linux/Windows
99 Roary won't run natively on Windows, but we have created a virtual machine which has all of the software set up, including Prokka, along with the test datasets from the paper. It is based on [Bio-Linux 8](http://environmentalomics.org/bio-linux/). You need to first install [VirtualBox](https://www.virtualbox.org/), then load the virtual machine using the 'File -> Import Appliance' menu option. The root password is 'manager'.
100
101 ftp://ftp.sanger.ac.uk/pub/pathogens/pathogens-vm/pathogens-vm.latest.ova
102
103 More importantly though, if you're trying to do bioinformatics on Windows, you're not going to get very far and you should seriously consider upgrading to Linux.
104
105 ### Docker - OSX/Linux/Windows/Cloud
106 We have a docker container which gets automatically built from the latest version of Roary in Debian Med. To install it:
107
108 ```
109 docker pull sangerpathogens/roary
110 ```
111
112 To use it you would use a command such as this (substituting in your directories), where your GFF files are assumed to be stored in /home/ubuntu/data:
113 ```
114 docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/roary roary -f /data /data/*.gff
115 ```
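One detail worth keeping in mind with the command above: an unquoted `/data/*.gff` is expanded by the shell on the host, before docker sees it. The following is a minimal sketch, not an officially supported invocation, that wraps the same command so the pattern is expanded inside the container instead. It assumes the image provides `sh`; the function name `roary_docker` and the `DOCKER` override variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Run the Roary container against a host directory of GFF files.
# Assumptions: the image ships "sh"; DOCKER is a hypothetical override
# for the docker executable (handy for dry runs).
roary_docker() {
    # docker -v needs an absolute host path
    host_dir=$(cd "$1" && pwd) || return 1
    # Single quotes keep the glob away from the host shell, so it is
    # expanded by "sh" inside the container.
    "${DOCKER:-docker}" run --rm -it -v "$host_dir":/data \
        sangerpathogens/roary sh -c 'roary -f /data /data/*.gff'
}
```

It could then be called as `roary_docker /home/ubuntu/data`.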
116
117 ### Installing from source (advanced Linux users only)
118 As a last resort you can install everything from source. This is for users with advanced Linux skills and we do not provide any support with this method since you have the skills to figure things out.
119 Download the latest software from (https://github.com/sanger-pathogens/Roary/tarball/master).
120
121 Choose somewhere to put it, for example in your home directory (no root access required):
122
123 ```
124 cd $HOME
125 tar zxvf sanger-pathogens-Roary-xxxxxx.tar.gz
126 ls Roary-*
127 ```
128
129 Add the following lines to your $HOME/.bashrc file, or to /etc/profile.d/roary.sh to make it available to all users:
130
131 ```
132 export PATH=$PATH:$HOME/Roary-x.x.x/bin
133 export PERL5LIB=$PERL5LIB:$HOME/Roary-x.x.x/lib
134 ```
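If the file holding these lines is sourced more than once, the same directories get appended repeatedly. As a minimal sketch, assuming a POSIX shell, the helper below appends a directory only when it is not already in the list. The function name `append_once` is hypothetical, and `Roary-x.x.x` is the same placeholder used above.

```shell
#!/bin/sh
# Append directory $2 to the colon-separated list $1 unless it is already there.
# Prints the resulting list.
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;           # already present: unchanged
        *)        printf '%s\n' "${1:+$1:}$2" ;;  # add, avoiding a leading colon
    esac
}

PATH=$(append_once "$PATH" "$HOME/Roary-x.x.x/bin")
PERL5LIB=$(append_once "${PERL5LIB:-}" "$HOME/Roary-x.x.x/lib")
export PATH PERL5LIB
```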
135 Install the Perl dependencies:
136
137 ```
138 sudo cpanm Array::Utils Bio::Perl Exception::Class File::Basename File::Copy File::Find::Rule File::Grep File::Path File::Slurper File::Spec File::Temp File::Which FindBin Getopt::Long Graph Graph::Writer::Dot List::Util Log::Log4perl Moose Moose::Role Text::CSV PerlIO::utf8_strict Devel::OverloadInfo Digest::MD5::File
139 ```
140 Install the external dependencies either from source or from your packaging system:
141 ```
142 bedtools cd-hit blast mcl GNUparallel prank mafft fasttree
143 ```
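Before running Roary from a source install, it can help to confirm that the external tools are actually on your `PATH`. The sketch below is one way to do that in a POSIX shell. The executable names passed at the bottom are assumptions (packages may install them under other names, for example `FastTree` versus `fasttree`), and the function name `missing_tools` is hypothetical.

```shell
#!/bin/sh
# Print each named executable that cannot be found on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Assumed executable names; adjust them to match your system.
missing_tools bedtools cd-hit blastp makeblastdb mcl mcxdeblast \
    parallel prank mafft FastTree \
    || echo "Install the tools listed above before running Roary."
```

Roary's own `-a` option, listed under Usage, also checks dependencies and prints versions once Roary itself is installed.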
144
145 ### Ancient systems and versions of perl
146 The code will not work with perl 5.8 or below (pre-modern perl). We no longer test against 5.10 (released 2007) or 5.12 (released 2010). If you're running a very old version of Linux, you're also in trouble.
147
148 ### Running the tests
149 The tests can be run with dzil from the top level directory:
150
151 ```
152 dzil test
153 ```
154
155 ### Versions of software we test against
156 * Perl 5.14, 5.26
157 * cdhit 4.6.8
158 * ncbi blast+ 2.6.0
159 * mcl 14-137
160 * bedtools 2.27.1
161 * prank 140603
162 * GNU parallel 20170822, 20160722
163 * FastTree 2.1.9
164
165 ## Usage
166 ```
167 Usage: roary [options] *.gff
168
169 Options: -p INT number of threads [1]
170 -o STR clusters output filename [clustered_proteins]
171 -f STR output directory [.]
172 -e create a multiFASTA alignment of core genes using PRANK
173 -n fast core gene alignment with MAFFT, use with -e
174 -i minimum percentage identity for blastp [95]
175 -cd FLOAT percentage of isolates a gene must be in to be core [99]
176 -qc generate QC report with Kraken
177 -k STR path to Kraken database for QC, use with -qc
178 -a check dependancies and print versions
179 -b STR blastp executable [blastp]
180 -c STR mcl executable [mcl]
181 -d STR mcxdeblast executable [mcxdeblast]
182 -g INT maximum number of clusters [50000]
183 -m STR makeblastdb executable [makeblastdb]
184 -r create R plots, requires R and ggplot2
185 -s dont split paralogs
186 -t INT translation table [11]
187 -ap allow paralogs in core alignment
188 -z dont delete intermediate files
189 -v verbose output to STDOUT
190 -w print version and exit
191 -y add gene inference information to spreadsheet, doesnt work with -e
192 -iv STR Change the MCL inflation value [1.5]
193 -h this help message
194
195 Example: Quickly generate a core gene alignment using 8 threads
196 roary -e --mafft -p 8 *.gff
197
198 For further info see: http://sanger-pathogens.github.io/Roary/
199 ```
200 For further instructions on how to use the software, the input format and output formats, please see [the Roary website](http://sanger-pathogens.github.io/Roary).
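As an illustration of how the options above combine, here is a minimal sketch of a wrapper that collects the GFF files in a directory, stops early if there are none, and runs a MAFFT-based core gene alignment (`-e -n`) with a chosen thread count (`-p`) and output directory (`-f`). The function name `run_roary` and the `ROARY_BIN` override variable are hypothetical; only the flags come from the usage text.

```shell
#!/bin/sh
# Usage: run_roary GFF_DIR OUT_DIR [THREADS]
# ROARY_BIN is a hypothetical override for the roary executable.
run_roary() {
    gff_dir=$1
    out_dir=$2
    threads=${3:-1}

    # Replace the positional parameters with the matching GFF files.
    set -- "$gff_dir"/*.gff
    # If nothing matched, the pattern is left as a literal, non-existent path.
    if [ ! -e "$1" ]; then
        echo "no .gff files found in $gff_dir" >&2
        return 1
    fi

    # -e -n: core gene alignment with MAFFT; -p: threads; -f: output directory
    "${ROARY_BIN:-roary}" -p "$threads" -f "$out_dir" -e -n "$@"
}
```

For example, `run_roary ./annotations ./roary_out 8` would correspond to the 8-thread example in the usage text, with an explicit output directory.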
201
202 ## License
203 Roary is free software, licensed under [GPLv3](https://github.com/sanger-pathogens/Roary/blob/master/GPL-LICENSE).
204
205 ## Feedback/Issues
206 We currently do not have the resources to provide support for Roary. However, the community might be able to help you out if you report any issues about usage of the software to the [issues page](https://github.com/sanger-pathogens/Roary/issues).
207
208 ## Citation
209 If you use this software please cite:
210
211 "Roary: Rapid large-scale prokaryote pan genome analysis",
212 Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill,
213 Bioinformatics, (2015). doi: http://dx.doi.org/10.1093/bioinformatics/btv421
214 [Roary: Rapid large-scale prokaryote pan genome analysis](http://dx.doi.org/10.1093/bioinformatics/btv421)
215
216 ## Further Information
217 For more information on this software see:
218 * [The Roary website](http://sanger-pathogens.github.io/Roary)
219 * [The Jupyter notebook tutorial](https://github.com/sanger-pathogens/pathogen-informatics-training)