comparison README @ 2:6889442b27dc draft default tip
Uploaded
| author | aaronpetkau |
|---|---|
| date | Sat, 04 Jul 2015 08:58:21 -0400 |
| parents | |
| children | |
comparison
| 1:a444685f161c | 2:6889442b27dc |
|---|---|
| 1 Tool wrapper by Brian Yeo | |
| 2 brian.yeo@phac.aspc.gc.ca | |
| 3 | |
| 4 INTRODUCTION | |
| 5 | |
| 6 FLASH (Fast Length Adjustment of SHort reads) is an accurate and fast tool | |
| 7 to merge paired-end reads that were generated from DNA fragments whose | |
| 8 lengths are shorter than twice the read length. Merging a read pair yields a | |
| 9 single, longer unpaired read, which is generally more useful for genome | |
| 10 assembly and genome analysis. | |
| 11 | |
| 12 Briefly, the FLASH algorithm considers all possible overlaps at or above a | |
| 13 minimum length between the reads in a pair and chooses the overlap that | |
| 14 results in the lowest mismatch density (proportion of mismatched bases in | |
| 15 the overlapped region). Ties between multiple overlaps are broken by | |
| 16 considering quality scores at mismatch sites. When building the merged | |
| 17 sequence, FLASH computes a consensus sequence in the overlapped region. | |
| 18 More details can be found in the original publication | |
| 19 (http://bioinformatics.oxfordjournals.org/content/27/21/2957.full). | |
| 20 | |
| 21 Limitations of FLASH include: | |
| 22 - FLASH cannot merge paired-end reads that do not overlap. | |
| 23 - FLASH cannot merge read pairs that have an outward orientation, either | |
| 24 due to being "jumping" reads or due to excessive trimming. | |
| 25 - FLASH is not designed for data that has a significant amount of indel | |
| 26 errors (such as Sanger sequencing data). It is best suited for Illumina | |
| 27 data. | |
| 28 | |
| 29 INSTALLATION | |
| 30 | |
| 31 On UNIX-compatible systems, including GNU/Linux and Mac OS X, you must compile | |
| 32 FLASH from source. The only dependency, other than functions that are expected | |
| 33 to be available in the C library, is the zlib data compression library. To | |
| 34 install FLASH, download the tarball, untar it, and compile the code using the | |
| 35 provided Makefile: | |
| 36 | |
| 37 $ tar xzf FLASH-1.2.9.tar.gz | |
| 38 $ cd FLASH-1.2.9 | |
| 39 $ make | |
| 40 | |
| 41 The executable file that is produced is named 'flash'. To run it from the | |
| 42 command line, copy it to a directory listed in your $PATH, or else run | |
| 43 it with a path that includes a directory, such as "./flash". | |
| 44 | |
| 45 FLASH also runs on Windows, and you can compile it on Windows using MinGW. | |
| 46 However, for convenience you may instead download a standalone Windows binary | |
| 47 from the SourceForge page (https://sourceforge.net/projects/flashpage/). | |
| 48 | |
| 49 USAGE | |
| 50 | |
| 51 Please compile FLASH and run `flash --help' to see command-line usage | |
| 52 information and information about input/output files. | |
| 53 | |
| 54 MULTITHREADING | |
| 55 | |
| 56 By default, FLASH uses multiple threads. There are "combiner" threads that do | |
| 57 the actual read combining, as well as up to 5 threads that are used for I/O (up | |
| 58 to 2 readers, up to 3 writers). The default number of combiner threads is the | |
| 59 number of processors; however, it can be adjusted with the -t option (long | |
| 60 option: --threads). | |
| 61 | |
| 62 When multiple combiner threads are used, the order of the combined and | |
| 63 uncombined reads in the output files will be nondeterministic. If you need to | |
| 64 enforce that the output reads appear in the same order as the input, you must | |
| 65 specify --threads=1. | |
| 66 | |
| 67 PERFORMANCE | |
| 68 | |
| 69 Since the FLASH algorithm considers each read pair independently, FLASH will, by | |
| 70 default, process read pairs in parallel. FLASH v1.2.9 and later also make use | |
| 71 of vector instructions available on modern x86 CPUs. Consequently, FLASH works | |
| 72 quite fast, even with low-cost computing resources. As an example, we ran FLASH | |
| 73 v1.2.9 on a laptop with a dual-core 2.3 GHz AMD x86_64 processor and it | |
| 74 processed one million 101-bp read pairs in 11.6 seconds with the default | |
| 75 parameters. Less than 2 MB of memory was used. Actual timings will vary, | |
| 76 depending primarily on the number of CPUs available, the speed of each CPU, | |
| 77 and the I/O speed of reading the input files and writing the output files. | |
| 78 FLASH is designed to scale to dozens of processors, although its speed may | |
| 79 be limited by I/O in such cases. | |
| 80 | |
| 81 ACCURACY | |
| 82 | |
| 83 When the read error rate is 1% or less, FLASH processes over 99% of read pairs | |
| 84 correctly. At an error rate of 2%, FLASH processes over 98% of read pairs | |
| 85 correctly with the default parameters. With more aggressive parameters | |
| 86 (i.e., -x 0.35), FLASH processes over 90% of read pairs correctly even when the | |
| 87 error rate is 5%. | |
| 88 | |
| 89 PUBLICATION | |
| 90 | |
| 91 Title: FLASH: fast length adjustment of short reads to improve genome assemblies | |
| 92 Authors: Tanja Magoč and Steven L. Salzberg | |
| 93 URL: http://bioinformatics.oxfordjournals.org/content/27/21/2957.full | |
| 94 | |
| 95 LICENSE | |
| 96 | |
| 97 FLASH is released under the GNU General Public License Version 3 or later (see | |
| 98 COPYING). | |
| 99 | |
| 100 COMMENTS/QUESTIONS/REQUESTS | |
| 101 | |
| 102 Send an e-mail to flash.comment@gmail.com | |
| 103 | |
| 104 Other versions are available from the SourceForge page: | |
| 105 | |
| 106 https://sourceforge.net/projects/flashpage/ |
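
The overlap selection described in the INTRODUCTION (try every overlap at or above a minimum length, keep the one with the lowest mismatch density) can be illustrated with a small Python sketch. This is not FLASH's implementation, which is written in C. The function names and the parameter values `min_overlap=10` and `max_mismatch_density=0.25` are assumptions chosen for illustration. The sketch also simplifies two steps: ties go to the first overlap found rather than being broken by quality scores, and the merged read takes read 1's bases in the overlap instead of computing a consensus.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def best_overlap(read1, read2_rc, min_overlap=10, max_mismatch_density=0.25):
    """Return (overlap_length, mismatch_density) for the best overlap, or None.

    The suffix of read1 is compared with the prefix of read2_rc for every
    overlap length from min_overlap up to the shorter read's length.
    Parameter values are illustrative assumptions.
    """
    best = None
    longest = min(len(read1), len(read2_rc))
    for k in range(min_overlap, longest + 1):
        mismatches = sum(a != b for a, b in zip(read1[-k:], read2_rc[:k]))
        density = mismatches / k
        # Strict '<' keeps the first overlap found on ties (a simplification;
        # the README says FLASH uses quality scores at mismatch sites).
        if density <= max_mismatch_density and (best is None or density < best[1]):
            best = (k, density)
    return best


def merge_pair(read1, read2, **kwargs):
    """Merge a pair into one longer read, or return None if no overlap qualifies."""
    read2_rc = reverse_complement(read2)
    hit = best_overlap(read1, read2_rc, **kwargs)
    if hit is None:
        return None
    k, _ = hit
    # Simplification: keep read1's bases in the overlap (no consensus step).
    return read1 + read2_rc[k:]
```

For example, two 20-base reads taken from opposite ends of a 30-base fragment overlap by 10 bases, so `merge_pair` reconstructs the fragment, whereas a pair with no acceptable overlap returns `None`, matching the first limitation listed above.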
