comparison README @ 2:6889442b27dc draft default tip

Uploaded
author aaronpetkau
date Sat, 04 Jul 2015 08:58:21 -0400
Tool wrapper by Brian Yeo
brian.yeo@phac.aspc.gc.ca

INTRODUCTION

FLASH (Fast Length Adjustment of SHort reads) is an accurate and fast tool
for merging paired-end reads that were generated from DNA fragments shorter
than twice the read length. Each merged read pair yields a single, longer
unpaired read, which is generally preferable for genome assembly and genome
analysis.

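The length condition above can be checked with a little arithmetic. The
following is a minimal sketch, not part of FLASH: the function name is
hypothetical, and it assumes both reads have the same length, face inward,
and are no longer than the fragment.

```python
def expected_overlap(read_length: int, fragment_length: int) -> int:
    """Bases shared by the two reads of a pair (0 if they do not overlap).

    Assumes equal-length, inward-facing reads that are no longer than
    the fragment they were sequenced from.
    """
    return max(0, 2 * read_length - fragment_length)


# 101-bp reads from a 180-bp fragment share 22 bases and can be merged;
# the same reads from a 250-bp fragment do not overlap at all.
print(expected_overlap(101, 180))
print(expected_overlap(101, 250))
```

Under these assumptions a successfully merged read has the same length as
the original fragment.
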
Briefly, the FLASH algorithm considers all possible overlaps at or above a
minimum length between the reads in a pair and chooses the overlap that
results in the lowest mismatch density (proportion of mismatched bases in
the overlapped region). Ties between multiple overlaps are broken by
considering quality scores at mismatch sites. When building the merged
sequence, FLASH computes a consensus sequence in the overlapped region.
More details can be found in the original publication
(http://bioinformatics.oxfordjournals.org/content/27/21/2957.full).

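The overlap search described above can be sketched as follows. This is a
simplified illustration, not FLASH's implementation: the function names
and the min_overlap value are hypothetical, ties keep the first overlap
found instead of consulting quality scores, and the merged sequence takes
the overlapped bases from the first read rather than computing a consensus.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def best_overlap(read1: str, read2: str, min_overlap: int = 10):
    """Return (overlap_length, mismatch_density) with the lowest density.

    Returns None if the reads are too short for any candidate overlap.
    """
    mate = reverse_complement(read2)  # put read 2 on the strand of read 1
    best = None
    for length in range(min_overlap, min(len(read1), len(mate)) + 1):
        tail, head = read1[-length:], mate[:length]
        mismatches = sum(a != b for a, b in zip(tail, head))
        density = mismatches / length
        if best is None or density < best[1]:
            best = (length, density)
    return best


def merge_pair(read1: str, read2: str, min_overlap: int = 10):
    """Merge a read pair at its best overlap, or return None."""
    result = best_overlap(read1, read2, min_overlap)
    if result is None:
        return None
    length, _density = result
    # Simplification: keep read 1's bases in the overlap (no consensus).
    return read1 + reverse_complement(read2)[length:]


fragment = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"      # 30-bp fragment
read1 = fragment[:20]                            # forward read
read2 = reverse_complement(fragment[10:])        # reverse read
print(best_overlap(read1, read2))
print(merge_pair(read1, read2) == fragment)
```

A real merger would also reject overlaps whose mismatch density exceeds a
threshold; that check is omitted here to keep the sketch short.
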
Limitations of FLASH include:
- FLASH cannot merge paired-end reads that do not overlap.
- FLASH cannot merge read pairs that have an outward orientation, either
  due to being "jumping" reads or due to excessive trimming.
- FLASH is not designed for data that has a significant amount of indel
  errors (such as Sanger sequencing data). It is best suited for Illumina
  data.

INSTALLATION

On UNIX-compatible systems, including GNU/Linux and Mac OS X, you must compile
FLASH from source. The only dependency, other than functions that are expected
to be available in the C library, is the zlib data compression library. To
install FLASH, download the tarball, untar it, and compile the code using the
provided Makefile:

    $ tar xzf FLASH-1.2.9.tar.gz
    $ cd FLASH-1.2.9
    $ make

The executable file that is produced is named 'flash'. To run it from the
command line, either copy it to a directory listed in your $PATH or run it
with an explicit path, such as "./flash".

FLASH also runs on Windows, and you can compile it on Windows using MinGW.
However, for convenience you may instead download a standalone Windows binary
from the SourceForge page (https://sourceforge.net/projects/flashpage/).

USAGE

Please compile FLASH and run 'flash --help' to see command-line usage
information and information about input/output files.

MULTITHREADING

By default, FLASH uses multiple threads. There are "combiner" threads that do
the actual read combining, as well as up to 5 threads that are used for I/O (up
to 2 readers, up to 3 writers). The default number of combiner threads is the
number of processors; however, it can be adjusted with the -t option (long
option: --threads).

When multiple combiner threads are used, the order of the combined and
uncombined reads in the output files is nondeterministic. If you need the
output reads to appear in the same order as the input, you must specify
--threads=1.

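When FLASH is driven from a script, the thread setting can be chosen while
assembling the command line. The helper below is a hypothetical sketch: the
function name and file names are illustrative, and it assumes the two
FASTQ files are passed as positional arguments after the options (check
'flash --help' for the exact usage of your version).

```python
def flash_command(reads_1, reads_2, keep_input_order=False, threads=None):
    """Build an argument list for running FLASH on one pair of files.

    keep_input_order=True forces a single combiner thread, which makes
    the output order match the input order.
    """
    cmd = ["flash"]
    if keep_input_order:
        cmd.append("--threads=1")
    elif threads is not None:
        cmd.append(f"--threads={threads}")
    # Assumption: the two input files follow the options.
    cmd += [reads_1, reads_2]
    return cmd


# Hypothetical file names, for illustration only.
print(flash_command("reads_1.fastq", "reads_2.fastq", keep_input_order=True))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once
the 'flash' executable is on your $PATH.
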
PERFORMANCE

Since the FLASH algorithm considers each read pair independently, FLASH will, by
default, process read pairs in parallel. FLASH v1.2.9 and later also make use
of vector instructions available on modern x86 CPUs. Consequently, FLASH is
fast even on low-cost hardware. As an example, we ran FLASH v1.2.9 on a laptop
with a dual-core 2.3 GHz AMD x86_64 processor and it processed one million
101-bp read pairs in 11.6 seconds with the default parameters. Less than 2 MB
of memory was used. Actual timings will vary, depending primarily on the
number of CPUs available, the speed of each CPU, and the I/O speed of reading
the input files and writing the output files. FLASH is designed to scale to
dozens of processors, although its speed may be limited by I/O in such cases.

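The benchmark above implies a throughput that can be used for rough
planning. This is only a back-of-the-envelope sketch: it assumes the same
hardware, read length, and parameters as the quoted run, and that run time
grows linearly with the number of read pairs. The function name and the
50-million-pair figure are illustrative.

```python
BENCHMARK_PAIRS = 1_000_000   # read pairs in the quoted benchmark
BENCHMARK_SECONDS = 11.6      # wall-clock time of the quoted benchmark

pairs_per_second = BENCHMARK_PAIRS / BENCHMARK_SECONDS


def estimated_seconds(read_pairs: int) -> float:
    """Linear extrapolation of run time from the single benchmark."""
    return read_pairs / pairs_per_second


print(round(pairs_per_second))               # about 86 thousand pairs/s
print(round(estimated_seconds(50_000_000)))  # about 580 s for 50M pairs
```
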
ACCURACY

When the read error rate is 1% or less, FLASH processes over 99% of read pairs
correctly. With an error rate of 2%, FLASH processes over 98% of read pairs
correctly when default parameters are used. With more aggressive parameters
(e.g., -x 0.35), FLASH processes over 90% of read pairs correctly even when the
error rate is 5%.

PUBLICATION

Title: FLASH: fast length adjustment of short reads to improve genome assemblies
Authors: Tanja Magoč and Steven L. Salzberg
URL: http://bioinformatics.oxfordjournals.org/content/27/21/2957.full

LICENSE

FLASH is released under the GNU General Public License Version 3 or later (see
COPYING).

COMMENTS/QUESTIONS/REQUESTS

Send an e-mail to flash.comment@gmail.com

Other versions are available from the SourceForge page:

https://sourceforge.net/projects/flashpage/