0
|
1 .TH bwa 1 "10 Feburuary 2010" "bwa-0.5.6" "Bioinformatics tools"
|
|
2 .SH NAME
|
|
3 .PP
|
|
4 bwa - Burrows-Wheeler Alignment Tool
|
|
5 .SH SYNOPSIS
|
|
6 .PP
|
|
7 bwa index -a bwtsw database.fasta
|
|
8 .PP
|
|
9 bwa aln database.fasta short_read.fastq > aln_sa.sai
|
|
10 .PP
|
|
11 bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam
|
|
12 .PP
|
|
13 bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam
|
|
14 .PP
|
|
15 bwa bwasw database.fasta long_read.fastq > aln.sam
|
|
16
|
|
17 .SH DESCRIPTION
|
|
18 .PP
|
|
19 BWA is a fast light-weighted tool that aligns relatively short sequences
|
|
20 (queries) to a sequence database (targe), such as the human reference
|
|
21 genome. It implements two different algorithms, both based on
|
|
22 Burrows-Wheeler Transform (BWT). The first algorithm is designed for
|
|
23 short queries up to ~200bp with low error rate (<3%). It does gapped
|
|
24 global alignment w.r.t. queries, supports paired-end reads, and is one
|
|
25 of the fastest short read alignment algorithms to date while also
|
|
26 visiting suboptimal hits. The second algorithm, BWA-SW, is designed for
|
|
27 long reads with more errors. It performs heuristic Smith-Waterman-like
|
|
28 alignment to find high-scoring local hits (and thus chimera). On
|
|
29 low-error short queries, BWA-SW is slower and less accurate than the
|
|
30 first algorithm, but on long queries, it is better.
|
|
31 .PP
|
|
32 For both algorithms, the database file in the FASTA format must be
|
|
33 first indexed with the
|
|
34 .B `index'
|
|
35 command, which typically takes a few hours. The first algorithm is
|
|
36 implemented via the
|
|
37 .B `aln'
|
|
38 command, which finds the suffix array (SA) coordinates of good hits of
|
|
39 each individual read, and the
|
|
40 .B `samse/sampe'
|
|
41 command, which converts SA coordinates to chromosomal coordinate and
|
|
42 pairs reads (for `sampe'). The second algorithm is invoked by the
|
|
43 .B `dbtwsw'
|
|
44 command. It works for single-end reads only.
|
|
45
|
|
46 .SH COMMANDS AND OPTIONS
|
|
47 .TP
|
|
48 .B index
|
|
49 bwa index [-p prefix] [-a algoType] [-c] <in.db.fasta>
|
|
50
|
|
51 Index database sequences in the FASTA format.
|
|
52
|
|
53 .B OPTIONS:
|
|
54 .RS
|
|
55 .TP 10
|
|
56 .B -c
|
|
57 Build color-space index. The input fast should be in nucleotide space.
|
|
58 .TP
|
|
59 .B -p STR
|
|
60 Prefix of the output database [same as db filename]
|
|
61 .TP
|
|
62 .B -a STR
|
|
63 Algorithm for constructing BWT index. Available options are:
|
|
64 .RS
|
|
65 .TP
|
|
66 .B is
|
|
67 IS linear-time algorithm for constructing suffix array. It requires
|
|
68 5.37N memory where N is the size of the database. IS is moderately fast,
|
|
69 but does not work with database larger than 2GB. IS is the default
|
|
70 algorithm due to its simplicity. The current codes for IS algorithm are
|
|
71 reimplemented by Yuta Mori.
|
|
72 .TP
|
|
73 .B bwtsw
|
|
74 Algorithm implemented in BWT-SW. This method works with the whole human
|
|
75 genome, but it does not work with database smaller than 10MB and it is
|
|
76 usually slower than IS.
|
|
77 .RE
|
|
78 .RE
|
|
79
|
|
80 .TP
|
|
81 .B aln
|
|
82 bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] [-i
|
|
83 nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN] [-M misMsc]
|
|
84 [-O gapOsc] [-E gapEsc] [-q trimQual] <in.db.fasta> <in.query.fq> >
|
|
85 <out.sai>
|
|
86
|
|
87 Find the SA coordinates of the input reads. Maximum
|
|
88 .I maxSeedDiff
|
|
89 differences are allowed in the first
|
|
90 .I seedLen
|
|
91 subsequence and maximum
|
|
92 .I maxDiff
|
|
93 differences are allowed in the whole sequence.
|
|
94
|
|
95 .B OPTIONS:
|
|
96 .RS
|
|
97 .TP 10
|
|
98 .B -n NUM
|
|
99 Maximum edit distance if the value is INT, or the fraction of missing
|
|
100 alignments given 2% uniform base error rate if FLOAT. In the latter
|
|
101 case, the maximum edit distance is automatically chosen for different
|
|
102 read lengths. [0.04]
|
|
103 .TP
|
|
104 .B -o INT
|
|
105 Maximum number of gap opens [1]
|
|
106 .TP
|
|
107 .B -e INT
|
|
108 Maximum number of gap extensions, -1 for k-difference mode (disallowing
|
|
109 long gaps) [-1]
|
|
110 .TP
|
|
111 .B -d INT
|
|
112 Disallow a long deletion within INT bp towards the 3'-end [16]
|
|
113 .TP
|
|
114 .B -i INT
|
|
115 Disallow an indel within INT bp towards the ends [5]
|
|
116 .TP
|
|
117 .B -l INT
|
|
118 Take the first INT subsequence as seed. If INT is larger than the query
|
|
119 sequence, seeding will be disabled. For long reads, this option is
|
|
120 typically ranged from 25 to 35 for `-k 2'. [inf]
|
|
121 .TP
|
|
122 .B -k INT
|
|
123 Maximum edit distance in the seed [2]
|
|
124 .TP
|
|
125 .B -t INT
|
|
126 Number of threads (multi-threading mode) [1]
|
|
127 .TP
|
|
128 .B -M INT
|
|
129 Mismatch penalty. BWA will not search for suboptimal hits with a score
|
|
130 lower than (bestScore-misMsc). [3]
|
|
131 .TP
|
|
132 .B -O INT
|
|
133 Gap open penalty [11]
|
|
134 .TP
|
|
135 .B -E INT
|
|
136 Gap extension penalty [4]
|
|
137 .TP
|
|
138 .B -R INT
|
|
139 Proceed with suboptimal alignments if there are no more than INT equally
|
|
140 best hits. This option only affects paired-end mapping. Increasing this
|
|
141 threshold helps to improve the pairing accuracy at the cost of speed,
|
|
142 especially for short reads (~32bp).
|
|
143 .TP
|
|
144 .B -c
|
|
145 Reverse query but not complement it, which is required for alignment in
|
|
146 the color space.
|
|
147 .TP
|
|
148 .B -N
|
|
149 Disable iterative search. All hits with no more than
|
|
150 .I maxDiff
|
|
151 differences will be found. This mode is much slower than the default.
|
|
152 .TP
|
|
153 .B -q INT
|
|
154 Parameter for read trimming. BWA trims a read down to
|
|
155 argmax_x{\\sum_{i=x+1}^l(INT-q_i)} if q_l<INT where l is the original
|
|
156 read length. [0]
|
|
157 .RE
|
|
158
|
|
159 .TP
|
|
160 .B samse
|
|
161 bwa samse [-n maxOcc] <in.db.fasta> <in.sai> <in.fq> > <out.sam>
|
|
162
|
|
163 Generate alignments in the SAM format given single-end reads. Repetitive
|
|
164 hits will be randomly chosen.
|
|
165
|
|
166 .B OPTIONS:
|
|
167 .RS
|
|
168 .TP 10
|
|
169 .B -n INT
|
|
170 Maximum number of alignments to output in the XA tag for reads paired
|
|
171 properly. If a read has more than INT hits, the XA tag will not be
|
|
172 written. [3]
|
|
173 .RE
|
|
174
|
|
175 .TP
|
|
176 .B sampe
|
|
177 bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] [-N maxHitDis]
|
|
178 [-P] <in.db.fasta> <in1.sai> <in2.sai> <in1.fq> <in2.fq> > <out.sam>
|
|
179
|
|
180 Generate alignments in the SAM format given paired-end reads. Repetitive
|
|
181 read pairs will be placed randomly.
|
|
182
|
|
183 .B OPTIONS:
|
|
184 .RS
|
|
185 .TP 8
|
|
186 .B -a INT
|
|
187 Maximum insert size for a read pair to be considered being mapped
|
|
188 properly. Since 0.4.5, this option is only used when there are not
|
|
189 enough good alignment to infer the distribution of insert sizes. [500]
|
|
190 .TP
|
|
191 .B -o INT
|
|
192 Maximum occurrences of a read for pairing. A read with more occurrneces
|
|
193 will be treated as a single-end read. Reducing this parameter helps
|
|
194 faster pairing. [100000]
|
|
195 .TP
|
|
196 .B -P
|
|
197 Load the entire FM-index into memory to reduce disk operations
|
|
198 (base-space reads only). With this option, at least 1.25N bytes of
|
|
199 memory are required, where N is the length of the genome.
|
|
200 .TP
|
|
201 .B -n INT
|
|
202 Maximum number of alignments to output in the XA tag for reads paired
|
|
203 properly. If a read has more than INT hits, the XA tag will not be
|
|
204 written. [3]
|
|
205 .TP
|
|
206 .B -N INT
|
|
207 Maximum number of alignments to output in the XA tag for disconcordant
|
|
208 read pairs (excluding singletons). If a read has more than INT hits, the
|
|
209 XA tag will not be written. [10]
|
|
210 .RE
|
|
211
|
|
212 .TP
|
|
213 .B bwasw
|
|
214 bwa bwasw [-a matchScore] [-b mmPen] [-q gapOpenPen] [-r gapExtPen] [-t
|
|
215 nThreads] [-w bandWidth] [-T thres] [-s hspIntv] [-z zBest] [-N
|
|
216 nHspRev] [-c thresCoef] <in.db.fasta> <in.fq>
|
|
217
|
|
218 Align query sequences in the <in.fq> file.
|
|
219
|
|
220 .B OPTIONS:
|
|
221 .RS
|
|
222 .TP 10
|
|
223 .B -a INT
|
|
224 Score of a match [1]
|
|
225 .TP
|
|
226 .B -b INT
|
|
227 Mismatch penalty [3]
|
|
228 .TP
|
|
229 .B -q INT
|
|
230 Gap open penalty [5]
|
|
231 .TP
|
|
232 .B -r INT
|
|
233 Gap extension penalty. The penalty for a contiguous gap of size k is
|
|
234 q+k*r. [2]
|
|
235 .TP
|
|
236 .B -t INT
|
|
237 Number of threads in the multi-threading mode [1]
|
|
238 .TP
|
|
239 .B -w INT
|
|
240 Band width in the banded alignment [33]
|
|
241 .TP
|
|
242 .B -T INT
|
|
243 Minimum score threshold divided by a [37]
|
|
244 .TP
|
|
245 .B -c FLOAT
|
|
246 Coefficient for threshold adjustment according to query length. Given an
|
|
247 l-long query, the threshold for a hit to be retained is
|
|
248 a*max{T,c*log(l)}. [5.5]
|
|
249 .TP
|
|
250 .B -z INT
|
|
251 Z-best heuristics. Higher -z increases accuracy at the cost of speed. [1]
|
|
252 .TP
|
|
253 .B -s INT
|
|
254 Maximum SA interval size for initiating a seed. Higher -s increases
|
|
255 accuracy at the cost of speed. [3]
|
|
256 .TP
|
|
257 .B -N INT
|
|
258 Minimum number of seeds supporting the resultant alignment to skip
|
|
259 reverse alignment. [5]
|
|
260 .RE
|
|
261
|
|
262 .SH SAM ALIGNMENT FORMAT
|
|
263 .PP
|
|
264 The output of the
|
|
265 .B `aln'
|
|
266 command is binary and designed for BWA use only. BWA outputs the final
|
|
267 alignment in the SAM (Sequence Alignment/Map) format. Each line consists
|
|
268 of:
|
|
269
|
|
270 .TS
|
|
271 center box;
|
|
272 cb | cb | cb
|
|
273 n | l | l .
|
|
274 Col Field Description
|
|
275 _
|
|
276 1 QNAME Query (pair) NAME
|
|
277 2 FLAG bitwise FLAG
|
|
278 3 RNAME Reference sequence NAME
|
|
279 4 POS 1-based leftmost POSition/coordinate of clipped sequence
|
|
280 5 MAPQ MAPping Quality (Phred-scaled)
|
|
281 6 CIAGR extended CIGAR string
|
|
282 7 MRNM Mate Reference sequence NaMe (`=' if same as RNAME)
|
|
283 8 MPOS 1-based Mate POSistion
|
|
284 9 ISIZE Inferred insert SIZE
|
|
285 10 SEQ query SEQuence on the same strand as the reference
|
|
286 11 QUAL query QUALity (ASCII-33 gives the Phred base quality)
|
|
287 12 OPT variable OPTional fields in the format TAG:VTYPE:VALUE
|
|
288 .TE
|
|
289
|
|
290 .PP
|
|
291 Each bit in the FLAG field is defined as:
|
|
292
|
|
293 .TS
|
|
294 center box;
|
|
295 cb | cb | cb
|
|
296 c | l | l .
|
|
297 Chr Flag Description
|
|
298 _
|
|
299 p 0x0001 the read is paired in sequencing
|
|
300 P 0x0002 the read is mapped in a proper pair
|
|
301 u 0x0004 the query sequence itself is unmapped
|
|
302 U 0x0008 the mate is unmapped
|
|
303 r 0x0010 strand of the query (1 for reverse)
|
|
304 R 0x0020 strand of the mate
|
|
305 1 0x0040 the read is the first read in a pair
|
|
306 2 0x0080 the read is the second read in a pair
|
|
307 s 0x0100 the alignment is not primary
|
|
308 f 0x0200 QC failure
|
|
309 d 0x0400 optical or PCR duplicate
|
|
310 .TE
|
|
311
|
|
312 .PP
|
|
313 The Please check <http://samtools.sourceforge.net> for the format
|
|
314 specification and the tools for post-processing the alignment.
|
|
315
|
|
316 BWA generates the following optional fields. Tags starting with `X' are
|
|
317 specific to BWA.
|
|
318
|
|
319 .TS
|
|
320 center box;
|
|
321 cb | cb
|
|
322 cB | l .
|
|
323 Tag Meaning
|
|
324 _
|
|
325 NM Edit distance
|
|
326 MD Mismatching positions/bases
|
|
327 AS Alignment score
|
|
328 _
|
|
329 X0 Number of best hits
|
|
330 X1 Number of suboptimal hits found by BWA
|
|
331 XN Number of ambiguous bases in the referenece
|
|
332 XM Number of mismatches in the alignment
|
|
333 XO Number of gap opens
|
|
334 XG Number of gap extentions
|
|
335 XT Type: Unique/Repeat/N/Mate-sw
|
|
336 XA Alternative hits; format: (chr,pos,CIGAR,NM;)*
|
|
337 _
|
|
338 XS Suboptimal alignment score
|
|
339 XF Support from forward/reverse alignment
|
|
340 XE Number of supporting seeds
|
|
341 .TE
|
|
342
|
|
343 .PP
|
|
344 Note that XO and XG are generated by BWT search while the CIGAR string
|
|
345 by Smith-Waterman alignment. These two tags may be inconsistent with the
|
|
346 CIGAR string. This is not a bug.
|
|
347
|
|
348 .SH NOTES ON SHORT-READ ALIGNMENT
|
|
349 .SS Alignment Accuracy
|
|
350 .PP
|
|
351 When seeding is disabled, BWA guarantees to find an alignment
|
|
352 containing maximum
|
|
353 .I maxDiff
|
|
354 differences including
|
|
355 .I maxGapO
|
|
356 gap opens which do not occur within
|
|
357 .I nIndelEnd
|
|
358 bp towards either end of the query. Longer gaps may be found if
|
|
359 .I maxGapE
|
|
360 is positive, but it is not guaranteed to find all hits. When seeding is
|
|
361 enabled, BWA further requires that the first
|
|
362 .I seedLen
|
|
363 subsequence contains no more than
|
|
364 .I maxSeedDiff
|
|
365 differences.
|
|
366 .PP
|
|
367 When gapped alignment is disabled, BWA is expected to generate the same
|
|
368 alignment as Eland, the Illumina alignment program. However, as BWA
|
|
369 change `N' in the database sequence to random nucleotides, hits to these
|
|
370 random sequences will also be counted. As a consequence, BWA may mark a
|
|
371 unique hit as a repeat, if the random sequences happen to be identical
|
|
372 to the sequences which should be unqiue in the database. This random
|
|
373 behaviour will be avoided in future releases.
|
|
374 .PP
|
|
375 By default, if the best hit is no so repetitive (controlled by -R), BWA
|
|
376 also finds all hits contains one more mismatch; otherwise, BWA finds all
|
|
377 equally best hits only. Base quality is NOT considered in evaluating
|
|
378 hits. In paired-end alignment, BWA pairs all hits it found. It further
|
|
379 performs Smith-Waterman alignment for unmapped reads with mates mapped
|
|
380 to rescue mapped mates, and for high-quality anomalous pairs to fix
|
|
381 potential alignment errors.
|
|
382
|
|
383 .SS Estimating Insert Size Distribution
|
|
384 .PP
|
|
385 BWA estimates the insert size distribution per 256*1024 read pairs. It
|
|
386 first collects pairs of reads with both ends mapped with a single-end
|
|
387 quality 20 or higher and then calculates median (Q2), lower and higher
|
|
388 quartile (Q1 and Q3). It estimates the mean and the variance of the
|
|
389 insert size distribution from pairs whose insert sizes are within
|
|
390 interval [Q1-2(Q3-Q1), Q3+2(Q3-Q1)]. The maximum distance x for a pair
|
|
391 considered to be properly paired (SAM flag 0x2) is calculated by solving
|
|
392 equation Phi((x-mu)/sigma)=x/L*p0, where mu is the mean, sigma is the
|
|
393 standard error of the insert size distribution, L is the length of the
|
|
394 genome, p0 is prior of anomalous pair and Phi() is the standard
|
|
395 cumulative distribution function. For mapping Illumina short-insert
|
|
396 reads to the human genome, x is about 6-7 sigma away from the
|
|
397 mean. Quartiles, mean, variance and x will be printed to the standard
|
|
398 error output.
|
|
399
|
|
400 .SS Memory Requirement
|
|
401 .PP
|
|
402 With bwtsw algorithm, 2.5GB memory is required for indexing the complete
|
|
403 human genome sequences. For short reads, the
|
|
404 .B `aln'
|
|
405 command uses ~2.3GB memory and the
|
|
406 .B `sampe'
|
|
407 command uses ~3.5GB.
|
|
408
|
|
409 .SS Speed
|
|
410 .PP
|
|
411 Indexing the human genome sequences takes 3 hours with bwtsw
|
|
412 algorithm. Indexing smaller genomes with IS or divsufsort algorithms is
|
|
413 several times faster, but requires more memory.
|
|
414 .PP
|
|
415 Speed of alignment is largely determined by the error rate of the query
|
|
416 sequences (r). Firstly, BWA runs much faster for near perfect hits than
|
|
417 for hits with many differences, and it stops searching for a hit with
|
|
418 l+2 differences if a l-difference hit is found. This means BWA will be
|
|
419 very slow if r is high because in this case BWA has to visit hits with
|
|
420 many differences and looking for these hits is expensive. Secondly, the
|
|
421 alignment algorithm behind makes the speed sensitive to [k log(N)/m],
|
|
422 where k is the maximum allowed differences, N the size of database and m
|
|
423 the length of a query. In practice, we choose k w.r.t. r and therefore r
|
|
424 is the leading factor. I would not recommend to use BWA on data with
|
|
425 r>0.02.
|
|
426 .PP
|
|
427 Pairing is slower for shorter reads. This is mainly because shorter
|
|
428 reads have more spurious hits and converting SA coordinates to
|
|
429 chromosomal coordinates are very costly.
|
|
430 .PP
|
|
431 In a practical experiment, BWA is able to map 2 million 32bp reads to a
|
|
432 bacterial genome in several minutes, map the same amount of reads to
|
|
433 human X chromosome in 8-15 minutes and to the human genome in 15-25
|
|
434 minutes. This result implies that the speed of BWA is insensitive to the
|
|
435 size of database and therefore BWA is more efficient when the database
|
|
436 is sufficiently large. On smaller genomes, hash based algorithms are
|
|
437 usually much faster.
|
|
438
|
|
439 .SH NOTES ON LONG-READ ALIGNMENT
|
|
440 .PP
|
|
441 Command
|
|
442 .B `bwasw'
|
|
443 is designed for long-read alignment. The algorithm behind, BWA-SW, is
|
|
444 similar to BWT-SW, but does not guarantee to find all local hits due to
|
|
445 the heuristic acceleration. It tends to be faster and more accurate if
|
|
446 the resultant alignment is supported by more seeds, and therefore
|
|
447 BWA-SW usually performs better on long queries than on short ones.
|
|
448
|
|
449 On 350-1000bp reads, BWA-SW is several to tens of times faster than the
|
|
450 existing programs. Its accuracy is comparable to SSAHA2, more accurate
|
|
451 than BLAT. Like BLAT, BWA-SW also finds chimera which may pose a
|
|
452 challenge to SSAHA2. On 10-100kbp queries where chimera detection is
|
|
453 important, BWA-SW is over 10X faster than BLAT while being more
|
|
454 sensitive.
|
|
455
|
|
456 BWA-SW can also be used to align ~100bp reads, but it is slower than
|
|
457 the short-read algorithm. Its sensitivity and accuracy is lower than
|
|
458 SSAHA2 especially when the sequencing error rate is above 2%. This is
|
|
459 the trade-off of the 30X speed up in comparison to SSAHA2's -454 mode.
|
|
460
|
|
461 .SH SEE ALSO
|
|
462 BWA website <http://bio-bwa.sourceforge.net>, Samtools website
|
|
463 <http://samtools.sourceforge.net>
|
|
464
|
|
465 .SH AUTHOR
|
|
466 Heng Li at the Sanger Institute wrote the key source codes and
|
|
467 integrated the following codes for BWT construction: bwtsw
|
|
468 <http://i.cs.hku.hk/~ckwong3/bwtsw/>, implemented by Chi-Kwong Wong at
|
|
469 the University of Hong Kong and IS
|
|
470 <http://yuta.256.googlepages.com/sais> originally proposed by Nong Ge
|
|
471 <http://www.cs.sysu.edu.cn/nong/> at the Sun Yat-Sen University and
|
|
472 implemented by Yuta Mori.
|
|
473
|
|
474 .SH LICENSE AND CITATION
|
|
475 .PP
|
|
476 The full BWA package is distributed under GPLv3 as it uses source codes
|
|
477 from BWT-SW which is covered by GPL. Sorting, hash table, BWT and IS
|
|
478 libraries are distributed under the MIT license.
|
|
479 .PP
|
|
480 If you use the short-read alignment component, please cite the following
|
|
481 paper:
|
|
482 .PP
|
|
483 Li H. and Durbin R. (2009) Fast and accurate short read alignment with
|
|
484 Burrows-Wheeler transform. Bioinformatics, 25, 1754-60. [PMID: 19451168]
|
|
485 .PP
|
|
486 If you use the long-read component (BWA-SW), please cite:
|
|
487 .PP
|
|
488 Li H. and Durbin R. (2010) Fast and accurate long-read alignment with
|
|
489 Burrows-Wheeler transform. Bioinformatics. [PMID: 20080505]
|
|
490
|
|
491 .SH HISTORY
|
|
492 BWA is largely influenced by BWT-SW. It uses source codes from BWT-SW
|
|
493 and mimics its binary file formats; BWA-SW resembles BWT-SW in several
|
|
494 ways. The initial idea about BWT-based alignment also came from the
|
|
495 group who developed BWT-SW. At the same time, BWA is different enough
|
|
496 from BWT-SW. The short-read alignment algorithm bears no similarity to
|
|
497 Smith-Waterman algorithm any more. While BWA-SW learns from BWT-SW, it
|
|
498 introduces heuristics that can hardly be applied to the original
|
|
499 algorithm. In all, BWA does not guarantee to find all local hits as what
|
|
500 BWT-SW is designed to do, but it is much faster than BWT-SW on both
|
|
501 short and long query sequences.
|
|
502
|
|
503 I started to write the first piece of codes on 24 May 2008 and got the
|
|
504 initial stable version on 02 June 2008. During this period, I was
|
|
505 acquainted that Professor Tak-Wah Lam, the first author of BWT-SW paper,
|
|
506 was collaborating with Beijing Genomics Institute on SOAP2, the successor
|
|
507 to SOAP (Short Oligonucleotide Analysis Package). SOAP2 has come out in
|
|
508 November 2008. According to the SourceForge download page, the third
|
|
509 BWT-based short read aligner, bowtie, was first released in August
|
|
510 2008. At the time of writing this manual, at least three more BWT-based
|
|
511 short-read aligners are being implemented.
|
|
512
|
|
513 The BWA-SW algorithm is a new component of BWA. It was conceived in
|
|
514 November 2008 and implemented ten months later.
|