# sickle - A windowed adaptive trimming tool for FASTQ files using quality

## About

Most modern sequencing technologies produce reads that have
deteriorating quality towards the 3'-end and some towards the 5'-end
as well. Incorrectly called bases in both regions negatively impact
assemblies, mapping, and downstream bioinformatics analyses.

Sickle is a tool that uses sliding windows along with quality and
length thresholds to determine when quality is sufficiently low to
trim the 3'-end of reads and also determines when the quality is
sufficiently high to trim the 5'-end of reads. It will also
discard reads based upon the length threshold. It takes the quality
values and slides a window across them whose length is 0.1 times the
length of the read. If this length is less than 1, then the window is
set to be equal to the length of the read. Otherwise, the window
slides along the quality values until the average quality in the
window rises above the threshold, at which point the algorithm
determines where within the window the rise occurs and cuts the read
and quality strings there for the 5'-end cut. Then, when the average
quality in the window drops below the threshold, the algorithm
determines where in the window the drop occurs and cuts both the read
and quality strings there for the 3'-end cut. However, if the length
of the remaining sequence is less than the minimum length threshold,
then the read is discarded entirely (or replaced with an "N" record).
5'-end trimming can be disabled.
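
The windowing steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Sickle's actual C implementation: the function name `trim_window`, its parameters, and the default thresholds are assumptions made for the example, and edge cases may be handled differently by the real tool.

```python
def trim_window(quals, threshold=20, min_length=20, trim_five_prime=True):
    """Return (start, end) of the region to keep, or None to discard the read.

    `quals` is a list of numeric quality scores, one per base.
    The defaults are illustrative assumptions, not Sickle's settings.
    """
    n = len(quals)
    if n == 0:
        return None

    # Window length is 0.1 times the read length, rounded down;
    # if that is less than 1, the window spans the whole read.
    window = n // 10
    if window < 1:
        window = n

    start = None if trim_five_prime else 0
    end = n
    for i in range(n - window + 1):
        avg = sum(quals[i:i + window]) / window
        if start is None:
            if avg >= threshold:
                # 5'-end cut: first base in this window at or above the threshold.
                start = next(j for j in range(i, i + window)
                             if quals[j] >= threshold)
        elif i >= start and avg < threshold:
            # 3'-end cut: first base in this window below the threshold.
            end = next(j for j in range(i, i + window)
                       if quals[j] < threshold)
            break

    # Discard reads that never reach the threshold or end up too short.
    if start is None or end - start < min_length:
        return None
    return start, end
```

For a 50-base read with five low-quality bases at each end, for example `[2] * 5 + [30] * 40 + [2] * 5`, this sketch keeps the slice `[5:45]` of both the sequence and the quality string.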

Sickle supports three types of quality values: Illumina, Solexa, and
Sanger. Note that the Solexa quality setting is an approximation (the
actual conversion is a non-linear transformation), but the resulting
approximation is close. Illumina quality refers to qualities encoded
with the CASAVA pipeline between versions 1.3 and 1.7. Illumina
quality using CASAVA >= 1.8 is Sanger encoded.
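
To make the difference between the three encodings concrete, the sketch below decodes a quality string into numeric scores. It uses the commonly documented conventions (Sanger is Phred+33, Illumina 1.3 to 1.7 is Phred+64, Solexa is offset 64 on a different scale) and the exact Solexa-to-Phred formula, not the approximation Sickle applies. The function name `decode_quality` and the `OFFSETS` table are assumptions made for the example.

```python
import math

# ASCII offsets for each quality type (standard conventions, assumed here).
OFFSETS = {"sanger": 33, "illumina": 64, "solexa": 64}


def decode_quality(qual_string, qual_type):
    """Convert a FASTQ quality string into a list of Phred-scale scores."""
    offset = OFFSETS[qual_type]
    scores = [ord(ch) - offset for ch in qual_string]
    if qual_type == "solexa":
        # Exact non-linear Solexa -> Phred conversion:
        # Q_phred = 10 * log10(10^(Q_solexa / 10) + 1)
        scores = [10 * math.log10(10 ** (q / 10) + 1) for q in scores]
    return scores
```

At high quality the Solexa and Phred scales nearly coincide, and they diverge only for low scores, which is why an approximation can stay close in practice.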

Note that Sickle will remove the second FASTQ record header (on the
"+" line) and replace it with simply a "+". This is the default format
for CASAVA >= 1.8.

Sickle also supports gzipped file inputs and optional gzipped outputs. By default,
Sickle will produce regular (i.e. not gzipped) output, regardless of the input.
Sickle also has an option to truncate reads with Ns at the first N position.
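
Truncating at the first N can be pictured as cutting the sequence and the quality string at the same position. The helper below is a hypothetical illustration of that idea (the name `truncate_at_first_n` is not part of Sickle), and it assumes an uppercase "N" marks an uncalled base.

```python
def truncate_at_first_n(seq, qual):
    """Cut the sequence and quality strings at the first 'N', if any."""
    pos = seq.find("N")
    if pos == -1:
        # No N in the read: leave it unchanged.
        return seq, qual
    # Keep only the bases (and their qualities) before the first N.
    return seq[:pos], qual[:pos]
```

A read truncated this way would still be subject to the minimum length threshold described earlier.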

There is also a sickle.xml file included in the package that can be used to add sickle to your
local [Galaxy](http://galaxy.psu.edu/) server.

## Citation
Sickle doesn't have a paper, but you can cite it like this:

    Joshi NA, Fass JN. (2011). Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files
    (Version 1.33) [Software]. Available at https://github.com/najoshi/sickle.

## Requirements

Sickle requires a C compiler; GCC or clang are recommended. Sickle
relies on Heng Li's kseq.h, which is bundled with the source.

Sickle also requires Zlib, which can be obtained at
<http://www.zlib.net/>.

## Building and Installing Sickle

To build Sickle, enter:

    make

Then, copy or move "sickle" to a directory in your $PATH.

## Usage

Sickle has two modes, one for single-end reads and one for paired-end
reads: `sickle se` and `sickle pe`.

Running sickle by itself will print the help:

    sickle

Running sickle with either the "se" or "pe" commands will give help
specific to those commands:

    sickle se
    sickle pe

### Sickle Single End (`sickle se`)

`sickle se` takes an input fastq file and outputs a trimmed version of
that file. It also has options to change the length and quality
thresholds for trimming, as well as to disable 5'-trimming and enable
truncation of sequences with Ns.

#### Examples

    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -q 33 -l 40
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -x -n
    sickle se -t sanger -g -f input_file.fastq -o trimmed_output_file.fastq.gz
    sickle se --fastq-file input_file.fastq --qual-type sanger --output-file trimmed_output_file.fastq

### Sickle Paired End (`sickle pe`)

`sickle pe` can operate with two types of input. First, it can take
two paired-end files as input and output two trimmed paired-end files
as well as a "singles" file. The second form starts with a single
combined input file of reads where you have already interleaved the
reads from the sequencer. In this form, you also supply a single
output file name as well as a "singles" file. The "singles" file
contains reads that passed the filter in either the forward or reverse
direction, but not the other. Finally, there is an option (-M) to only
produce one interleaved output file where any reads that did not pass
the filter will be output as a FastQ record with a single "N" (whose
quality value is the lowest possible based upon the quality type), thus
preserving the paired nature of the data. You can also change the length
and quality thresholds for trimming, as well as disable 5'-trimming and
enable truncation of sequences with Ns.
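
The routing of each read pair described above can be sketched as follows. This is a simplified model, not Sickle's code: records are plain `(name, sequence, quality)` tuples, the names `n_record` and `route_pair` are hypothetical, and the default lowest quality character `"!"` assumes Sanger (Phred+33) encoding.

```python
def n_record(name, lowest_qual="!"):
    """Placeholder record: a single N with the lowest quality character."""
    return (name, "N", lowest_qual)


def route_pair(fwd, rev, fwd_ok, rev_ok, keep_all=False, lowest_qual="!"):
    """Decide where the two reads of a pair go after trimming.

    fwd_ok / rev_ok say whether each read passed the filter.
    keep_all=True models the single interleaved output (-M) mode.
    """
    if keep_all:
        # Failed reads become "N" records so the pairing is preserved.
        out_fwd = fwd if fwd_ok else n_record(fwd[0], lowest_qual)
        out_rev = rev if rev_ok else n_record(rev[0], lowest_qual)
        return {"paired": [out_fwd, out_rev], "singles": []}
    if fwd_ok and rev_ok:
        return {"paired": [fwd, rev], "singles": []}
    if fwd_ok:
        return {"paired": [], "singles": [fwd]}
    if rev_ok:
        return {"paired": [], "singles": [rev]}
    # Neither read passed: nothing is written for this pair.
    return {"paired": [], "singles": []}
```

The design point is that the paired outputs always stay in step: a read is only written to a paired file when its mate is written too, and everything else goes to the singles file or is replaced by a placeholder.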

#### Examples

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -q 12 -l 15

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -n

    sickle pe -c combo.fastq -t sanger -m combo_trimmed.fastq \
    -s trimmed_singles_file.fastq -n

    sickle pe -t sanger -g -f input_file1.fastq -r input_file2.fastq \
    -o trimmed_output_file1.fastq.gz -p trimmed_output_file2.fastq.gz \
    -s trimmed_singles_file.fastq.gz

    sickle pe -c combo.fastq -t sanger -M combo_trimmed_all.fastq

    sickle pe --pe-file1 input_file1.fastq --pe-file2 input_file2.fastq --qual-type sanger \
    --output-pe1 trimmed_output_file1.fastq --output-pe2 trimmed_output_file2.fastq \
    --output-single trimmed_singles_file.fastq