README.md at changeset 4:c70137414dcd ("sickle v1.33"), author: nikhil-joshi, date: Wed, 23 Jul 2014 18:35:10 -0400
# sickle - A windowed adaptive trimming tool for FASTQ files using quality

## About

Most modern sequencing technologies produce reads that have
deteriorating quality towards the 3'-end, and some towards the 5'-end
as well. Incorrectly called bases in both regions negatively impact
assemblies, mapping, and downstream bioinformatics analyses.

Sickle is a tool that uses sliding windows along with quality and
length thresholds to determine when quality is sufficiently low to
trim the 3'-end of reads, and when it is high enough to stop trimming
the 5'-end of reads. It also discards reads based on the length
threshold. It takes the quality values and slides a window across them
whose length is 0.1 times the length of the read. If this length is
less than 1, the window is set equal to the length of the read. The
window then slides along the quality values until the average quality
in the window rises above the threshold, at which point the algorithm
determines where within the window the rise occurs and cuts both the
read and quality strings there (the 5'-end cut). Then, when the average
quality in the window drops below the threshold, the algorithm
determines where in the window the drop occurs and cuts both the read
and quality strings there (the 3'-end cut). However, if the length of
the remaining sequence is less than the minimum length threshold, the
read is discarded entirely (or replaced with an "N" record). 5'-end
trimming can be disabled.
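
The procedure above can be sketched in C roughly as follows. This is an illustration written from the description, not Sickle's own source: `sliding_window_trim` and `cut_sites` are hypothetical names, quality values are assumed to be already decoded to integer scores, and the boundary rules (a window "rises above" the threshold when its average is `>=` it and "drops below" when it is `<` it) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Cut sites for one read: keep bases [five_prime, three_prime).
 * Both are -1 when the read should be discarded. */
typedef struct {
    int five_prime;
    int three_prime;
} cut_sites;

/* Hypothetical sketch of windowed trimming.
 * quals:        decoded quality scores, one per base
 * len:          read length
 * qual_thresh:  minimum average quality for a window
 * len_thresh:   minimum length of the trimmed read
 * no_fiveprime: nonzero disables 5'-end trimming */
cut_sites sliding_window_trim(const int *quals, int len, int qual_thresh,
                              int len_thresh, int no_fiveprime) {
    cut_sites discard = {-1, -1};
    assert(quals != NULL || len == 0);
    if (len <= 0 || len < len_thresh) return discard;

    /* Window is 10% of the read length, or the whole read if that is < 1. */
    int win = (int)(0.1 * len);
    if (win < 1) win = len;

    int five = 0, three = len;
    int found_five = no_fiveprime ? 1 : 0;
    long total = 0;
    for (int i = 0; i < win; i++) total += quals[i];

    for (int start = 0; start + win <= len; start++) {
        double avg = (double)total / win;

        /* 5' cut: first window whose average reaches the threshold;
         * cut at the first base inside it that meets the threshold. */
        if (!found_five && avg >= qual_thresh) {
            for (int j = start; j < start + win; j++) {
                if (quals[j] >= qual_thresh) { five = j; break; }
            }
            found_five = 1;
        }

        /* 3' cut: once the 5' cut is known, the first window whose average
         * falls below the threshold; cut at its first base below it. */
        if (found_five && avg < qual_thresh) {
            for (int j = start; j < start + win; j++) {
                if (quals[j] < qual_thresh) { three = j; break; }
            }
            break;
        }

        /* Slide one base: drop the leftmost score, add the next one. */
        if (start + win < len) total += quals[start + win] - quals[start];
    }

    /* Discard if no window ever reached the threshold (5' trimming on)
     * or if too little sequence remains between the cuts. */
    if (!found_five || three - five < len_thresh) return discard;
    cut_sites keep = {five, three};
    return keep;
}
```

For example, a 20-base read gets a 2-base window; with a quality threshold of 20 and a length threshold of 5, a read whose first two and last four scores are 10 and whose other scores are 35 would be kept as bases 2 through 15 (0-based). Keeping a running window sum instead of re-summing each window keeps the scan linear in the read length.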

Sickle supports three types of quality values: Illumina, Solexa, and
Sanger. Note that the Solexa quality setting is an approximation (the
actual conversion is a non-linear transformation), but the resulting
approximation is close. Illumina quality refers to qualities encoded
with the CASAVA pipeline between versions 1.3 and 1.7. Illumina
quality from CASAVA >= 1.8 is Sanger encoded.
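
Decoding these encodings comes down to subtracting an ASCII offset from each quality character. The sketch below assumes the standard offsets (33 for Sanger, 64 for Illumina 1.3-1.7 and for Solexa); `qual_type`, `decode_quality`, and `decode_quality_string` are hypothetical names, and treating a Solexa score as if it were a Phred score is shown only as one plausible form of the approximation mentioned above, not necessarily Sickle's exact rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three encodings listed above. */
typedef enum { QUAL_SANGER, QUAL_ILLUMINA, QUAL_SOLEXA } qual_type;

/* ASCII offset per encoding: Sanger (including CASAVA >= 1.8) uses 33;
 * Illumina CASAVA 1.3-1.7 and Solexa use 64. */
static int qual_offset(qual_type t) {
    return t == QUAL_SANGER ? 33 : 64;
}

/* Decode one quality character into a score. For Solexa this returns the
 * raw Solexa score and uses it as if it were Phred (assumed approximation). */
int decode_quality(char c, qual_type t) {
    return (unsigned char)c - qual_offset(t);
}

/* Decode a NUL-terminated quality string into out[]; returns its length. */
int decode_quality_string(const char *qual, qual_type t, int *out) {
    assert(qual != NULL && out != NULL);
    int n = 0;
    for (; qual[n] != '\0'; n++) out[n] = decode_quality(qual[n], t);
    return n;
}
```

The exact Solexa-to-Phred conversion is Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). It is nearly the identity at high scores (Solexa 40 is about Phred 40.0) but diverges at low ones (Solexa 0 is about Phred 3.0, Solexa -5 about Phred 1.2), which is why a linear treatment is only approximate.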

Note that Sickle removes the second FASTQ record header (on the "+"
line) and replaces it with a bare "+". This is the default format for
CASAVA >= 1.8.
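
A minimal sketch of writing a trimmed record with the bare "+" separator, assuming the cut sites are half-open indices `[five, three)` into the sequence and quality strings. `format_fastq` is a hypothetical helper that writes into a caller-supplied buffer with `snprintf`; a real tool would write to an output file instead.

```c
#include <assert.h>
#include <stdio.h>

/* Format one FASTQ record, keeping only bases [five, three) of seq and
 * qual. The third line is always a bare "+": any header that followed
 * the "+" in the input is not reproduced. Returns snprintf's result. */
int format_fastq(char *buf, size_t size, const char *header,
                 const char *seq, const char *qual, int five, int three) {
    assert(five >= 0 && three >= five);
    int n = three - five;
    /* "%.*s" prints exactly n characters starting at the 5' cut. */
    return snprintf(buf, size, "@%s\n%.*s\n+\n%.*s\n",
                    header, n, seq + five, n, qual + five);
}
```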

Sickle also supports gzipped input files and optional gzipped output. By default,
Sickle produces regular (i.e. not gzipped) output, regardless of the input.
Sickle also has an option to truncate reads that contain Ns at the first N.
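
Truncating at the first N amounts to moving the 3' cut to the index of the first N. In this sketch, `truncate_at_first_n` is a hypothetical helper, lowercase `n` is assumed to count as well, and the new cut is never allowed to extend the read past an existing quality-based 3' cut; the length threshold would then be applied to the shortened read.

```c
#include <assert.h>
#include <string.h>

/* Return the new 3' cut: the index of the first N/n if it comes before
 * the current cut, otherwise the current cut unchanged. */
int truncate_at_first_n(const char *seq, int three_prime_cut) {
    assert(seq != NULL);
    /* strcspn returns strlen(seq) when no N is present. */
    int first_n = (int)strcspn(seq, "Nn");
    return first_n < three_prime_cut ? first_n : three_prime_cut;
}
```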

There is also a sickle.xml file included in the package that can be used to add Sickle to your
local [Galaxy](http://galaxy.psu.edu/) server.

## Citation

Sickle doesn't have a paper, but you can cite it like this:

Joshi NA, Fass JN. (2011). Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files
(Version 1.33) [Software]. Available at https://github.com/najoshi/sickle.

## Requirements

Sickle requires a C compiler; GCC or clang are recommended. Sickle
relies on Heng Li's kseq.h, which is bundled with the source.

Sickle also requires Zlib, which can be obtained at
<http://www.zlib.net/>.

## Building and Installing Sickle

To build Sickle, enter:

    make

Then, copy or move "sickle" to a directory in your $PATH.

## Usage

Sickle has two modes to work with both paired-end and single-end
reads: `sickle se` and `sickle pe`.

Running sickle by itself will print the help:

    sickle

Running sickle with either the "se" or "pe" commands will give help
specific to those commands:

    sickle se
    sickle pe

### Sickle Single End (`sickle se`)

`sickle se` takes an input fastq file and outputs a trimmed version of
that file. It also has options to change the length and quality
thresholds for trimming, as well as disabling 5'-trimming and enabling
truncation of sequences with Ns.

#### Examples

    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -q 33 -l 40
    sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -x -n
    sickle se -t sanger -g -f input_file.fastq -o trimmed_output_file.fastq.gz

### Sickle Paired End (`sickle pe`)

`sickle pe` can operate with two types of input. First, it can take
two paired-end files as input and output two trimmed paired-end files
as well as a "singles" file. The second form starts with a single
combined input file in which the reads from the sequencer have already
been interleaved. In this form, you also supply a single output file
name as well as a "singles" file. The "singles" file contains reads
that passed filter in either the forward or reverse direction, but not
the other. Finally, there is an option (-M) to produce only one
interleaved output file, where any read that did not pass filter is
output as a FastQ record with a single "N" (whose quality value is the
lowest possible for the quality type), thus preserving the paired
nature of the data. You can also change the length and quality
thresholds for trimming, as well as disable 5'-trimming and enable
truncation of sequences with Ns.
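
The paired-end rules above can be summarized as a small routing decision per pair. This sketch follows the README's description only; `route_pair`, `pair_routing`, and the `destination` values are hypothetical names, and "passed" is assumed to mean that a mate survived trimming and the length threshold.

```c
#include <assert.h>

/* Where each mate of a pair ends up (hypothetical names). */
typedef enum {
    DEST_DISCARD,   /* dropped entirely */
    DEST_PAIRED,    /* paired output(s): -o/-p, or the interleaved -m/-M file */
    DEST_SINGLES,   /* singles file (-s) */
    DEST_N_RECORD   /* -M only: replaced by a one-base "N" record */
} destination;

typedef struct { destination fwd, rev; } pair_routing;

/* fwd_ok / rev_ok: whether each mate passed filter.
 * combo_all: nonzero when -M was given. */
pair_routing route_pair(int fwd_ok, int rev_ok, int combo_all) {
    pair_routing r;
    if (fwd_ok && rev_ok) {
        r.fwd = r.rev = DEST_PAIRED;
    } else if (combo_all) {
        /* -M keeps every pair: failing mates become "N" placeholders. */
        r.fwd = fwd_ok ? DEST_PAIRED : DEST_N_RECORD;
        r.rev = rev_ok ? DEST_PAIRED : DEST_N_RECORD;
    } else {
        /* The survivor of a broken pair goes to the singles file. */
        r.fwd = fwd_ok ? DEST_SINGLES : DEST_DISCARD;
        r.rev = rev_ok ? DEST_SINGLES : DEST_DISCARD;
    }
    /* Invariant: with -M no mate is ever dropped, so pairing is preserved. */
    assert(!combo_all || (r.fwd != DEST_DISCARD && r.rev != DEST_DISCARD));
    return r;
}
```

A pair where only the forward read passes therefore sends the forward read to the singles file and drops the reverse read, unless `-M` is given, in which case the reverse read is replaced by a one-base "N" record.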

#### Examples

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -q 12 -l 15

    sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
    -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
    -s trimmed_singles_file.fastq -n

    sickle pe -c combo.fastq -t sanger -m combo_trimmed.fastq \
    -s trimmed_singles_file.fastq -n

    sickle pe -t sanger -g -f input_file1.fastq -r input_file2.fastq \
    -o trimmed_output_file1.fastq.gz -p trimmed_output_file2.fastq.gz \
    -s trimmed_singles_file.fastq.gz

    sickle pe -c combo.fastq -t sanger -M combo_trimmed_all.fastq