Mercurial > repos > nikhil-joshi > sickle
diff README.md @ 4:c70137414dcd draft
sickle v1.33
author | nikhil-joshi |
---|---|
date | Wed, 23 Jul 2014 18:35:10 -0400 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/README.md Wed Jul 23 18:35:10 2014 -0400 @@ -0,0 +1,136 @@ +# sickle - A windowed adaptive trimming tool for FASTQ files using quality + +## About + +Most modern sequencing technologies produce reads that have +deteriorating quality towards the 3'-end and some towards the 5'-end +as well. Incorrectly called bases in both regions negatively impact +assembles, mapping, and downstream bioinformatics analyses. + +Sickle is a tool that uses sliding windows along with quality and +length thresholds to determine when quality is sufficiently low to +trim the 3'-end of reads and also determines when the quality is +sufficiently high enough to trim the 5'-end of reads. It will also +discard reads based upon the length threshold. It takes the quality +values and slides a window across them whose length is 0.1 times the +length of the read. If this length is less than 1, then the window is +set to be equal to the length of the read. Otherwise, the window +slides along the quality values until the average quality in the +window rises above the threshold, at which point the algorithm +determines where within the window the rise occurs and cuts the read +and quality there for the 5'-end cut. Then when the average quality +in the window drops below the threshold, the algorithm determines +where in the window the drop occurs and cuts both the read and quality +strings there for the 3'-end cut. However, if the length of the +remaining sequence is less than the minimum length threshold, then the +read is discarded entirely (or replaced with an "N" record). 5'-end +trimming can be disabled. + +Sickle supports three types of quality values: Illumina, Solexa, and +Sanger. Note that the Solexa quality setting is an approximation (the +actual conversion is a non-linear transformation). The end +approximation is close. Illumina quality refers to qualities encoded +with the CASAVA pipeline between versions 1.3 and 1.7. Illumina +quality using CASAVA >= 1.8 is Sanger encoded. + +Note that Sickle will remove the 2nd fastq record header (on the "+" +line) and replace it with simply a "+". This is the default format for +CASAVA >= 1.8. + +Sickle also supports gzipped file inputs and optional gzipped outputs. By default, +Sickle will produce regular (i.e. not gzipped) output, regardless of the input. +Sickle also has an option to truncate reads with Ns at the first N position. + +There is also a sickle.xml file included in the package that can be used to add sickle to your +local [Galaxy](http://galaxy.psu.edu/) server. + +## Citation +Sickle doesn't have a paper, but you can cite it like this: + + Joshi NA, Fass JN. (2011). Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files + (Version 1.33) [Software]. Available at https://github.com/najoshi/sickle. + +## Requirements + +Sickle requires a C compiler; GCC or clang are recommended. Sickle +relies on Heng Li's kseq.h, which is bundled with the source. + +Sickle also requires Zlib, which can be obtained at +<http://www.zlib.net/>. + +## Building and Installing Sickle + +To build Sickle, enter: + + make + +Then, copy or move "sickle" to a directory in your $PATH. + +## Usage + +Sickle has two modes to work with both paired-end and single-end +reads: `sickle se` and `sickle pe`. + +Running sickle by itself will print the help: + + sickle + +Running sickle with either the "se" or "pe" commands will give help +specific to those commands: + + sickle se + sickle pe + +### Sickle Single End (`sickle se`) + +`sickle se` takes an input fastq file and outputs a trimmed version of +that file. It also has options to change the length and quality +thresholds for trimming, as well as disabling 5'-trimming and enabling +truncation of sequences with Ns. + +#### Examples + + sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq + sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -q 33 -l 40 + sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -x -n + sickle se -t sanger -g -f input_file.fastq -o trimmed_output_file.fastq.gz + +### Sickle Paired End (`sickle pe`) + +`sickle pe` can operate with two types of input. First, it can take +two paired-end files as input and outputs two trimmed paired-end files +as well as a "singles" file. The second form starts with a single +combined input file of reads where you have already interleaved the +reads from the sequencer. In this form, you also supply a single +output file name as well as a "singles" file. The "singles" file +contains reads that passed filter in either the forward or reverse +direction, but not the other. Finally, there is an option (-M) to only +produce one interleaved output file where any reads that did not pass +filter will be output as a FastQ record with a single "N" (whose quality +value is the lowest possible based upon the quality type), thus +preserving the paired nature of the data. You can also change the length +and quality thresholds for trimming, as well as disable 5'-trimming and +enable truncation of sequences with Ns. + +#### Examples + + sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \ + -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \ + -s trimmed_singles_file.fastq + + sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \ + -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \ + -s trimmed_singles_file.fastq -q 12 -l 15 + + sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \ + -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \ + -s trimmed_singles_file.fastq -n + + sickle pe -c combo.fastq -t sanger -m combo_trimmed.fastq \ + -s trimmed_singles_file.fastq -n + + sickle pe -t sanger -g -f input_file1.fastq -r input_file2.fastq \ + -o trimmed_output_file1.fastq.gz -p trimmed_output_file2.fastq.gz \ + -s trimmed_singles_file.fastq.gz + + sickle pe -c combo.fastq -t sanger -M combo_trimmed_all.fastq