What it does
Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data.
This tool allows the following trimming steps to be performed:
- ILLUMINACLIP: Cut adapter and other Illumina-specific sequences from the read
- If Always keep both reads (PE specific/palindrome mode) is True, the reverse read will also be retained in palindrome mode. After read-through has been detected by palindrome mode and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to drop the reverse read entirely. Retaining the reverse read may be useful, e.g. if the downstream tools cannot handle a combination of paired and unpaired reads.
- SLIDINGWINDOW: Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold
- MINLEN: Drop the read if it is below a specified length
- LEADING: Cut bases off the start of a read, if below a threshold quality
- TRAILING: Cut bases off the end of a read, if below a threshold quality
- CROP: Cut the read to a specified length
- HEADCROP: Cut the specified number of bases from the start of the read
- AVGQUAL: Drop the read if the average quality is below a specified value
- MAXINFO: Trim reads adaptively, balancing read length and error rate to maximise the value of each read
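As an illustration of the SLIDINGWINDOW step listed above, the following is a minimal Python sketch of the general idea, not Trimmomatic's actual implementation. The function name `sliding_window_trim`, the choice to cut at the start of the first failing window, and the handling of reads shorter than one window are all assumptions made for this example; the real program may place the cut differently.

```python
def sliding_window_trim(qualities, window_size, required_quality):
    """Return how many bases to keep from the start of a read.

    Simplified sketch: scan windows from the start of the read and cut at
    the beginning of the first window whose mean quality falls below
    required_quality. (Hypothetical helper, not Trimmomatic code.)
    """
    n = len(qualities)
    if n < window_size:
        # Assumption for this sketch: treat a short read as a single window.
        mean = sum(qualities) / n if n else 0
        return n if mean >= required_quality else 0
    for start in range(n - window_size + 1):
        window = qualities[start:start + window_size]
        if sum(window) / window_size < required_quality:
            return start  # everything from this position onwards is cut
    return n  # no window failed: keep the whole read


# Phred-style quality scores that decay towards the end of the read
quals = [38, 38, 37, 36, 30, 20, 12, 8, 5, 2]
keep = sliding_window_trim(quals, window_size=4, required_quality=15)
print(keep)  # number of leading bases retained
```

With a window of 4 and a required quality of 15 (the same values as in a `SLIDINGWINDOW:4:15` setting), the first window whose average falls below the threshold starts at the sixth base, so the sketch keeps the first five bases.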
If ILLUMINACLIP is requested then it is always performed first; subsequent options can be mixed and matched and will be performed in the order that they have been specified.
Note that the order of the trimming operations matters: the same steps applied in a different order can give different results.
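To see why the order of operations can change the outcome, here is a small Python sketch that applies two of the steps (a MINLEN-like filter and a CROP-like cut) in both orders. The helpers `crop`, `minlen` and `run_steps` are hypothetical stand-ins written for this illustration, and reads are modelled as plain strings.

```python
def crop(read, length):
    """CROP-like step: keep at most `length` bases from the start."""
    return read[:length]


def minlen(read, length):
    """MINLEN-like step: drop the read (return None) if it is too short."""
    return read if len(read) >= length else None


def run_steps(read, steps):
    """Apply (function, argument) steps in the order given."""
    for step, arg in steps:
        if read is None:  # read was dropped by an earlier step
            break
        read = step(read, arg)
    return read


read = "ACGT" * 10  # a 40-base read

# Equivalent in spirit to "MINLEN:36 CROP:30": passes the length check, then is cropped
a = run_steps(read, [(minlen, 36), (crop, 30)])

# Equivalent in spirit to "CROP:30 MINLEN:36": cropped to 30 bases, then dropped
b = run_steps(read, [(crop, 30), (minlen, 36)])

print(len(a) if a else None, b)
```

In the first ordering the read survives as a 30-base read; in the second it is discarded, because it is already shorter than the minimum length by the time the length filter runs.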
For single-end data this Trimmomatic tool accepts a single FASTQ file; for paired-end data it will accept either two FASTQ files (R1 and R2), or a dataset collection containing the R1/R2 FASTQ pair.
For paired-end data a particular strength of Trimmomatic is that it retains the pairing of reads (from R1 and R2) in the filtered output files:
- Two FASTQ files (R1-paired and R2-paired) contain one read from each pair where both have survived filtering.
- Additionally, two FASTQ files (R1-unpaired and R2-unpaired) contain reads where one of the pair failed the filtering steps.
If the input consists of a dataset collection with the R1/R2 FASTQ pair then the outputs will also include two dataset collections: one for the 'paired' outputs and one for the 'unpaired' outputs (as described above).
Retaining the same order and number of reads in the filtered output FASTQ files is essential for many downstream analysis tools.
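The routing of paired-end reads into the four outputs described above can be sketched in Python as follows. This is an illustrative model only: the function `split_pairs`, the dictionary keys, and the `keep` predicate (standing in for whatever filtering steps are configured) are assumptions for this example rather than part of the tool.

```python
def split_pairs(pairs, keep):
    """Route read pairs into paired and unpaired outputs.

    pairs: iterable of (r1, r2) reads, in input order.
    keep:  predicate returning True if a single read survives filtering.
    """
    out = {"r1_paired": [], "r2_paired": [], "r1_unpaired": [], "r2_unpaired": []}
    for r1, r2 in pairs:
        ok1, ok2 = keep(r1), keep(r2)
        if ok1 and ok2:
            # Both mates survive: written at the same position in both files
            out["r1_paired"].append(r1)
            out["r2_paired"].append(r2)
        elif ok1:
            out["r1_unpaired"].append(r1)
        elif ok2:
            out["r2_unpaired"].append(r2)
        # If neither mate survives, the pair is dropped entirely
    return out


# Toy data: reads as strings, with "survives" meaning at least 4 bases long
pairs = [("ACGTAC", "TTGACA"), ("ACG", "TTGACA"), ("ACGTAC", "TT"), ("A", "T")]
result = split_pairs(pairs, keep=lambda read: len(read) >= 4)
print({name: len(reads) for name, reads in result.items()})
```

Because both mates of a surviving pair are appended together, the two paired outputs always have the same number of reads in the same order, which is the property downstream tools rely on.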
For single-end data the output is a single FASTQ file containing just the filtered reads.
This Galaxy tool has been developed within the Bioinformatics Core Facility at the University of Manchester, with contributions from Peter van Heusden, Marius van den Beek, Jelle Scholtalbers, Charles Girardot, and Matthias Bernt.
It runs the Trimmomatic program, which was developed within Bjorn Usadel's group at RWTH Aachen University.
Trimmomatic website (including documentation):
The reference for Trimmomatic is:
- Bolger, A.M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.
Please kindly acknowledge both this Galaxy tool and the Trimmomatic program if you use it.