Galaxy tool preview

Cutadapt (version 0.9.5.a)
3' Adapters
5' or 3' (Anywhere) Adapters
Maximum allowed error rate (no. of errors divided by the length of the matching region).
Try to remove adapters at most COUNT times. Useful when an adapter gets appended multiple times.
Minimum overlap length. If the overlap between the adapter and the sequence is shorter than LENGTH, the read is not modified.
Discard reads that contain the adapter instead of trimming them. Use the 'Minimum overlap length' option in order to avoid throwing away too many randomly matching reads!
Discard trimmed reads that are shorter than LENGTH. Reads that are too short even before adapter removal are also discarded. In colorspace, an initial primer is not counted. Value of 0 means no minimum length.
Discard trimmed reads that are longer than LENGTH. Reads that are too long even before adapter removal are also discarded. In colorspace, an initial primer is not counted. Value of 0 means no maximum length.
Trim low-quality ends from reads before adapter removal. The algorithm is the same as the one used by BWA (Subtract CUTOFF from all qualities; compute partial sums from all indices to the end of the sequence; cut sequence at the index at which the sum is minimal). Value of 0 means no quality trimming.
By default all reads will be put in the same file. However, reads with adapters matching in the middle, unmatched reads, and too-short reads can be saved in separate files.
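
The quality-trimming rule described above (subtract CUTOFF, take partial sums to the end of the read, cut where the sum is minimal) can be sketched as follows. This is a literal reading of that description, not the tool's own code; the function name and the use of a plain list of integer quality scores are assumptions made for illustration.

```python
def quality_trim_index(qualities, cutoff):
    """Return the index at which to cut a read to drop its low-quality 3' end.

    Hypothetical helper: `qualities` is a list of integer quality scores,
    `cutoff` corresponds to the CUTOFF value of the tool parameter.
    """
    if cutoff <= 0:
        # A value of 0 means no quality trimming.
        return len(qualities)

    best_index = len(qualities)  # cutting here removes nothing
    best_sum = 0                 # partial sum of the empty suffix
    running = 0
    # Walk from the 3' end; `running` is the sum of (q - cutoff)
    # over positions i .. end of the read.
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running < best_sum:
            best_sum = running
            best_index = i
    return best_index


qualities = [30, 30, 30, 10, 5]
read = "ACGTA"
cut = quality_trim_index(qualities, cutoff=20)
trimmed = read[:cut]  # the two low-quality bases at the end are removed
```

Treating the empty suffix as a partial sum of 0 means a read whose qualities all exceed the cutoff is left untouched.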

This tool removes adapter sequences from DNA high-throughput sequencing data. This is usually necessary when the read length of the machine is longer than the molecule that is sequenced, such as in microRNA data.

The tool is based on the open-source cutadapt tool.


Algorithm

Cutadapt uses a simple semi-global alignment algorithm, without any special optimizations. For speed, the algorithm is implemented as a Python extension module in calignmodule.c.

Partial adapter matches

Cutadapt correctly deals with partial adapter matches. As an example, suppose your adapter sequence is "ADAPTER" (specified via the 3' Adapters parameter). If you have these input sequences:

MYSEQUENCEADAPTER
MYSEQUENCEADAP
MYSEQUENCEADAPTERSOMETHINGELSE

All of them will be trimmed to "MYSEQUENCE". If the sequence starts with an adapter, like this:

ADAPTERSOMETHING

It will be empty after trimming.
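
The trimming behaviour in these examples can be reproduced with a small exact-match sketch. It is only an illustration: the real tool uses error-tolerant alignment, and the function name and the `min_overlap` default of 3 are assumptions, not values taken from the tool.

```python
def trim_3prime(read, adapter, min_overlap=3):
    """Trim a 3' adapter using exact matching only (hypothetical helper)."""
    # Case 1: the full adapter occurs somewhere in the read.
    # Everything from its first occurrence onward is removed.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: only a prefix of the adapter is present at the end of the read.
    # Try the longest prefix first, down to `min_overlap` characters.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]

    # No adapter found: the read is not modified.
    return read


for seq in ("MYSEQUENCEADAPTER",
            "MYSEQUENCEADAP",
            "MYSEQUENCEADAPTERSOMETHINGELSE",
            "ADAPTERSOMETHING"):
    print(repr(trim_3prime(seq, "ADAPTER")))
```

The first three reads come out as 'MYSEQUENCE' and the last one as an empty string, matching the description above.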

When the allowed error rate is sufficiently high, errors in the adapter sequence are allowed. For example, ADABTER (1 mismatch), ADAPTR (1 deletion), and ADAPPTER (1 insertion) will all be recognized if the error rate is set to 0.15.
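
The arithmetic behind this example can be checked with a short sketch. It assumes, for illustration only, that the matching region is the full 7-character adapter, so that one error gives a rate of 1/7 (about 0.143), which is below 0.15. It uses a plain global edit distance rather than the tool's semi-global alignment, and both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: mismatches, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]


def within_error_rate(adapter, candidate, max_error_rate):
    """Assumption: errors are divided by the adapter length."""
    errors = edit_distance(adapter, candidate)
    return errors / len(adapter) <= max_error_rate


for candidate in ("ADABTER", "ADAPTR", "ADAPPTER"):
    print(candidate, within_error_rate("ADAPTER", candidate, 0.15))
```

With a stricter rate such as 0.1, the same three variants would be rejected, because 1/7 exceeds 0.1.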

Allowing adapters anywhere

Cutadapt assumes that any adapter specified via the 3' Adapters parameter was ligated to the 3' end of the sequence. This is the correct assumption for at least the SOLiD and Illumina small RNA protocols and probably others.

If, on the other hand, your adapter can also be ligated to the 5' end (on purpose or by accident), you should tell cutadapt so by using the 5' or 3' (Anywhere) Adapters parameter. It will then use a different alignment algorithm and correctly trim adapters that appear at the beginning of a read. An adapter specified this way will also be found if it appears only partially at the beginning of a read. For example, these sequences

ADAPTERMYSEQUENCE
PTERMYSEQUENCE

will be trimmed to "MYSEQUENCE". Note that the regular algorithm would trim the first read to an empty sequence.

This parameter currently does not work with colorspace data.
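
The 5' side of the "anywhere" behaviour in the example above can be sketched with exact matching. This covers only an adapter at the start of a read, not the 3' or mid-read cases, and it is not the tool's alignment algorithm; the function name and the `min_overlap` default of 3 are assumptions for illustration.

```python
def trim_adapter_at_start(read, adapter, min_overlap=3):
    """Remove a full or partial adapter from the start of a read (hypothetical helper)."""
    # Full adapter at the very start: keep what follows it.
    if read.startswith(adapter):
        return read[len(adapter):]

    # Partial match: a suffix of the adapter sits at the start of the read.
    # Try the longest suffix first, down to `min_overlap` characters.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.startswith(adapter[-k:]):
            return read[k:]

    # No adapter at the start: the read is not modified.
    return read


for seq in ("ADAPTERMYSEQUENCE", "PTERMYSEQUENCE"):
    print(trim_adapter_at_start(seq, "ADAPTER"))
```

Both reads are reduced to MYSEQUENCE: the part after the adapter is kept, whereas 3' trimming would leave the first read empty.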