This tool removes adapter sequences from high-throughput DNA sequencing data. This is usually necessary when the read length of the machine is longer than the molecule being sequenced, as is the case with microRNA data.
The tool is based on the open-source cutadapt tool.
Cutadapt uses a simple semi-global alignment algorithm without any special optimizations. For speed, the algorithm is implemented as a Python extension module in calignmodule.c.
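To give a feel for what a semi-global alignment does, here is a minimal textbook sketch in pure Python. It is not the code in calignmodule.c: the function name `semiglobal_best_end` is hypothetical, and this variant only handles the case where the whole adapter lies inside the read. Read characters before and after the match are free, while every mismatch, insertion, or deletion inside the adapter costs one error.

```python
def semiglobal_best_end(read, adapter):
    """Return (errors, end) for the best approximate occurrence of adapter in read.

    `end` is the read position just past the match. Illustrative only.
    """
    n = len(read)
    # prev[j]: fewest edits aligning the adapter prefix processed so far
    # against some substring of `read` that ends at position j.
    # An empty adapter prefix matches anywhere at no cost, which is what
    # makes the leading read characters free.
    prev = [0] * (n + 1)
    for i in range(1, len(adapter) + 1):
        cur = [i] + [0] * n  # adapter[:i] against an empty substring: i edits
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j - 1] + (adapter[i - 1] != read[j - 1]),  # match or mismatch
                prev[j] + 1,      # adapter character missing from the read
                cur[j - 1] + 1,   # extra character in the read
            )
        prev = cur
    # Trailing read characters are free too: take the best end position
    # (the leftmost one when several are tied).
    best_end = min(range(n + 1), key=lambda j: prev[j])
    return prev[best_end], best_end


errors, end = semiglobal_best_end("MYSEQUENCEADAPTERSOMETHINGELSE", "ADAPTER")
print(errors, end)
```

The table has one row per adapter character and one column per read position, so the running time is proportional to the product of the two lengths.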
Partial adapter matches
Cutadapt correctly deals with partial adapter matches. As an example, suppose your adapter sequence is "ADAPTER" (specified via the 3' Adapters parameter). If you have these input sequences:
MYSEQUENCEADAPTER
MYSEQUENCEADAP
MYSEQUENCEADAPTERSOMETHINGELSE
All of them will be trimmed to "MYSEQUENCE". If a sequence starts with the adapter, for example "ADAPTERSOMETHING", it will be empty after trimming.
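The trimming rule for 3' adapters can be sketched in a few lines of Python. This is only an illustration of the exact-match case, not cutadapt's implementation: the function name `trim_3prime` and the `min_overlap` parameter (the shortest adapter prefix accepted at the end of a read) are assumptions made for this example, and "ADAPTERSOMETHING" is an illustrative read.

```python
def trim_3prime(read, adapter, min_overlap=3):
    """Remove a 3' adapter and everything after it (exact matches only)."""
    # Case 1: the complete adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter (partial match).
    # Try the longest possible prefix first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    # No adapter found: leave the read unchanged.
    return read


for read in ("MYSEQUENCEADAPTER",
             "MYSEQUENCEADAP",
             "MYSEQUENCEADAPTERSOMETHINGELSE",
             "ADAPTERSOMETHING"):  # illustrative read that starts with the adapter
    print(repr(trim_3prime(read, "ADAPTER")))
```

The first three reads come out as 'MYSEQUENCE' and the last one as an empty string, because everything from the start of the adapter onward is discarded.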
When the allowed error rate is sufficiently high, errors in the adapter sequence are allowed. For example, ADABTER (1 mismatch), ADAPTR (1 deletion), and ADAPPTER (1 insertion) will all be recognized if the error rate is set to 0.15.
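The arithmetic behind this example can be checked with a short Python sketch. It assumes that the number of allowed errors is the error rate multiplied by the adapter length, rounded down, and it uses a plain edit distance (mismatches, insertions, and deletions each count as one error). The helper names `edit_distance` and `matches_adapter` are hypothetical and are not part of cutadapt.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # character of a left unmatched
                cur[j - 1] + 1,            # character of b left unmatched
                prev[j - 1] + (ca != cb),  # match or mismatch
            ))
        prev = cur
    return prev[-1]


def matches_adapter(candidate, adapter, error_rate):
    """True if candidate is within the error budget of adapter."""
    # Assumption: budget = floor(error_rate * adapter length).
    max_errors = int(error_rate * len(adapter))
    return edit_distance(candidate, adapter) <= max_errors


for candidate in ("ADABTER", "ADAPTR", "ADAPPTER"):
    print(candidate, matches_adapter(candidate, "ADAPTER", 0.15))
```

With a 7-character adapter, an error rate of 0.15 gives a budget of one error (0.15 × 7 = 1.05, rounded down), so all three variants are accepted. At an error rate of 0.1 the budget drops to zero and only exact matches pass.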
Allowing adapters anywhere
Cutadapt assumes that any adapter specified via the 3' Adapters parameter was ligated to the 3' end of the sequence. This is the correct assumption for at least the SOLiD and Illumina small RNA protocols and probably others.
If, on the other hand, your adapter can also be ligated to the 5' end (on purpose or by accident), you should tell cutadapt so by using the 5' or 3' (Anywhere) Adapters parameter. It will then use a different alignment algorithm and correctly trim adapters that appear at the beginning of a read. An adapter specified this way will also be found if it appears only partially at the beginning of a read. For example, reads such as "ADAPTERMYSEQUENCE" and "PTERMYSEQUENCE"
will be trimmed to "MYSEQUENCE". Note that the regular algorithm would trim the first read to an empty sequence.
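The difference from the regular 3' behaviour can be sketched in Python as follows. This is a simplified, exact-match illustration under stated assumptions, not cutadapt's algorithm: the function name `trim_anywhere`, the `min_overlap` parameter, and the rule "check the start of the read first, then fall back to 3' trimming" are all choices made for this example, and the reads "ADAPTERMYSEQUENCE" and "PTERMYSEQUENCE" are illustrative.

```python
def trim_anywhere(read, adapter, min_overlap=3):
    """Trim an adapter that may sit at either end of the read (exact matches only)."""
    # 5' case: the read begins with the full adapter or with a suffix of it.
    # The adapter is removed and the sequence after it is kept.
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.startswith(adapter[len(adapter) - k:]):
            return read[k:]
    # 3' case: the full adapter occurs later in the read ...
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # ... or the read ends with a prefix of the adapter.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


for read in ("ADAPTERMYSEQUENCE", "PTERMYSEQUENCE", "MYSEQUENCEADAPTER"):
    print(repr(trim_anywhere(read, "ADAPTER")))
```

All three reads come out as 'MYSEQUENCE'. A 3'-only rule would instead cut "ADAPTERMYSEQUENCE" at the start of the adapter and leave nothing.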
This parameter currently does not work with color space data.