from high-throughput sequence data
cutadapt
cutadapt_galaxy_wrapper.py
#if $input.extension.startswith("fastq"):
--format=fastq
#else
--format=$input.extension
#end if
#for $a in $adapters
-a '${a.adapter_source.adapter}'
#end for
#for $aa in $anywhere_adapters
-b '${aa.anywhere_adapter_source.anywhere_adapter}'
#end for
-e $error_rate
-n $count
-O $overlap
#if str($min) != '0':
-m $min
#end if
#if str($max) != '0':
-M $max
#end if
--input='$input'
--output='$output'
> $report
This tool removes adapter sequences from high-throughput DNA
sequencing data. This is usually necessary when the read length of the
sequencer is longer than the molecule being sequenced, such as with
microRNA data.

The tool is based on the open-source cutadapt_ tool.

-----

**Algorithm**

Cutadapt uses a simple semi-global alignment algorithm without any special optimizations.
For speed, the algorithm is implemented in C as a Python extension module (calignmodule.c).
According to the cutadapt author, the program is fast enough for their purposes, and further speedups should be simple to achieve.
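The semi-global alignment idea can be illustrated with a minimal pure-Python sketch. It is not cutadapt's calignmodule.c code: the function names `find_3prime_adapter` and `trim_3prime`, the tie-breaking rule (fewest errors, then leftmost start), and the `min_overlap` default of 3 (meant to mirror the ``-O`` option) are assumptions for illustration. The table is "semi-global" because skipping bases at the start of the read costs nothing and the read may end partway through the adapter.

```python
from typing import Optional, Tuple


def find_3prime_adapter(
    read: str, adapter: str, max_error_rate: float = 0.1, min_overlap: int = 3
) -> Optional[Tuple[int, int]]:
    """Return (read_start, errors) for the best 3' adapter match, or None."""
    m, n = len(adapter), len(read)
    # cost[i][j]: fewest edits aligning adapter[:i] to a read substring ending at j.
    # start[i][j]: read position where that alignment begins.
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    start = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        start[0][j] = j  # skipping leading read bases is free (cost stays 0)
    for i in range(1, m + 1):
        cost[i][0] = i  # adapter bases deleted before the read begins
        for j in range(1, n + 1):
            diag = cost[i - 1][j - 1] + (adapter[i - 1] != read[j - 1])  # match/mismatch
            up = cost[i - 1][j] + 1  # adapter base missing from the read (deletion)
            left = cost[i][j - 1] + 1  # extra base in the read (insertion)
            best = min(diag, up, left)
            cost[i][j] = best
            if best == diag:
                start[i][j] = start[i - 1][j - 1]
            elif best == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]

    # Valid end points: the whole adapter ends somewhere in the read, or the
    # read ends while only a prefix of the adapter has been aligned.
    ends = [(m, j) for j in range(n + 1)] + [(i, n) for i in range(1, m)]
    best_hit = None
    for i, j in ends:
        errors = cost[i][j]
        # Allowed errors (assumption): error rate times aligned adapter length, rounded down.
        if i < min_overlap or errors > int(max_error_rate * i):
            continue
        hit = (errors, start[i][j])  # prefer fewer errors, then a leftmost start
        if best_hit is None or hit < best_hit:
            best_hit = hit
    if best_hit is None:
        return None
    errors, read_start = best_hit
    return read_start, errors


def trim_3prime(read: str, adapter: str, max_error_rate: float = 0.1, min_overlap: int = 3) -> str:
    """Remove the adapter and everything after it; return the read unchanged if none is found."""
    hit = find_3prime_adapter(read, adapter, max_error_rate, min_overlap)
    return read if hit is None else read[: hit[0]]
```

For example, `trim_3prime("MYSEQUENCEADABTER", "ADAPTER", 0.15)` returns "MYSEQUENCE". The table has one cell per (adapter base, read base) pair, so the sketch runs in time proportional to the read length times the adapter length.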

**Partial adapter matches**

Cutadapt correctly deals with partial adapter matches. As an example, suppose
your adapter sequence is "ADAPTER" (specified via the *3' Adapters* parameter).
If you have these input sequences:

::

    MYSEQUENCEADAPTER
    MYSEQUENCEADAP
    MYSEQUENCEADAPTERSOMETHINGELSE

All of them will be trimmed to "MYSEQUENCE". If the sequence starts with an
adapter, like this:

::

    ADAPTERSOMETHING

It will be empty after trimming.
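The partial-match rule can be sketched with exact string matching alone. This is a simplification (no mismatches or indels), and the function name `trim_3prime_exact` and the `min_overlap` default of 3 are assumptions for illustration, not cutadapt's API. If the whole adapter occurs, it is removed together with everything after it; otherwise, the longest adapter prefix that ends the read is removed.

```python
def trim_3prime_exact(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter, allowing it to be cut off by the end of the read."""
    # Case 1: the complete adapter occurs; drop it and everything after it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter, so a prefix of the
    # adapter is a suffix of the read. Try the longest such prefix first.
    for length in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[: len(read) - length]
    return read
```

Requiring a minimum overlap keeps a single matching base at the end of a read (for example a trailing "A") from being treated as the start of the adapter.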

If the maximum error rate (``-e``) is set high enough, cutadapt also tolerates
errors in the adapter sequence. For example, ADABTER (1 mismatch), ADAPTR (1 deletion),
and ADAPPTER (1 insertion) will all be recognized if the error rate is set to 0.15.
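The arithmetic behind the 0.15 example can be checked with a short sketch. It assumes that the number of tolerated errors is the error rate multiplied by the adapter length, rounded down, so the 7-base "ADAPTER" allows int(0.15 × 7) = 1 error at a rate of 0.15 but int(0.1 × 7) = 0 errors at 0.1. The helper names `edit_distance` and `allowed_errors` are hypothetical and not part of cutadapt.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each mismatch, insertion, or deletion costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match or mismatch
                           prev[j] + 1,               # base of a missing from b
                           cur[j - 1] + 1))           # extra base in b
        prev = cur
    return prev[-1]


def allowed_errors(adapter_length: int, error_rate: float) -> int:
    # Assumption: error rate times adapter length, rounded down.
    return int(error_rate * adapter_length)


ADAPTER = "ADAPTER"
for variant in ("ADABTER", "ADAPTR", "ADAPPTER"):
    d = edit_distance(ADAPTER, variant)
    # Each variant is 1 edit away: accepted at 0.15, rejected at 0.1.
    print(variant, d, d <= allowed_errors(len(ADAPTER), 0.15), d <= allowed_errors(len(ADAPTER), 0.1))
```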

**Allowing adapters anywhere**

Cutadapt assumes that any adapter specified via the *3' Adapters* parameter
was ligated to the 3' end of the sequence. This is the correct assumption for
at least the SOLiD and Illumina small RNA protocols, and probably for others.
If, on the other hand, your adapter can also be ligated to the 5' end (on
purpose or by accident), tell cutadapt so by using the *5' or 3' (Anywhere)
Adapters* parameter. Cutadapt will then use a different alignment algorithm and
correctly trim adapters that appear at the beginning of a read. An adapter
specified this way will also be found if only part of it appears at the
beginning of a read. For example, these sequences

::

    ADAPTERMYSEQUENCE
    PTERMYSEQUENCE

will be trimmed to "MYSEQUENCE". Note that the regular algorithm would trim
the first read to an empty sequence.

This parameter currently does not work with color space data.
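The decision between 5' and 3' trimming for an "anywhere" adapter can be sketched with exact matching. This is a simplified illustration, not cutadapt's alignment algorithm: the function name `trim_anywhere_exact`, the `min_overlap` default of 3, and the choice to check the 5' partial case before the 3' partial case are assumptions. An adapter at the very start of the read (fully, or as a suffix of the adapter) removes everything up to its end; an adapter further in removes itself and everything after it.

```python
def trim_anywhere_exact(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Trim an adapter that may sit at either end of a read (exact matches only)."""
    pos = read.find(adapter)
    if pos == 0:
        # Adapter at the very start: treat it as a 5' adapter, keep what follows.
        return read[len(adapter):]
    if pos > 0:
        # Adapter further in: treat it as a 3' adapter, keep what precedes.
        return read[:pos]
    longest = min(len(adapter) - 1, len(read))
    # The read starts partway through the adapter: an adapter suffix is a read prefix.
    for length in range(longest, min_overlap - 1, -1):
        if read.startswith(adapter[-length:]):
            return read[length:]
    # The read ends partway through the adapter: an adapter prefix is a read suffix.
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[: len(read) - length]
    return read
```

With this sketch, both "ADAPTERMYSEQUENCE" and "PTERMYSEQUENCE" become "MYSEQUENCE", whereas a 3'-only rule would reduce "ADAPTERMYSEQUENCE" to an empty read.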
.. _cutadapt: http://code.google.com/p/cutadapt/