Galaxy | Tool Preview

OTUTable (version 1.0.0)

** what it does **

Converts UCLUST format (.uc) output from Vsearch search into raw count table. The description of UCLUST format is based on the information that can be found on UCLUST documentation page.


Example

Some example records:

Type Cluster Size %Id Strand Qlo Tlo Alignment Query Target
S 0 292 '*' '*' '*' '*' '*' AH70_12410 '*'
H 0 292 99.7 '+' 0 0 292M AH70_12410 '*'
S 0 292 '*' '*' '*' '*' '*' AH70_12410 '*'
H 0 292 98.2 '+' 0 0 292M AH70_12410 '*'

Each record has ten fields, separated by tabs:

Column Description
Type Record type
Cluster Cluster number
Size Sequence length or cluster size
%Id Identity to the seed(as a percentage), or * if this is a seed.
Strand '+' plus strand, '-' minus strand, or '.' amino acids.
Qlo 0-based coordinate of alignment start in the query sequence.
Tlo 0-based coordinate of alignment start in target (seed) sequence. If minus strand, Tlo is relative to start of reverse-complement target.
Alignment Compressed representation of alignment to the seed(see below), or '*' if a seed.
Query FASTA label of query sequence
Target FASTA label of target(seed / library / database) sequence. or '*' if a seed.

Record Types are:

Column Description
L Library seed(generated only if a match if found to this seed).
S New seed.
H Hit, also known as an accept; i.e. a successful match.
D Library cluster.
C New cluster.
N Not matched (a sequence that didn't match library with --libonly specified).
R Reject (generated only if --output_rejects is specified)

The alignment is compressed using run-length encoding, as follows. Each column in the alignment is classified as M,D or I:

Code Name Query sequence Seed sequence
M Match Letter Letter
D Delete Gap Letter
I Insert Letter Gap