Galaxy |

VSearch clustering (version 2.8.3.0)

Select your input FASTA file:

Choose sorting method to use before clustering:

Indicate that input sequences are not presorted by length:

(--usersort)

ID definition:

(--iddef)

Reject hit if identity is lower than this value:

(--id)

Do not ignore terminal gaps in MSA for consensus:

(--cons_truncate)

Mask sequences:

(--qmask)

Read abundance annotation from input:

(--sizein)

Write cluster abundances to centroid file:

(--sizeout)

Strand-specific clustering:

(--strand)

Number of non-matching hits to consider:

(--maxrejects)

Number of hits to accept and show per strand:

(--maxaccepts)

Select output files:

UCLUST-like output:

(--uc)
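Each form field above is labelled with the vsearch option it sets. As a rough illustration of that mapping, the following Python sketch assembles a vsearch argument list from the form's choices. It is not the Galaxy wrapper's actual code: the function name `build_vsearch_args`, its parameter names, and the keys `"length"`, `"abundance"` and `"none"` for the sorting choice are hypothetical. The assumption that the sorting choice selects between `--cluster_fast`, `--cluster_size` and `--cluster_smallmem` is based on the option list under "Clustering options"; only the flags themselves come from this help text.

```python
from typing import List, Optional

# Hypothetical keys for the form's sorting choice, mapped to the clustering
# commands from the option list (assumption: the wrapper may name them differently).
CLUSTER_COMMANDS = {
    "length": "--cluster_fast",     # sort by decreasing length first
    "abundance": "--cluster_size",  # sort by decreasing abundance first
    "none": "--cluster_smallmem",   # keep the input order
}


def build_vsearch_args(
    input_fasta: str,
    centroids: str,
    identity: float,
    sort_method: str = "length",
    usersort: bool = False,
    iddef: int = 2,
    cons_truncate: bool = False,
    qmask: str = "dust",
    sizein: bool = False,
    sizeout: bool = False,
    strand: str = "plus",
    maxrejects: Optional[int] = None,
    maxaccepts: Optional[int] = None,
    uc: Optional[str] = None,
) -> List[str]:
    """Translate the form fields into a vsearch argument list (sketch)."""
    if sort_method not in CLUSTER_COMMANDS:
        raise ValueError(f"unknown sort method: {sort_method}")
    if not 0.0 <= identity <= 1.0:
        raise ValueError("--id must be between 0.0 and 1.0")
    if iddef not in range(5):
        raise ValueError("--iddef must be 0, 1, 2, 3 or 4")
    if qmask not in ("none", "dust", "soft"):
        raise ValueError("--qmask must be none, dust or soft")
    if strand not in ("plus", "both"):
        raise ValueError("--strand must be plus or both")

    args = [
        "vsearch", CLUSTER_COMMANDS[sort_method], input_fasta,
        "--centroids", centroids,
        "--id", str(identity),
        "--iddef", str(iddef),
        "--qmask", qmask,
        "--strand", strand,
    ]
    # Boolean switches are only emitted when the corresponding box is ticked.
    for flag, enabled in (("--usersort", usersort),
                          ("--cons_truncate", cons_truncate),
                          ("--sizein", sizein),
                          ("--sizeout", sizeout)):
        if enabled:
            args.append(flag)
    # Optional numeric limits and output files are only added when set.
    if maxrejects is not None:
        args += ["--maxrejects", str(maxrejects)]
    if maxaccepts is not None:
        args += ["--maxaccepts", str(maxaccepts)]
    if uc is not None:
        args += ["--uc", uc]
    return args


print(build_vsearch_args("reads.fasta", "centroids.fasta", 0.97, uc="clusters.uc"))
```

If vsearch is installed and on the PATH, the returned list could be passed to `subprocess.run(args, check=True)`; building a list rather than a shell string avoids quoting problems with unusual file names.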

What it does

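vsearch implements a single-pass, greedy star-clustering algorithm, similar to the algorithms implemented in usearch, DNAclust and sumaclust. Sequences are processed one at a time in the chosen order: each sequence is compared with the centroids found so far and joins the cluster of a matching centroid if their identity is at or above the --id threshold; otherwise it becomes the centroid of a new cluster.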
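The following Python sketch shows the shape of a single-pass greedy star clustering. It is a simplification, not vsearch's implementation. The helper `naive_identity` is a hypothetical stand-in for a real pairwise alignment (it compares positions directly, without gaps). Every query is scored against every centroid and joins the most similar one that reaches the threshold, whereas vsearch uses k-mer prefiltering and stops searching according to --maxaccepts and --maxrejects. Only the plus strand is considered. The `order` values mimic the sorting done by --cluster_fast (decreasing length) and --cluster_size (decreasing abundance), and `"input"` keeps the input order, as with --cluster_smallmem.

```python
from typing import Callable, Dict, List, Optional, Tuple


def naive_identity(a: str, b: str) -> float:
    # Hypothetical stand-in for an aligned identity: compares positions
    # directly (no gaps) and divides by the longer sequence length.
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def greedy_cluster(
    seqs: Dict[str, str],
    threshold: float,
    order: str = "length",
    abundances: Optional[Dict[str, int]] = None,
    identity: Callable[[str, str], float] = naive_identity,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (centroid labels, mapping of each label to its centroid)."""
    labels = list(seqs)
    # Python's sort is stable, so ties keep their input order.
    if order == "length":
        labels.sort(key=lambda k: len(seqs[k]), reverse=True)
    elif order == "abundance":
        abund = abundances or {}
        labels.sort(key=lambda k: abund.get(k, 1), reverse=True)
    elif order != "input":
        raise ValueError(f"unknown order: {order}")

    centroids: List[str] = []
    assignment: Dict[str, str] = {}
    for label in labels:
        # Find the most similar centroid seen so far.
        best_label, best_id = None, -1.0
        for c in centroids:
            score = identity(seqs[label], seqs[c])
            if score > best_id:
                best_label, best_id = c, score
        if best_label is not None and best_id >= threshold:
            assignment[label] = best_label  # joins an existing cluster
        else:
            centroids.append(label)         # becomes a new centroid
            assignment[label] = label
    return centroids, assignment


seqs = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTTTTTTT", "d": "ACGT"}
centroids, members = greedy_cluster(seqs, threshold=0.9)
print(centroids)  # ['a', 'c', 'd']
print(members)
```

Because each sequence is placed once and never revisited, the result depends on the processing order. This is why the choice between length sorting, abundance sorting and a user-supplied order matters.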

Clustering options (most searching options also apply)

`--centroids FILENAME`
	output centroid sequences to FASTA file
`--cluster_fast FILENAME`
	cluster sequences after sorting by length
`--cluster_size FILENAME`
	cluster sequences after sorting by abundance
`--cluster_smallmem FILENAME`
	cluster already sorted sequences (see --usersort)
`--clusters STRING`
	output each cluster to a separate FASTA file
`--consout FILENAME`
	output cluster consensus sequences to FASTA file
`--cons_truncate`
	do not ignore terminal gaps in MSA for consensus
`--id REAL`	reject hit if identity is lower than REAL (0.0-1.0)
`--iddef INT`	identity definition: 0 = CD-HIT, 1 = all columns, 2 = internal columns (terminal gaps ignored), 3 = MBL, 4 = BLAST (2)
`--msaout FILENAME`
	output multiple sequence alignments to FASTA file
`--qmask none|dust|soft`	mask sequences using the dust, soft or no masking method (dust)
`--sizein`	propagate abundance annotation from input
`--sizeout`	write cluster abundances to centroid file
`--strand plus|both`	cluster using plus or both strands (plus)
`--uc FILENAME`	filename for UCLUST-like output
`--usersort`	indicate sequences not presorted by length

For details about this tool, please refer to the GitHub repository or the vsearch manual.