Diamond (version 2.0.15+galaxy0)

Alignment mode:

(blastp/blastx)

Suppress reporting of identical self-hits between sequences:

The FASTA sequence identifiers as well as the sequences of query and target need to be identical for a hit to be deleted
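The rule above can be illustrated with a minimal Python sketch. It is not DIAMOND's implementation; the function names and the tuple layout of a hit are hypothetical and chosen only for illustration. The point is that a hit is dropped only when both the identifier and the sequence match.

```python
def is_identical_self_hit(query_id, query_seq, target_id, target_seq):
    """Return True only if identifier AND sequence are both identical."""
    return query_id == target_id and query_seq == target_seq


def drop_self_hits(hits):
    """Filter an iterable of (query_id, query_seq, target_id, target_seq) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    return [hit for hit in hits if not is_identical_self_hit(*hit)]
```

A hit that shares an identifier with the query but has a different sequence (or the reverse) is kept.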

Composition based statistics:

Compositionally biased sequences often cause false positive matches, which are effectively filtered by this algorithm in a way similar to the composition based statistics used by BLAST

Input query file in FASTA or FASTQ format:

Will you select a reference database from your history or use a built-in index?:

Built-ins were indexed using default options

Select a reference database:

If your database of interest is not listed, contact your Galaxy admin

Restrict search taxonomically?:

Any taxonomic rank can be used, and only reference sequences matching one of the specified taxon ids will be searched against.

Sensitivity Mode:

Choose one of the sensitivity modes. The default mode is mainly designed for short read alignment, i.e. finding significant matches of >50 bits on 30-40aa fragments. The sensitive mode is a lot more sensitive than the default and generally recommended for aligning longer sequences. The more sensitive mode provides even more sensitivity. More sensitivity may increase computation time.

Block size in billions of sequence letters to be processed at a time:

This is the main parameter for controlling the program’s memory and disk space usage. Bigger numbers will increase the use of memory and temporary disk space, but also improve performance

Scoring matrix:

In parentheses are the supported values for (gap open)/(gap extend). In brackets are default gap penalties

Gap open penalty:

Leave empty for default (see scoring matrix)

Gap extension penalty:

Leave empty for default (see scoring matrix)

Masking algorithm:

DIAMOND by default applies the tantan repeat masking algorithm to the query and target sequences as described in (Frith, 2011). This masking procedure increases the specificity of alignments and serves to filter out spurious hits. Note that when using --comp-based-stats (2,3,4), tantan masking is disabled by default.

Method to filter?:

(--evalue/--min-score)

Maximum expected value to keep an alignment:

Run multiple rounds of searches with increasing sensitivity:

The query dataset is first searched at a lower sensitivity setting. Only those query sequences that fail to produce a significant alignment at the lower sensitivity are then searched at the target sensitivity.
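The control flow of such a multi-round search can be sketched in Python as follows. This is an illustration of the idea only, not DIAMOND's code: the `search` callable, its return format, and the sensitivity level names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def iterative_search(
    queries: Iterable[str],
    sensitivity_levels: List[str],
    search: Callable[[List[str], str], Dict[str, list]],
) -> Dict[str, list]:
    """Search queries in rounds of increasing sensitivity.

    `search(batch, level)` is a hypothetical callable returning a dict that
    maps each query id to its list of significant alignments (possibly empty).
    """
    remaining = list(queries)
    results: Dict[str, list] = {}
    for level in sensitivity_levels:  # ordered from least to most sensitive
        if not remaining:
            break
        hits = search(remaining, level)
        still_unaligned = []
        for query_id in remaining:
            if hits.get(query_id):
                results[query_id] = hits[query_id]
            else:
                # No significant alignment yet: retry in the next round.
                still_unaligned.append(query_id)
        remaining = still_unaligned
    return results
```

Each round passes only the still-unaligned queries on to the next, more sensitive, setting, which is where the time saving comes from.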

Algorithm for seed search:

Double-indexed is the main algorithm of the program, designed for large input files but less efficient for small query files. Query-indexed mode improves performance for small query files and is triggered automatically based on the input. Contiguous-seed mode further improves performance for small query files. The modes differ slightly in their sensitivity, so results are not guaranteed to be 100% identical for different settings of this option.

Method to restrict the number of hits?:

The maximum number of target sequences per query to report alignments for:

Setting this to 0 will report all alignments that were found.

Limit on the number of Smith Waterman extensions:

Target sequences will be ranked according to their ungapped extension scores at seed hits, and gapped extensions will only be computed for the best N targets for each query. Note that this option increases memory use.
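As a rough illustration of the ranking step described above, the sketch below keeps the best N targets for one query by ungapped score. The function name and the dictionary input are assumptions made for this example; DIAMOND's internal data structures are not described in this help text.

```python
import heapq


def select_targets_for_gapped_extension(ungapped_scores, n):
    """Return the ids of the n targets with the highest ungapped scores.

    `ungapped_scores` maps target id -> best ungapped extension score at
    seed hits for a single query (hypothetical input format).
    """
    return heapq.nlargest(n, ungapped_scores, key=ungapped_scores.get)
```

Only the returned targets would go on to the more expensive gapped (Smith-Waterman) extension.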

Minimum identity percentage to report an alignment:

Report only alignments above the given percentage of sequence identity

Minimum query cover percentage to report an alignment:

Report only alignments above the given percentage of query cover

Minimum subject cover percentage to report an alignment:

Report only alignments above the given percentage of subject cover
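For readers who run DIAMOND outside Galaxy, the three reporting thresholds above are assumed here to correspond to the `--id`, `--query-cover` and `--subject-cover` options of the command-line program. The Python sketch below only assembles an argument list and does not run anything. The helper name and all file names are hypothetical, and the exact command that the Galaxy wrapper generates may differ.

```python
def build_diamond_command(mode, db, query, out,
                          min_identity=None,
                          min_query_cover=None,
                          min_subject_cover=None):
    """Assemble a DIAMOND argument list with optional reporting thresholds."""
    if mode not in ("blastp", "blastx"):
        raise ValueError("mode must be 'blastp' or 'blastx'")
    cmd = ["diamond", mode, "--db", db, "--query", query, "--out", out]
    if min_identity is not None:
        cmd += ["--id", str(min_identity)]
    if min_query_cover is not None:
        cmd += ["--query-cover", str(min_query_cover)]
    if min_subject_cover is not None:
        cmd += ["--subject-cover", str(min_subject_cover)]
    return cmd
```

The resulting list can be passed to `subprocess.run` on a system where DIAMOND is installed.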

Output options

Advanced options

What it does

DIAMOND is a new alignment tool for aligning short DNA sequencing reads to a protein reference database such as NCBI-NR. On Illumina reads of length 100-150bp, in fast mode, DIAMOND is about 20,000 times faster than BLASTX, while reporting about 80-90% of all matches that BLASTX finds, with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX, finding more than 94% of all matches.

The DIAMOND algorithm is designed for the alignment of large datasets. It is not efficient for a small number of query sequences, or for a single query, and will run slowly in those cases. BLAST is recommended for small datasets.

Input

Input data is a large protein or nucleotide sequence file.

Output

Diamond gives you a tabular output file with 12 columns:

Column	Description
1	Query Seq-id (ID of your sequence)
2	Subject Seq-id (ID of the database hit)
3	Percentage of identical matches
4	Alignment length
5	Number of mismatches
6	Number of gap openings
7	Start of alignment in query
8	End of alignment in query
9	Start of alignment in subject (database hit)
10	End of alignment in subject (database hit)
11	Expectation value (E-value)
12	Bit score
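A minimal Python sketch for reading this 12-column tabular output is given below. The short field names (`qseqid`, `sseqid`, and so on) are the conventional BLAST tabular names and are used here as an assumption, since the output file itself carries no header. The function name is hypothetical.

```python
import csv

# Conventional BLAST-style names for the 12 columns, in order.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_diamond_tabular(handle):
    """Yield one dict per alignment from a tab-separated 12-column file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        record = dict(zip(COLUMNS, row))
        for name in INT_FIELDS:
            record[name] = int(record[name])
        for name in FLOAT_FIELDS:
            record[name] = float(record[name])
        yield record
```

Typical use would be `with open("hits.tsv") as fh: rows = list(parse_diamond_tabular(fh))`, where `hits.tsv` is a placeholder file name.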

The supported values for the gap open and gap extend parameters depend on the selected scoring matrix:

Matrix	Supported values for (gap open)/(gap extend)
BLOSUM45	(10-13)/3; (12-16)/2; (16-19)/1
BLOSUM50	(9-13)/3; (12-16)/2; (15-19)/1
BLOSUM62	(6-11)/2; (9-13)/1
BLOSUM80	(6-9)/2; 13/2; 25/2; (9-11)/1
BLOSUM90	(6-9)/2; (9-11)/1
PAM250	(11-15)/3; (13-17)/2; (17-21)/1
PAM70	(6-8)/2; (9-11)/1
PAM30	(5-7)/2; (8-10)/1
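The table above can be encoded as a small lookup to check a gap open / gap extend pair before submitting a job. This Python sketch is only a convenience check based on the table. It assumes that ranges such as (6-11) are inclusive at both ends, and the function name is hypothetical.

```python
# matrix -> list of ((min_open, max_open), gap_extend), copied from the table.
SUPPORTED_GAPS = {
    "BLOSUM45": [((10, 13), 3), ((12, 16), 2), ((16, 19), 1)],
    "BLOSUM50": [((9, 13), 3), ((12, 16), 2), ((15, 19), 1)],
    "BLOSUM62": [((6, 11), 2), ((9, 13), 1)],
    "BLOSUM80": [((6, 9), 2), ((13, 13), 2), ((25, 25), 2), ((9, 11), 1)],
    "BLOSUM90": [((6, 9), 2), ((9, 11), 1)],
    "PAM250": [((11, 15), 3), ((13, 17), 2), ((17, 21), 1)],
    "PAM70": [((6, 8), 2), ((9, 11), 1)],
    "PAM30": [((5, 7), 2), ((8, 10), 1)],
}


def gaps_supported(matrix, gap_open, gap_extend):
    """Return True if (gap_open, gap_extend) is listed for the matrix.

    Ranges are treated as inclusive, which is an assumption of this sketch.
    """
    if matrix not in SUPPORTED_GAPS:
        raise ValueError(f"unknown scoring matrix: {matrix}")
    return any(
        low <= gap_open <= high and gap_extend == extend
        for (low, high), extend in SUPPORTED_GAPS[matrix]
    )
```

For example, `gaps_supported("BLOSUM62", 11, 1)` checks the pair 11/1 against the (9-13)/1 entry for BLOSUM62.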