Galaxy | Tool Preview

Diamond (version 2.0.15+galaxy0)
(blastp/blastx)
The FASTA sequence identifiers as well as the sequences of query and target need to be identical for a hit to be deleted
Compositionally biased sequences often cause false positive matches, which are effectively filtered by this algorithm in a way similar to the composition based statistics used by BLAST
Built-ins were indexed using default options
If your database of interest is not listed, contact your Galaxy admin
Any taxonomic rank can be used, and only reference sequences matching one of the specified taxon ids will be searched against.
Choose one of the sensitivity modes. The default mode is mainly designed for short read alignment, i.e. finding significant matches of >50 bits on 30-40aa fragments. The sensitive mode is a lot more sensitive than the default and generally recommended for aligning longer sequences. The more sensitive mode provides even more sensitivity. More sensitivity may increase computation time.
This is the main parameter for controlling the program’s memory and disk space usage. Bigger numbers will increase the use of memory and temporary disk space, but also improve performance
In parentheses are the supported values for (gap open)/(gap extend). In brackets are default gap penalties
Leave empty for default (see scoring matrix)
Leave empty for default (see scoring matrix)
DIAMOND by default applies the tantan repeat masking algorithm to the query and target sequences as described in (Frith, 2011). This masking procedure increases the specificity of alignments and serves to filter out spurious hits. Note that when using --comp-based-stats (2,3,4), tantan masking is disabled by default.
(--evalue/--min-score)
The query dataset will first be searched at a lower sensitivity setting; only those query sequences that fail to produce a significant alignment there are then searched at the target sensitivity.
Double-indexed is the main algorithm of the program, designed for large input files but less efficient for small query files. Query-indexed mode improves performance for small query files and is triggered automatically based on the input. Contiguous-seed mode further improves performance for small query files. The modes differ slightly in their sensitivity, so results are not guaranteed to be 100% identical for different settings of this option.
Setting this to 0 will report all alignments that were found.
Target sequences will be ranked according to their ungapped extension scores at seed hits, and gapped extensions will only be computed for the best N targets for each query. Note that this option increases memory use.
Report only alignments above the given percentage of sequence identity
Report only alignments above the given percentage of query cover
Report only alignments above the given percentage of subject cover
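The three filters above can be pictured as simple threshold checks on each reported alignment. The Python sketch below is illustrative only and is not DIAMOND's implementation: the function names are hypothetical, the coverage formula (aligned span divided by sequence length) is the usual definition and is assumed here, and treating a value equal to the threshold as passing is also an assumption.

```python
from typing import Optional


def percent_cover(start: int, end: int, sequence_length: int) -> float:
    """Percentage of a sequence spanned by an alignment (assumed definition).

    abs() is used because translated searches can report start > end
    for hits on the reverse strand.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    span = abs(end - start) + 1
    return 100.0 * span / sequence_length


def keep_hit(
    identity: float,
    query_cover: float,
    min_identity: Optional[float] = None,
    min_query_cover: Optional[float] = None,
) -> bool:
    """Return True if a hit passes every threshold that was set.

    A threshold of None means the filter is not applied.
    The inclusive comparison (>=) is an assumption of this sketch.
    """
    if min_identity is not None and identity < min_identity:
        return False
    if min_query_cover is not None and query_cover < min_query_cover:
        return False
    return True
```

The same pattern would apply to subject cover, using the subject start, end, and length instead of the query values.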
Output options
Advanced options

What it does

DIAMOND is a new alignment tool for aligning short DNA sequencing reads to a protein reference database such as NCBI-NR. On Illumina reads of length 100-150bp, in fast mode, DIAMOND is about 20,000 times faster than BLASTX, while reporting about 80-90% of all matches that BLASTX finds, with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX, finding more than 94% of all matches.

The DIAMOND algorithm is designed for the alignment of large datasets. The algorithm is not efficient for a small number of query sequences or only a single one of them, and speed will be low. BLAST is recommended for small datasets.

Input

Input data is a large protein or nucleotide sequence file.

Output

Diamond gives you a tabular output file with 12 columns:

Column Description
1 Query Seq-id (ID of your sequence)
2 Subject Seq-id (ID of the database hit)
3 Percentage of identical matches
4 Alignment length
5 Number of mismatches
6 Number of gap openings
7 Start of alignment in query
8 End of alignment in query
9 Start of alignment in subject (database hit)
10 End of alignment in subject (database hit)
11 Expectation value (E-value)
12 Bit score
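The 12 columns listed above can be read into typed records with a few lines of Python. This is a minimal sketch, assuming the file is tab-separated and has no header row; the `Hit` class, its field names, and the sample line are made up for illustration and are not part of the tool.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Hit(NamedTuple):
    """One row of the 12-column tabular output (field names are illustrative)."""
    query_id: str
    subject_id: str
    identity: float      # percentage of identical matches
    length: int          # alignment length
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bitscore: float


def parse_hits(handle: TextIO) -> Iterator[Hit]:
    """Yield a Hit for every non-empty, non-comment line of a tabular file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 12:
            raise ValueError(f"expected 12 columns, got {len(row)}")
        yield Hit(
            row[0],
            row[1],
            float(row[2]),
            *(int(value) for value in row[3:10]),
            float(row[10]),
            float(row[11]),
        )


# Invented example line, for demonstration only.
sample = "read1\tprotA\t85.7\t35\t5\t0\t1\t105\t10\t44\t1.2e-10\t62.4\n"
for hit in parse_hits(io.StringIO(sample)):
    print(hit.query_id, hit.subject_id, hit.evalue, hit.bitscore)
```

For a real result file, pass an open file handle to `parse_hits` in place of the `io.StringIO` object.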

The supported values for the gap open and gap extend parameters depend on the selected scoring matrix:

Matrix Supported values for (gap open)/(gap extend)
BLOSUM45 (10-13)/3; (12-16)/2; (16-19)/1
BLOSUM50 (9-13)/3; (12-16)/2; (15-19)/1
BLOSUM62 (6-11)/2; (9-13)/1
BLOSUM80 (6-9)/2; 13/2; 25/2; (9-11)/1
BLOSUM90 (6-9)/2; (9-11)/1
PAM250 (11-15)/3; (13-17)/2; (17-21)/1
PAM70 (6-8)/2; (9-11)/1
PAM30 (5-7)/2; (8-10)/1
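The table above can be turned into a small lookup for checking a gap open/gap extend pair before submitting a job. This is a sketch only: the values are copied from the table, ranges such as (6-11) are assumed to be inclusive, and the names `SUPPORTED_GAP_PENALTIES` and `is_supported` are hypothetical helpers, not part of DIAMOND or Galaxy.

```python
# Each entry: ((lowest gap open, highest gap open), gap extend).
# Single values such as 13/2 are stored as a one-element range.
SUPPORTED_GAP_PENALTIES = {
    "BLOSUM45": [((10, 13), 3), ((12, 16), 2), ((16, 19), 1)],
    "BLOSUM50": [((9, 13), 3), ((12, 16), 2), ((15, 19), 1)],
    "BLOSUM62": [((6, 11), 2), ((9, 13), 1)],
    "BLOSUM80": [((6, 9), 2), ((13, 13), 2), ((25, 25), 2), ((9, 11), 1)],
    "BLOSUM90": [((6, 9), 2), ((9, 11), 1)],
    "PAM250": [((11, 15), 3), ((13, 17), 2), ((17, 21), 1)],
    "PAM70": [((6, 8), 2), ((9, 11), 1)],
    "PAM30": [((5, 7), 2), ((8, 10), 1)],
}


def is_supported(matrix: str, gap_open: int, gap_extend: int) -> bool:
    """Return True if the pair appears in the table for the given matrix.

    Range bounds are treated as inclusive (an assumption of this sketch).
    Unknown matrix names return False.
    """
    ranges = SUPPORTED_GAP_PENALTIES.get(matrix.upper())
    if ranges is None:
        return False
    return any(
        low <= gap_open <= high and gap_extend == extend
        for (low, high), extend in ranges
    )


print(is_supported("BLOSUM62", 11, 1))
print(is_supported("BLOSUM62", 12, 2))
```

Keeping the table as data rather than as a chain of conditionals makes it easy to compare against the listing above line by line.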