What it does
DIAMOND is a new alignment tool for aligning short DNA sequencing reads to a protein reference database such as NCBI-NR. On Illumina reads of length 100-150 bp, DIAMOND in fast mode is about 20,000 times faster than BLASTX while reporting about 80-90% of all matches that BLASTX finds with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX and finds more than 94% of all matches.
The DIAMOND algorithm is designed for aligning large datasets. It is not efficient for a single query sequence or a small number of them, and speed will be low in those cases. BLAST is recommended for small datasets.
Input
Input data is a large protein or nucleotide sequence file.
Output
DIAMOND produces a tabular output file with 12 columns:
Column | Description |
---|---|
1 | Query Seq-id (ID of your sequence) |
2 | Subject Seq-id (ID of the database hit) |
3 | Percentage of identical matches |
4 | Alignment length |
5 | Number of mismatches |
6 | Number of gap openings |
7 | Start of alignment in query |
8 | End of alignment in query |
9 | Start of alignment in subject (database hit) |
10 | End of alignment in subject (database hit) |
11 | Expectation value (E-value) |
12 | Bit score |
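
The 12 columns can be read into named fields for downstream filtering. The following is a minimal sketch, assuming the fields are tab-separated and appear in the order listed above; the `Hit` type, the `parse_hits` function, and the sample line are hypothetical and are not part of DIAMOND itself.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Hit(NamedTuple):
    """One row of the 12-column tabular output (field names are illustrative)."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


def parse_hits(handle: TextIO) -> Iterator[Hit]:
    """Yield one Hit per non-empty line; assumes tab-separated fields."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        yield Hit(
            row[0],
            row[1],
            float(row[2]),
            *(int(value) for value in row[3:10]),  # columns 4-10 are integers
            float(row[10]),
            float(row[11]),
        )


# Made-up example line, for illustration only.
sample = "read1\tprotA\t87.5\t40\t5\t0\t1\t120\t10\t49\t1e-10\t75.5\n"
hits = list(parse_hits(io.StringIO(sample)))
significant = [hit for hit in hits if hit.evalue <= 1e-5]
```

In practice the `io.StringIO` object would be replaced by an open file handle on the output file.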
The supported values for the gap open and gap extend parameters depend on the selected scoring matrix:
Matrix | Supported values for (gap open)/(gap extend) |
---|---|
BLOSUM45 | (10-13)/3; (12-16)/2; (16-19)/1 |
BLOSUM50 | (9-13)/3; (12-16)/2; (15-19)/1 |
BLOSUM62 | (6-11)/2; (9-13)/1 |
BLOSUM80 | (6-9)/2; 13/2; 25/2; (9-11)/1 |
BLOSUM90 | (6-9)/2; (9-11)/1 |
PAM250 | (11-15)/3; (13-17)/2; (17-21)/1 |
PAM70 | (6-8)/2; (9-11)/1 |
PAM30 | (5-7)/2; (8-10)/1 |
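
A parameter combination can be checked against this table before a job is started. The sketch below transcribes the table into a lookup structure, assuming that each range such as (9-13) is inclusive at both ends; the names `SUPPORTED_GAP_PENALTIES` and `is_supported` are hypothetical helpers, not part of DIAMOND.

```python
# Maps matrix name -> gap extend penalty -> list of inclusive (min, max)
# ranges for the gap open penalty, transcribed from the table above.
SUPPORTED_GAP_PENALTIES = {
    "BLOSUM45": {3: [(10, 13)], 2: [(12, 16)], 1: [(16, 19)]},
    "BLOSUM50": {3: [(9, 13)], 2: [(12, 16)], 1: [(15, 19)]},
    "BLOSUM62": {2: [(6, 11)], 1: [(9, 13)]},
    "BLOSUM80": {2: [(6, 9), (13, 13), (25, 25)], 1: [(9, 11)]},
    "BLOSUM90": {2: [(6, 9)], 1: [(9, 11)]},
    "PAM250": {3: [(11, 15)], 2: [(13, 17)], 1: [(17, 21)]},
    "PAM70": {2: [(6, 8)], 1: [(9, 11)]},
    "PAM30": {2: [(5, 7)], 1: [(8, 10)]},
}


def is_supported(matrix: str, gap_open: int, gap_extend: int) -> bool:
    """Return True if (gap_open)/(gap_extend) is listed for the given matrix."""
    ranges = SUPPORTED_GAP_PENALTIES.get(matrix.upper(), {}).get(gap_extend, [])
    return any(low <= gap_open <= high for low, high in ranges)


print(is_supported("BLOSUM62", 11, 1))  # 11 lies within (9-13)/1
print(is_supported("BLOSUM80", 12, 2))  # 12 is not listed for extend 2
```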