IDBA-HYBRID (version 1.1.3)

IDBA is an iterative de Bruijn graph de novo assembler for sequence assembly. Most de Bruijn graph-based assemblers build the graph with a single, fixed k-mer size, so choosing a suitable value of k is crucial for all of them. If k is too large, the graph has many gaps, particularly in low-coverage regions; if k is too small, it has many branches, because more repeats are longer than k. Instead of a single k, IDBA iterates over a range of k values to build an iterative de Bruijn graph, so it retains the information contained in graphs built with different k values.
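The k trade-off described above can be illustrated with a small Python sketch. It is not IDBA's iterative algorithm: it simply builds a separate, plain de Bruijn graph (nodes are (k-1)-mers, edges are k-mers) from error-free toy reads for several values of k and counts branching nodes and dead ends. The genome, the reads and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: the genome contains the repeat "ACGT" twice, and the
# reads (length 8, one starting every 3 bases) overlap each other by 5 bases.
GENOME = "ACGTTGCAACGTAGGCT"
READS = [GENOME[i:i + 8] for i in range(0, len(GENOME) - 7, 3)]


def build_graph(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def summarize(graph):
    """Return (branching nodes, dead-end nodes) for a de Bruijn graph."""
    nodes = set(graph) | {succ for succs in graph.values() for succ in succs}
    branches = sum(1 for succs in graph.values() if len(succs) > 1)
    # .get() does not insert keys into a defaultdict, so nodes without
    # outgoing edges are counted without modifying the graph.
    dead_ends = sum(1 for node in nodes if not graph.get(node))
    return branches, dead_ends


if __name__ == "__main__":
    for k in (3, 5, 7):
        branches, dead_ends = summarize(build_graph(READS, k))
        print(f"k={k}: {branches} branching nodes, {dead_ends} dead ends")
```

On this toy input, k=3 gives two branching nodes, k=5 gives one (caused by the ACGT repeat), and k=7 gives no branches but four dead ends: only one is the true end of the genome, while the other three are gaps, because a 7-mer spanning the junction of two reads that overlap by only 5 bases is contained in neither read.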

IDBA-Hybrid is an iterative de Bruijn graph de novo assembler for hybrid sequencing and an extension of the IDBA-UD algorithm. It uses a closely related reference genome to aid de novo assembly, especially when sequencing depth is low. IDBA-Hybrid first aligns the reads to the reference to extract similar regions of the reference genome; it then corrects those regions based on the alignment results and applies a local assembly technique to resolve potential structural variations. Finally, it pools all the reads together with the contigs obtained from those similar regions and performs de novo assembly. The authors' experiments showed that it outperforms existing de novo and hybrid assembly algorithms, especially when the sequencing depth is low and the reference genome is similar to the target genome.
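The first step, finding the reference regions that the reads support, can be approximated by merging the intervals where reads align. The sketch below is a simplification of our own, not IDBA-Hybrid's actual procedure: it assumes alignments are already available as half-open (start, end) coordinates on a single reference sequence, and the function names and the `max_gap` parameter are hypothetical.

```python
def merge_alignment_intervals(intervals, max_gap=0):
    """Merge half-open (start, end) read alignments on one reference sequence
    into maximal covered regions. Neighbouring intervals separated by at most
    `max_gap` uncovered bases are joined into the same region."""
    regions = []
    for start, end in sorted(intervals):
        if regions and start - regions[-1][1] <= max_gap:
            # Overlaps (or lies close to) the current region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


def extract_regions(reference, intervals, max_gap=0):
    """Return the reference substrings covered by the merged regions."""
    return [reference[s:e] for s, e in merge_alignment_intervals(intervals, max_gap)]


if __name__ == "__main__":
    hits = [(0, 100), (50, 150), (300, 400), (405, 500)]
    print(merge_alignment_intervals(hits))              # [(0, 150), (300, 400), (405, 500)]
    print(merge_alignment_intervals(hits, max_gap=10))  # [(0, 150), (300, 500)]
```

Sorting first lets a single left-to-right pass merge everything, so the cost is dominated by the O(n log n) sort.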

Input: IDBA-* tools take interleaved paired-end data in FASTA format as input; that is, both reads of each pair must be stored in the same FASTA file as two consecutive records. In Galaxy, paired reads stored in separate FASTQ files can be converted into interleaved FASTA using the tools:
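If you need to interleave reads outside Galaxy, the conversion is simple enough to sketch in Python. This is an illustrative sketch, not one of the Galaxy tools referred to above, and the function names are hypothetical. It assumes plain four-line FASTQ records (no wrapped sequence lines) and that both files list the mates in the same order; quality scores are discarded because FASTA has no field for them.

```python
import io
import sys
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield (name, sequence) pairs from a FASTQ stream of 4-line records."""
    while True:
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield record[0][1:], record[1]


def interleave_to_fasta(fq1, fq2, out):
    """Write mate 1 and mate 2 of each pair as consecutive FASTA records.

    Returns the number of pairs written."""
    pairs = 0
    for mate1, mate2 in zip_longest(read_fastq(fq1), read_fastq(fq2)):
        if mate1 is None or mate2 is None:
            raise ValueError("the two FASTQ files contain different numbers of reads")
        (name1, seq1), (name2, seq2) = mate1, mate2
        out.write(f">{name1}\n{seq1}\n>{name2}\n{seq2}\n")
        pairs += 1
    return pairs


if __name__ == "__main__":
    fq1 = io.StringIO("@pair1/1\nACGT\n+\nIIII\n")
    fq2 = io.StringIO("@pair1/2\nTTGA\n+\nIIII\n")
    interleave_to_fasta(fq1, fq2, sys.stdout)
```

With real data, pass file handles from `open(...)` for the two inputs and the output instead of the in-memory streams.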

Note that IDBA-* assumes that paired-end reads are in forward-reverse order (->, <-). If your data are in reverse-forward order (<-, ->), please convert them yourself.
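One common way to do this conversion is to reverse-complement both mates of every pair: reverse-complementing a read flips its orientation, so an outward-facing (<-, ->) pair becomes an inward-facing (->, <-) pair while mate 1 stays first. The Python sketch below illustrates this on in-memory sequences. The function names are hypothetical; characters other than A, C, G, T and N (in either case) are left unchanged.

```python
# Complement table for upper- and lower-case DNA bases; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rf_to_fr(pairs):
    """Convert (<-, ->) read pairs to (->, <-) by reverse-complementing both
    mates, keeping mate 1 before mate 2."""
    return [(reverse_complement(r1), reverse_complement(r2)) for r1, r2 in pairs]


if __name__ == "__main__":
    print(rf_to_fr([("AAAC", "GGGT")]))  # [('GTTT', 'ACCC')]
```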