What it does
medaka is a tool suite to create a consensus sequence from nanopore sequencing data.
This task is performed using neural networks applied from a pileup of individual sequencing reads against a draft assembly. It outperforms graph-based methods operating on basecalled data, and can be competitive with state-of-the-art signal-based methods, whilst being much faster.
The medaka_consensus pipeline performs assembly polishing via neural networks.
Input
An assembly in .fasta format and basecalls in .fasta or .fastq format are required. See Creating a Draft Assembly for a detailed example of one method of obtaining these.
Output
Models
For best results it is important to specify the correct model, -m in the above, according to the basecaller used. Allowed values can be found by running medaka tools list_models.
Medaka models are named to indicate i) the pore type, ii) the sequencing device (MinION or PromethION), iii) the basecaller variant, and iv) the basecaller version, with the format:
{pore}_{device}_{caller variant}_{caller version}
For example the model named r941_min_fast_g303 should be used with data from MinION (or GridION) R9.4.1 flowcells using the fast Guppy basecaller version 3.0.3. By contrast the model r941_prom_hac_g303 should be used with PromethION data and the high accuracy basecaller (termed "hac" in Guppy configuration files). Where a version of Guppy has been used without an exactly corresponding medaka model, the medaka model with the highest version equal to or less than the guppy version should be selected.
References