Kodoja is a tool intended to identify viral sequences in a FASTQ/FASTA sequencing run by matching them against both Kraken and Kaiju databases.
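The main output is a tab-separated table (tabular format in Galaxy) with seven columns: (1) Species, (2) Species TaxID, (3) Species sequences, (4) Species sequences (stringent), (5) Genus, (6) Genus sequences, and (7) Genus sequences (stringent).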
The counts in columns 6 and 7 are for reads assigned to that genus, but not to any species within it.
For example,
Species | Species TaxID | Species sequences | Species sequences (stringent) | Genus | Genus sequences | Genus sequences (stringent) |
Cassava brown streak virus | 137758 | 45 | 45 | Ipomovirus | 0 | 0 |
Ugandan cassava brown streak virus | 946046 | 28 | 28 | Ipomovirus | 0 | 0 |
Tobacco etch virus | 12227 | 21 | 19 | Potyvirus | 0 | 0 |
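As a rough illustration of working with this table outside Galaxy, the following Python sketch parses it and sums the stringent species counts per genus. It assumes the file starts with a header row using exactly the column names shown in the example, which may not hold for every Kodoja version. The function names `load_table` and `species_counts_per_genus` are made up for this sketch and are not part of Kodoja.

```python
import csv
import io

# The example rows from above, re-encoded as tab-separated text.
# Assumption: the real file carries a header row with these exact names.
EXAMPLE_TABLE = (
    "Species\tSpecies TaxID\tSpecies sequences\tSpecies sequences (stringent)\t"
    "Genus\tGenus sequences\tGenus sequences (stringent)\n"
    "Cassava brown streak virus\t137758\t45\t45\tIpomovirus\t0\t0\n"
    "Ugandan cassava brown streak virus\t946046\t28\t28\tIpomovirus\t0\t0\n"
    "Tobacco etch virus\t12227\t21\t19\tPotyvirus\t0\t0\n"
)

COUNT_COLUMNS = (
    "Species sequences",
    "Species sequences (stringent)",
    "Genus sequences",
    "Genus sequences (stringent)",
)


def load_table(handle):
    """Return one dict per data row, with the four count columns as int."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for key in COUNT_COLUMNS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


def species_counts_per_genus(rows, column="Species sequences (stringent)"):
    """Sum the chosen species-level count column over each genus."""
    totals = {}
    for row in rows:
        totals[row["Genus"]] = totals.get(row["Genus"], 0) + row[column]
    return totals


rows = load_table(io.StringIO(EXAMPLE_TABLE))
totals = species_counts_per_genus(rows)
print(totals)
```

To read a real output file instead, pass an open file handle (for example `open(path, newline="")`) to `load_table`.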
The second most important output, which you can optionally capture for use within Galaxy, is a per-read table summarising the matches found with Kraken and/or Kaiju. The Kodoja Retrieve tool is not currently available within Galaxy, but you can use this file directly within Galaxy to extract just the virus reads, or only the reads matched to a specific taxid. See, for example, seq_filter_by_id, which is available via the Galaxy Tool Shed:
http://toolshed.g2.bx.psu.edu/view/peterjc/seq_filter_by_id
https://github.com/peterjc/pico_galaxy/tree/master/tools/seq_filter_by_id
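Outside Galaxy, the same idea of selecting reads by taxid can be sketched in a few lines of Python. The layout of the per-read table is not described here, so the column names `read_id` and `taxid` below are purely hypothetical placeholders, as is the toy input. Check the header of your own file and pass the real column names.

```python
import csv
import io


def read_ids_for_taxid(handle, taxid, id_column="read_id", taxid_column="taxid"):
    """Collect read identifiers whose taxid column equals the given taxid.

    The default column names are hypothetical; replace them with the
    names found in the header of the actual per-read table.
    """
    wanted = str(taxid)
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_column] for row in reader if row[taxid_column] == wanted]


# Toy input made up for illustration only; this is not real Kodoja output.
TOY_TABLE = "read_id\ttaxid\nread_1\t12227\nread_2\t137758\nread_3\t12227\n"

ids = read_ids_for_taxid(io.StringIO(TOY_TABLE), 12227)
print(ids)
```

The resulting list of identifiers could then be written one per line and used as the ID list for a tool such as seq_filter_by_id.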
The Kodoja Search command line tool offers additional options not currently exposed in Galaxy, including:
      Number of threads
  -s, --host_subset
      Subset host sequences before Kaiju
  -m TRIM_MINLEN, --trim_minlen TRIM_MINLEN
      Trimmomatic minimum length
  -a TRIM_ADAPT, --trim_adapt TRIM_ADAPT
      Illumina adapter sequence file
  -q KRAKEN_QUICK, --kraken_quick KRAKEN_QUICK
      Minimum number of hits by Kraken
  -p, --kraken_preload
      Kraken preload database
  -c KAIJU_SCORE, --kaiju_score KAIJU_SCORE
      Kaiju alignment score
  -l KAIJU_MINLEN, --kaiju_minlen KAIJU_MINLEN
      Kaiju minimum length
  -i KAIJU_MISMATCH, --kaiju_mismatch KAIJU_MISMATCH
      Kaiju allowed mismatches
For more information, please see the Kodoja manual: https://github.com/abaizan/kodoja/wiki/Kodoja-Manual