Galaxy | Tool Preview

DeepVariant (version 1.5.0+galaxy1)
Built-in references were created using default options.
If your genome of interest is not listed, contact the Galaxy team.
An aligned reads file in BAM format. The reads must be aligned to the reference genome.
Type of model to use for variant calling
Restrict the analysis to specific regions. A space-separated list of chromosome regions to process. Individual elements can be region literals, such as chr20:10-20, or paths to BED files.
The key difference between a regular VCF and a gVCF is that the gVCF has records for all sites, whether there is a variant call there or not. The goal is to have every site represented in the file so that a cohort can be analyzed jointly in subsequent steps.
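
As an illustration of the regions option described above, the following Python sketch separates a space-separated regions string into region literals and BED file paths. The function name `split_regions`, the regular expression, and the rule that a BED path ends in `.bed` or `.bed.gz` are assumptions made for this example; they are not taken from DeepVariant or Galaxy, which may parse the value differently.

```python
import re

# Hypothetical helper for illustration only; not part of DeepVariant or Galaxy.
# Matches "contig", "contig:start", or "contig:start-end" (commas allowed in numbers).
_REGION_RE = re.compile(
    r"^(?P<contig>[^:\s]+)(?::(?P<start>[\d,]+)(?:-(?P<end>[\d,]+))?)?$"
)


def split_regions(value):
    """Split a space-separated regions string into literals and BED paths."""
    literals, bed_files = [], []
    for item in value.split():
        # Assumption: anything ending in .bed or .bed.gz is a BED file path.
        if item.lower().endswith((".bed", ".bed.gz")):
            bed_files.append(item)
            continue
        match = _REGION_RE.match(item)
        if match is None:
            raise ValueError(f"not a region literal or BED path: {item!r}")
        start, end = match.group("start"), match.group("end")
        literals.append((
            match.group("contig"),
            int(start.replace(",", "")) if start else None,
            int(end.replace(",", "")) if end else None,
        ))
    return literals, bed_files


literals, bed_files = split_regions("chr20:10-20 targets.bed chr1")
print(literals)   # [('chr20', 10, 20), ('chr1', None, None)]
print(bed_files)  # ['targets.bed']
```

A bare contig name such as `chr1` is returned with no start or end, which this sketch treats as "the whole contig".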

Purpose

DeepVariant is a deep learning-based variant caller that takes aligned reads (in BAM or CRAM format), produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports the results in a standard VCF or gVCF file.
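
To make the difference between the two output formats concrete, here is a minimal Python sketch. It assumes a toy input, a dictionary mapping positions to genotype labels, and a hypothetical function `sites_in_output`. It treats a regular VCF as listing only variant sites and a gVCF as listing every site. Real gVCF files usually compress runs of non-variant sites into block records, which this sketch ignores.

```python
def sites_in_output(calls, gvcf=False):
    """Return the positions that would get a record in the output.

    calls: dict mapping position -> genotype label
           ("hom-ref", "het", or "hom-alt"); toy data, not a real file.
    gvcf:  if True, every site is reported; otherwise only variant sites.
    """
    if gvcf:
        return sorted(calls)
    return sorted(pos for pos, label in calls.items() if label != "hom-ref")


calls = {100: "hom-ref", 101: "het", 102: "hom-ref", 103: "hom-alt"}
print(sites_in_output(calls))             # [101, 103]
print(sites_in_output(calls, gvcf=True))  # [100, 101, 102, 103]
```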

DeepVariant supports germline variant-calling in diploid organisms.

Please also note:

For somatic data or any other samples where the genotypes go beyond two copies of DNA, DeepVariant will not work out of the box because the only genotypes supported are hom-alt, het, and hom-ref.

The models included with DeepVariant are only trained on human data. For other organisms, see the blog post on non-human variant-calling for some possible pitfalls and how to handle them.
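
The restriction to diploid genotypes noted above can be sketched in Python as follows. The function `classify_genotype` is a hypothetical helper, not part of DeepVariant. It assumes the usual VCF `GT` convention, in which `0` denotes the reference allele, other integers denote alternate alleles, and `/` or `|` separates the two alleles. Any genotype with more or fewer than two alleles is rejected, mirroring the point that samples beyond two copies of DNA are not supported.

```python
def classify_genotype(gt):
    """Map a diploid VCF GT string to hom-ref, het, hom-alt, or no-call."""
    # Treat phased ("|") and unphased ("/") genotypes the same way.
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2:
        raise ValueError(f"only diploid genotypes are handled, got {gt!r}")
    if "." in alleles:
        return "no-call"
    first, second = alleles
    if first == second == "0":
        return "hom-ref"
    if first == second:
        return "hom-alt"
    return "het"


for gt in ("0/0", "0/1", "1|1", "1/2"):
    print(gt, classify_genotype(gt))
# 0/0 hom-ref
# 0/1 het
# 1|1 hom-alt
# 1/2 het
```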


How DeepVariant works

DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF), designed for painless integration with the TensorFlow machine learning framework. Nucleus was built with DeepVariant in mind and open-sourced separately so that anyone in the genomics research community can use it for other projects. See the blog post "Using Nucleus and TensorFlow for DNA Sequencing Error Correction".