Galaxy | Tool Preview

ANNOVAR (version 2016march)
Must be a VCF file, a Complete Genomics (CG) varfile, or a tab-separated file with a one-line header.
If checked, cDNA-level annotation is compatible with HGVS nomenclature.
This option identifies Giemsa-stained chromosome bands (e.g. 1q21.1-q23.3).
This option uses phastCons 44-way alignments to annotate variants that fall within conserved genomic regions.
Genetic variants that are mapped to segmental duplications are most likely sequence alignment errors and should be treated with extreme caution.
Identify previously reported structural variants in DGV (Database of Genomic Variants)
Identify variants reported in previously published GWAS (Genome-wide association studies)
avSNP databases are reformatted dbSNP databases with one variant per line and left-normalized indels (for a more detailed discussion, read this article: http://annovar.openbioinformatics.org/en/latest/articles/dbSNP/). Flagged SNPs are SNPs with a minor allele frequency (MAF) below 1% (or unknown) that map only once to the reference assembly and are flagged in dbSNP as clinically associated.
The 2012april database for ALL populations was converted to hg18 using the UCSC liftOver program.
The "si" versions of the databases contain indels and chrY calls.
The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on the ExAC website spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. See http://exac.broadinstitute.org/faq for more information.
non-TCGA samples
This is a custom-made annotation file, not available from the ANNOVAR website. The database file can be obtained from http://bioinf-galaxian.erasmusmc.nl/public/Data/hg19_gonl.txt
This dataset provides machine-learning predictions of how genetic variants affect RNA splicing (Xiong et al., Science 2015).
GERP identifies constrained elements in multiple alignments by quantifying substitution deficits (see http://mendel.stanford.edu/SidowLab/downloads/gerp/ for details). This option annotates only variants with a GERP++ score > 2 in the human genome, as this threshold is typically regarded as evolutionarily conserved and potentially functional.
Version 2014-02-11. Annotations include Variant Clinical Significance (unknown, untested, non-pathogenic, probable-non-pathogenic, probable-pathogenic, pathogenic, drug-response, histocompatibility, other) and Variant disease name.
NCI-60 exome allele frequency data
Diversity Panel; 46 unrelated individuals
Diversity Panel, Pedigree, YRI trio and PUR trio
LJB refers to the Liu, Jian and Boerwinkle paper in Human Mutation (PubMed ID 21520341).
e.g. annotated as "score,damaging" or "score,benign" instead of just the score
provides splice site effect prediction by AdaBoost and Random Forest
170 million variants from 34 projects (13K genomes and 64K exomes)
40 million variants from 32K samples
an exhaustive collection of pre-computed pathogenicity predictions of human mitochondrial non-synonymous variants
provides whole-genome functional prediction scores from ~20 different algorithms. New additions to the database include DANN, PROVEAN, fitCons, etc.
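The GERP++ option above relies on a score threshold of 2. As a minimal Python sketch, the cut-off can be applied to an annotated, tab-separated table as shown here. The sketch assumes a header row and a score column named `gerp++gt2` (a hypothetical name here, so check the header of your own output), and it skips rows whose value is `.` or empty.

```python
import csv
import io

# Threshold from the help text: GERP++ > 2 is typically regarded as
# evolutionarily conserved and potentially functional.
GERP_THRESHOLD = 2.0


def conserved_rows(tsv_text, column="gerp++gt2"):
    """Yield rows whose value in `column` is strictly greater than 2.

    `column` is an assumed header name; rows with a missing value
    ('.' or empty) are skipped rather than treated as zero.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        value = (row.get(column) or "").strip()
        if value in ("", "."):
            continue
        if float(value) > GERP_THRESHOLD:
            yield row


# Hypothetical three-row table: one conserved site, one unannotated site,
# and one site exactly at the threshold (not kept, since the test is "> 2").
example = (
    "Chr\tStart\tEnd\tRef\tAlt\tgerp++gt2\n"
    "1\t100\t100\tA\tG\t4.5\n"
    "1\t200\t200\tC\tT\t.\n"
    "1\t300\t300\tG\tA\t2.0\n"
)
kept = list(conserved_rows(example))
print([r["Start"] for r in kept])
```

Running it prints `['100']`: the row with no score and the row sitting exactly at 2.0 are both excluded.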

What it does

This tool will annotate a file using ANNOVAR.

ANNOVAR Website and Documentation

Website: http://www.openbioinformatics.org/annovar/

Paper: http://nar.oxfordjournals.org/content/38/16/e164

Input Formats

The input may be in one of the following formats:

VCF file

Complete Genomics varfile

Custom tab-delimited file (specify chromosome, start, end, reference allele, observed allele columns)

Custom tab-delimited CG-derived file (specify chromosome, start, end, reference allele, observed allele, varType columns)
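The custom tab-delimited format lists chromosome, start, end, reference allele and observed allele. As a rough sketch, the Python below maps a single biallelic VCF record onto those five columns. It assumes ANNOVAR's usual convention of trimming the shared leading base of an indel and writing the empty allele as `-`, with an insertion anchored at the base before the inserted sequence. It handles one ALT allele only and does not trim shared trailing bases, so it illustrates the layout rather than replacing ANNOVAR's own `convert2annovar.pl`.

```python
def vcf_to_avinput(chrom, pos, ref, alt):
    """Convert one biallelic VCF record to (chr, start, end, ref, obs).

    Sketch only: assumes a single ALT allele and ANNOVAR-style '-' for an
    empty (deleted or inserted) allele.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    # Drop bases shared at the start of both alleles (VCF pads indels with
    # one reference base); each dropped base moves the start one position right.
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    if not ref:
        # Pure insertion: start and end are the base before the insertion.
        return (chrom, pos - 1, pos - 1, "-", alt)
    end = pos + len(ref) - 1
    # Deletion (alt is now empty) or substitution spanning len(ref) bases.
    return (chrom, pos, end, ref, alt or "-")


# An SNV, a 2-bp deletion and a 2-bp insertion, all at VCF POS 100.
for record in [("1", 100, "A", "G"), ("1", 100, "ATC", "A"), ("1", 100, "A", "ATT")]:
    print("\t".join(map(str, vcf_to_avinput(*record))))
```

The three records become `1 100 100 A G`, `1 101 102 TC -` and `1 100 100 - TT` (tab-separated), which matches the chromosome, start, end, reference allele, observed allele column order listed above.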

Database Notes

See the ANNOVAR website for extensive documentation. A few notes on some of the databases:

LJB2 Database

PolyPhen2 HVAR should be used for diagnostics of Mendelian diseases, which requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. The authors recommend calling a variant probably damaging if the score is between 0.909 and 1, possibly damaging if it is between 0.447 and 0.908, and benign if it is between 0 and 0.446.

PolyPhen2 HDIV should be used when evaluating rare alleles at loci potentially involved in complex phenotypes, for dense mapping of regions identified by genome-wide association studies, and for analysis of natural selection from sequence data. The authors recommend calling a variant probably damaging if the score is between 0.957 and 1, possibly damaging if it is between 0.453 and 0.956, and benign if it is between 0 and 0.452.
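Because HVAR and HDIV use different cut-offs, the same PolyPhen2 score can receive different calls depending on the model. The Python below is a minimal sketch of the recommended categories for both models. It treats each lower bound as inclusive, so a value falling between two stated ranges (e.g. 0.9085 under HVAR) goes to the lower category; that tie-breaking rule is an assumption of this sketch, not something the authors specify.

```python
# Inclusive lower bounds of the recommended categories, from the notes above.
# Anything below the last bound is called "benign".
THRESHOLDS = {
    "HVAR": [(0.909, "probably damaging"), (0.447, "possibly damaging")],
    "HDIV": [(0.957, "probably damaging"), (0.453, "possibly damaging")],
}


def polyphen2_call(score, model):
    """Return the recommended category for a PolyPhen2 score under HVAR or HDIV."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen2 scores lie between 0 and 1")
    for lower, label in THRESHOLDS[model]:
        if score >= lower:
            return label
    return "benign"


# The same score is called differently by the two models.
for model in ("HVAR", "HDIV"):
    print(model, polyphen2_call(0.93, model))
```

A score of 0.93 is probably damaging under HVAR (at least 0.909) but only possibly damaging under HDIV (below 0.957), which is why the choice of model should follow the intended use described above.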