Galaxy | Tool Preview

SnpEff databases: (version 4.3+T.galaxy2)
Databases matching this expression will be listed. You can enter plain text or a regular expression. For example, to show only mouse databases, use 'Mouse'. Note that this parameter is case-sensitive.
Databases matching this expression WILL NOT BE listed. You can enter plain text or a regular expression. For example, to exclude all ENSEMBL bundles, enter 'ENSEMBL'. Note that this parameter is case-sensitive.
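
As a rough illustration of how the two expressions interact, here is a minimal Python sketch. It is not the tool's actual implementation: the function name filter_databases is hypothetical, and it assumes each expression is searched against the whole tab-separated line. The third sample row is made up for the example.

```python
import re

def filter_databases(rows, include=None, exclude=None):
    """Keep rows that match `include` and do not match `exclude`.

    Matching is case-sensitive, as described for both parameters.
    Assumption: each pattern is searched against the whole line.
    """
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    kept = []
    for row in rows:
        if include_re and not include_re.search(row):
            continue  # does not match the "list" expression
        if exclude_re and exclude_re.search(row):
            continue  # matches the "do not list" expression
        kept.append(row)
    return kept

rows = [
    "mm10\tMouse\thttp://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_mm10.zip",
    "mm9\tMouse\thttp://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_mm9.zip",
    # Hypothetical row, invented for this sketch only.
    "hypothetical_db\tHypothetical_organism\thttp://example.org/hypothetical_db.zip",
]

mouse_only = filter_databases(rows, include="Mouse")
mouse_without_mm9 = filter_databases(rows, include="Mouse", exclude="mm9")
lowercase_query = filter_databases(rows, include="mouse")

print(len(mouse_only))         # 2
print(len(mouse_without_mm9))  # 1
print(len(lowercase_query))    # 0, because matching is case-sensitive
```

The last call shows why case matters: 'mouse' finds nothing, while 'Mouse' finds both mouse databases.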

What it does

This tool downloads the master list of snpEff databases from https://sourceforge.net/projects/snpeff/files/databases/v4_3/. You can then look at this list and decide which database to use for your analysis. For example, if the List entries matching the following expression parameter of this tool is set to Mouse, it will produce a tabular dataset with the following content:

mm10  Mouse  http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_mm10.zip
mm9   Mouse  http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_mm9.zip

This means that there are two available snpEff databases for the mouse genome, versions mm9 and mm10. To download one of these databases, use the identifier from the first column (mm9 or mm10 in this case).
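
If you export the tabular dataset and want to pull out the identifiers programmatically, a short Python sketch along these lines may help. It assumes the columns are tab-separated and ordered as in the listing above (identifier, organism, URL); the function name database_ids is hypothetical.

```python
import csv
import io

# The two rows shown above, as tab-separated text.
listing = (
    "mm10\tMouse\thttp://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_mm10.zip\n"
    "mm9\tMouse\thttp://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_mm9.zip\n"
)

def database_ids(tabular_text):
    """Return the first-column identifier of every non-empty row."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    return [row[0].strip() for row in reader if row and row[0].strip()]

ids = database_ids(listing)
print(ids)  # ['mm10', 'mm9']
```

These first-column values are the names to supply when downloading a database.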


The usage scenario

There are two ways to use the database names obtained with this tool in Galaxy's version of snpEff:

  1. Use the SnpEff download tool. It will download the database to the history, and you will be able to use it in the SnpEff eff tool by selecting the Downloaded snpEff database in your history option of the Genome source parameter.
  2. Use the Download on demand option of the SnpEff eff tool (again, the Genome source parameter). In this case snpEff will download the database before performing annotation.

Using SnpEff in Galaxy: A few points to remember

SnpEff relies on specially formatted databases to generate annotations. It will not work without them. There are several ways in which these databases can be obtained.

Pre-cached databases

Many standard databases (e.g., human, mouse, Drosophila) are likely pre-cached within a given Galaxy instance. You should be able to see them listed in the Genome drop-down of the SnpEff eff tool.

If you do not see them, keep reading...

Download pre-built databases

The SnpEff project generates a large number of pre-built databases. These are available at https://sourceforge.net/projects/snpeff/files/databases/v4_3/ and can be downloaded. Follow these steps:

  1. Use the SnpEff databases tool to generate a list of existing databases. Note the name of the database you need.
  2. Use the SnpEff download tool to download the database.
  3. Finally, run SnpEff eff, choosing the downloaded database from the history with the Downloaded snpEff database in your history option of the Genome source parameter.

Alternatively, you can specify the name of the database directly in SnpEff eff using the Download on demand option (again, the Genome source parameter). In this case snpEff will download the database before performing annotation.

Create your own database

When you are dealing with bacterial or viral (or, frankly, any other) genomes, it may be easier to create the database yourself. Follow these steps:

  1. Download the GenBank record corresponding to your genome of interest from NCBI, or use annotations in GFF format accompanied by the corresponding genome in FASTA format.
  2. Use SnpEff build to create the database.
  3. Use the database in SnpEff eff (using the Custom option for the Genome source parameter).

Creating a custom database has one major advantage: it guarantees that you will not have any issues related to reference sequence naming, the most common source of SnpEff errors.
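
To see what a naming mismatch looks like in practice, the following Python sketch compares the sequence names in a VCF file with those in a FASTA file. It is only an illustration under simple assumptions: the function names are hypothetical, the FASTA name is taken as the first whitespace-delimited word of each header line, and the sample data (seqA, chr1) is made up.

```python
def fasta_names(fasta_text):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names

def vcf_chroms(vcf_text):
    """Collect values of the first (CHROM) column, skipping '#' header lines."""
    chroms = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chroms.add(line.split("\t", 1)[0])
    return chroms

def unmatched_chroms(vcf_text, fasta_text):
    """Return VCF sequence names that have no counterpart in the FASTA."""
    return sorted(vcf_chroms(vcf_text) - fasta_names(fasta_text))

# Made-up sample data for illustration.
fasta = ">seqA hypothetical description\nACGTACGT\n"
vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "seqA\t100\t.\tA\tG\t.\t.\t.\n"
    "chr1\t200\t.\tC\tT\t.\t.\t.\n"
)

missing = unmatched_chroms(vcf, fasta)
print(missing)  # ['chr1']
```

Any name reported here appears in the variant file but not in the reference, which is the kind of mismatch that building the database from your own reference avoids.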


To learn more about snpEff read its manual at http://snpeff.sourceforge.net/SnpEff_manual.html