
GEMINI query (version 0.20.1+galaxy2)
Only files with version 0.20.1 are accepted.
Genotype filter expressions
Sample filter expressions
Region Filters
Constraints defined here will become the WHERE clause of the SQL query issued to the GEMINI database. E.g. alt='G' or impact_severity = 'HIGH'.
Output format options

What it does

The real power in the GEMINI framework lies in the fact that all of your genetic variants have been stored in a convenient database in the context of a wealth of genome annotations that facilitate variant interpretation. The expressive power of SQL allows one to pose intricate questions of one’s variation data. This tool offers you a flexible, yet relatively easy way to query your variants!


Building your variant query with the Basic variant query constructor

This mode tries to break down the complexity of formulating GEMINI queries into more easily digestible parts. In this mode, the tool also prevents you from combining options that are incompatible or not meaningful.

Genotype filters

These are discussed in the GEMINI documentation.

The tool supports regular genotype filters like:

gt.sample1 == HET and gt_depths.sample1 >= 15

, which would keep only variants for which sample1 is a heterozygous carrier and for which the variant position in sample1 is covered by at least 15 sequencing reads, as well as GEMINI wildcard filters of the general form (COLUMN).(SAMPLE_FILTER).(RULE).(RULE_ENFORCEMENT) like:

(gt_types).(phenotype==2).(!=HOM_REF).(all)

, which keeps only variants for which all phenotypically affected samples (phenotype == 2) have a genotype other than homozygous reference.
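
The logic of such a wildcard filter can be illustrated with a small Python sketch. This is only a toy emulation, not GEMINI's implementation: the sample names, phenotypes, genotypes and the helper passes_wildcard are all made up for illustration, and genotypes are written as plain strings.

```python
# Toy emulation of the wildcard filter
#   (gt_types).(phenotype==2).(!=HOM_REF).(all)
# All samples, phenotypes and genotypes below are invented for illustration.

samples = {
    "sample1": {"phenotype": 2},  # affected
    "sample2": {"phenotype": 2},  # affected
    "sample3": {"phenotype": 1},  # unaffected
}

# gt_types per variant and sample, written as strings for readability
variants = {
    "var_a": {"sample1": "HET", "sample2": "HOM_ALT", "sample3": "HOM_REF"},
    "var_b": {"sample1": "HET", "sample2": "HOM_REF", "sample3": "HET"},
}


def passes_wildcard(gt_types, samples, sample_filter, rule, enforcement=all):
    """Hypothetical helper: apply RULE to every sample selected by
    SAMPLE_FILTER and combine the results with RULE_ENFORCEMENT."""
    selected = [name for name, meta in samples.items() if sample_filter(meta)]
    return enforcement(rule(gt_types[name]) for name in selected)


kept = [
    variant_id
    for variant_id, gt_types in variants.items()
    if passes_wildcard(
        gt_types,
        samples,
        sample_filter=lambda meta: meta["phenotype"] == 2,
        rule=lambda gt: gt != "HOM_REF",
    )
]
print(kept)  # ['var_a']
```

Here var_b is dropped because one affected sample (sample2) is homozygous reference; the genotype of the unaffected sample3 plays no role. Passing any instead of all as the enforcement would keep a variant as soon as one affected sample satisfies the rule.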

Sample filters

Sample filters have the same format as the second component of the genotype wildcard filters above, so:

phenotype == 2

would filter for phenotypically affected samples. In this case, however, the filter determines from which samples variants should be reported, i.e., here, only variants found in phenotypically affected samples are analyzed. You can use the --in filter to adjust the exact meaning of the sample filter.

Region filters

They let you restrict your analysis to parts of the genome, which can be useful if you have prior knowledge of the approximate location of a variant of interest.

If you specify more than one region filter, they get combined with a logical OR, meaning variants and genes falling in any of the regions are reported.
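
As a rough illustration of the OR combination, the following Python sketch runs two region filters against a toy SQLite table. The table contents and the helper regions_to_where are assumptions made for this example, and the simple containment test (start >= region start and end <= region end) is a simplification; the tool may build its region condition differently.

```python
import sqlite3


def regions_to_where(regions):
    """Hypothetical helper: one containment test per region, joined with OR."""
    clause = " OR ".join(
        "(chrom = ? AND start >= ? AND end <= ?)" for _ in regions
    )
    params = [value for region in regions for value in region]
    return clause, params


# Toy stand-in for the variants table (invented rows, only four columns)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (chrom TEXT, start INTEGER, end INTEGER, gene TEXT)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        ("chr1", 100, 101, "geneA"),
        ("chr1", 5000, 5001, "geneB"),
        ("chr2", 300, 301, "geneC"),
        ("chr3", 10, 11, "geneD"),
    ],
)

# Two region filters: chr1:0-1000 and chr2:0-1000
regions = [("chr1", 0, 1000), ("chr2", 0, 1000)]
clause, params = regions_to_where(regions)

rows = conn.execute(
    f"SELECT gene FROM variants WHERE {clause} ORDER BY chrom, start", params
).fetchall()
genes = [row[0] for row in rows]
print(genes)  # ['geneA', 'geneC']
```

A variant is reported if it falls into at least one of the regions, so adding a region filter can only enlarge the result, never shrink it.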

Additional constraints on variants

These get translated directly into the WHERE clause of an SQL query and, thus, have to be expressed in valid SQL syntax. As an example you could use:

is_exonic = 1 and impact_severity != 'LOW'

to indicate that you are only interested in exonic variants that are not of LOW impact severity, which excludes, for example, silent mutations.

Note that in SQL syntax tests for equality use a single =, while genotype filters (discussed above) follow Python syntax and use == for the same purpose. Also note that non-numerical values need to be enclosed in single quotes, e.g. 'LOW', but numerical values must NOT be.
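
To see the quoting rule in action, the sketch below applies the example constraint to a toy SQLite table from Python. The table layout and rows are invented for illustration and contain only the columns needed here; a real GEMINI database has many more.

```python
import sqlite3

# Toy stand-in for the variants table (invented rows)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants "
    "(variant_id INTEGER, is_exonic INTEGER, impact_severity TEXT)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?)",
    [(1, 1, "HIGH"), (2, 1, "LOW"), (3, 0, "HIGH"), (4, 1, "MED")],
)

# The constraint exactly as it would be typed into the tool form
constraint = "is_exonic = 1 and impact_severity != 'LOW'"
query = f"SELECT variant_id FROM variants WHERE {constraint} ORDER BY variant_id"
kept_ids = [row[0] for row in conn.execute(query)]
print(kept_ids)  # [1, 4]

# Without the single quotes, LOW is read as a column name and the query fails
try:
    conn.execute("SELECT variant_id FROM variants WHERE impact_severity != LOW")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False
print(unquoted_ok)  # False
```

Variant 2 is removed because of its LOW severity and variant 3 because it is not exonic.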


Building your query with the Advanced query constructor

For the sake of simplicity, the basic mode of the tool limits your queries to the variants table of the underlying database. While this still allows many useful queries to be formulated, it prevents you from joining information from other tables (in particular, the gene_detailed table) or from querying a different table directly.

In advanced mode, you take responsibility for formulating the complete SQL query in correct syntax, which allows you to do anything you could do with the command line tool. Beyond querying other tables, this includes changing output column names, deriving simple statistics on columns using the SQL Min, Max, Count, Avg and Sum functions, and more.
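
As a hedged illustration of what such a query could look like, the following Python sketch runs an aggregate query with renamed output columns against a toy SQLite table. The rows are invented, and only the chrom and qual columns are modeled; the column aliases n_variants and mean_qual are arbitrary names chosen for this example.

```python
import sqlite3

# Toy stand-in for the variants table (invented rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (chrom TEXT, qual REAL)")
conn.executemany(
    "INSERT INTO variants VALUES (?, ?)",
    [("chr1", 30.0), ("chr1", 50.0), ("chr2", 40.0)],
)

# Count variants and average their quality per chromosome,
# renaming the output columns with AS
query = (
    "SELECT chrom, Count(*) AS n_variants, Avg(qual) AS mean_qual "
    "FROM variants GROUP BY chrom ORDER BY chrom"
)
cursor = conn.execute(query)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['chrom', 'n_variants', 'mean_qual']
print(rows)          # [('chrom1'...)] style tuples, one per chromosome
```

The result no longer has one row per variant, which is an example of a non-standard query result in the sense used in this section.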

The price you pay for this extra flexibility is that you will have to make sure that any other tool options you set are compatible with the result of your particular query. For example, most output formats except the tabular default output of GEMINI are incompatible with non-standard queries. Choosing non-compatible options can result in them getting ignored silently, but also in tool errors, or in problems with downstream tools.

The chapter Querying the GEMINI database of the GEMINI documentation can get you started with formulating your own queries.

Note that genotype filters and sample filters cannot be expressed as genuine SQL queries, so even the Advanced query constructor offers them as separate options. Region filters and the sort order of rows and columns, on the other hand, can be controlled through SQL queries, like in this example:

SELECT gene, chrom, start, end, ref, alt FROM variants
WHERE chrom = 'chr1' AND start >= 10000000 AND end <= 20000000 AND is_lof = 1
ORDER BY chrom, start

, which would report all loss-of-function variants between 10,000,000 and 20,000,000 on chr1 and output the selected columns sorted by chromosome, then by start position.