
GEMINI burden (version 0.20.1)
Only files with version 0.20.1 are accepted.

What it does

Burden performs sample-wise gene-level burden calculations.

The burden tool provides a set of utilities to compute burden summaries on a per-gene, per-sample basis. By default, it outputs a table of gene-wise counts of all high impact variants in coding regions for each sample:

GEMINI burden example:

gene    M10475  M10478  M10500  M128215
WDR37   2       2       2       2
CTBP2   0       0       0       1
DHODH   1       0       0       0
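
As a rough illustration of how such a table is laid out, the sketch below tallies qualifying variants per gene and per sample. The input layout (one `(gene, sample)` pair per qualifying variant carried by a sample) and the function name `burden_table` are assumptions made for this example, not part of GEMINI, and the toy records were chosen only to mirror two rows of the table above.

```python
from collections import Counter


def burden_table(records, samples):
    """Count qualifying variants per gene and per sample.

    records: iterable of (gene, sample) pairs, one pair per qualifying
    variant carried by that sample (hypothetical input layout).
    samples: column order for the output rows.
    """
    records = list(records)
    counts = Counter(records)
    genes = sorted({gene for gene, _ in records})
    # Counter returns 0 for (gene, sample) pairs that never occurred.
    return {gene: [counts[(gene, s)] for s in samples] for gene in genes}


samples = ["M10475", "M10478", "M10500", "M128215"]
# Toy records: one variant in CTBP2 for M128215, one in DHODH for M10475.
records = [("CTBP2", "M128215"), ("DHODH", "M10475")]
table = burden_table(records, samples)

print("gene", *samples, sep="\t")
for gene, row in table.items():
    print(gene, *row, sep="\t")
```

Whether GEMINI counts variants or alternate alleles per sample is not stated here, so treat the counting unit in this sketch as an assumption.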

Setting examples

--nonsynonymous

To be less restrictive, you can include all nonsynonymous variants instead.

GEMINI output with setting --nonsynonymous:

gene    M10475  M10478  M10500  M128215
SYCE1   0       1       1       0
WDR37   2       2       2       2
CTBP2   0       0       0       1
ASAH2C  2       1       1       0
DHODH   1       0       0       0

--calpha

If your database has been loaded with a PED file describing case and control samples, you can calculate the C-alpha statistic for cases versus controls.

GEMINI output with setting --calpha:

gene    T     c     Z                p_value
SYCE1   -0.5  0.25  -1.0             0.841344746069
WDR37   -1.0  1.5   -0.816496580928  0.792891910879
CTBP2   0.0   0.0   nan              nan
ASAH2C  -0.5  0.75  -0.57735026919   0.718148569175
DHODH   0.0   0.0   nan              nan
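
To make the four columns easier to read, here is a minimal sketch of the commonly published C-alpha formulation, in which T is the test statistic, c its variance under the null, Z = T / sqrt(c), and p_value the upper-tail normal probability of Z. That GEMINI computes exactly this is an assumption; it is, however, consistent with the numbers above (for example, Z = -0.5 / sqrt(0.25) = -1.0 for SYCE1). The function name `c_alpha` and the per-variant counts passed to it are hypothetical.

```python
import math


def c_alpha(variants, p0):
    """Return (T, c, Z, p_value) for one gene.

    variants: list of (n, y) pairs, where n is the total number of copies
    of a variant allele across cases and controls and y is the number of
    those copies seen in cases (hypothetical input layout).
    p0: expected fraction of copies in cases under the null, e.g. the
    proportion of samples that are cases.
    """
    t = sum((y - n * p0) ** 2 - n * p0 * (1 - p0) for n, y in variants)
    c = 0.0
    for n, _ in variants:
        for u in range(n + 1):
            prob = math.comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            c += ((u - n * p0) ** 2 - n * p0 * (1 - p0)) ** 2 * prob
    if c == 0:
        # No variance: Z and the p-value are undefined (shown as nan).
        return t, c, math.nan, math.nan
    z = t / math.sqrt(c)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return t, c, z, p_value


# Two cases and two controls, so p0 = 0.5.
# Assumed counts: one variant with 2 copies, 1 of them in cases.
row_a = c_alpha([(2, 1)], p0=0.5)
# Assumed counts: one variant with 4 copies, 2 of them in cases.
row_b = c_alpha([(4, 2)], p0=0.5)
# A single copy (singleton) contributes 0 to both T and c in this sketch.
row_c = c_alpha([(1, 1)], p0=0.5)
print(row_a, row_b, row_c)
```

Under these assumed counts, `row_a` and `row_b` reproduce the SYCE1 and WDR37 rows, and `row_c` gives the `0.0 0.0 nan nan` pattern seen for CTBP2 and DHODH.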

To calculate the P-value using a permutation test, use the --permutations option, specifying the number of permutations of the case/control labels you want to use.
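
The idea of a label-permutation test can be sketched as follows: shuffle the case/control labels many times, recompute the statistic each time, and report the fraction of shuffles whose statistic is at least as large as the observed one. This is a generic sketch, not GEMINI's implementation; the function names, the per-sample allele counts, and the exact p-value convention are assumptions.

```python
import random


def permutation_p_value(statistic, labels, n_permutations, seed=0):
    """Fraction of label shuffles with statistic >= the observed value."""
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical allele counts for one variant in four samples.
copies = [0, 1, 1, 0]
# 1 = case, 0 = control, in the same sample order as `copies`.
labels = [0, 0, 1, 1]


def t_single_variant(current_labels):
    """C-alpha style T for a single variant under the given labels."""
    n = sum(copies)
    y = sum(c for c, lab in zip(copies, current_labels) if lab == 1)
    p0 = sum(current_labels) / len(current_labels)
    return (y - n * p0) ** 2 - n * p0 * (1 - p0)


p_perm = permutation_p_value(t_single_variant, labels, n_permutations=200)
print(p_perm)
```

With only four samples the observed T of -0.5 is the smallest value any relabeling can produce, so every shuffle counts as a hit; real data with more samples gives a more informative result.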

--min-aaf and --max-aaf for --calpha

By default, all variants affecting a given gene will be included in the C-alpha computation. However, one may establish alternate allele frequency boundaries for the variants included using the --min-aaf and --max-aaf options.

Used settings:

  • --calpha test.burden.db
  • --min-aaf 0.0
  • --max-aaf 0.01
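
The effect of these two bounds can be pictured as a simple filter applied before the statistic is computed. The sketch below assumes each variant record carries an `aaf` value and that both bounds are inclusive; the record layout, the gene names, and the function name `within_aaf` are hypothetical.

```python
def within_aaf(variants, min_aaf=0.0, max_aaf=1.0):
    """Keep variants whose alternate allele frequency lies in [min_aaf, max_aaf].

    Inclusive bounds are an assumption of this sketch.
    """
    return [v for v in variants if min_aaf <= v["aaf"] <= max_aaf]


# Hypothetical variant records.
variants = [
    {"gene": "GENE_A", "aaf": 0.005},
    {"gene": "GENE_A", "aaf": 0.25},
    {"gene": "GENE_B", "aaf": 0.01},
]
rare = within_aaf(variants, min_aaf=0.0, max_aaf=0.01)
print(rare)
```

Only the variants that pass the filter would then contribute to the per-gene statistic.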

--cases and --controls for --calpha

If you do not have a PED file loaded, or your PED file does not follow the standard PED phenotype encoding format, you can still perform the C-alpha test, but you must specify which samples are the controls and which are the cases.

Used settings:

  • --controls M10475 M10478
  • --cases M10500 M128215
  • --calpha

Output:

gene    T     c     Z                p_value
SYCE1   -0.5  0.25  -1.0             0.841344746069
WDR37   -1.0  1.5   -0.816496580928  0.792891910879
CTBP2   0.0   0.0   nan              nan
ASAH2C  -0.5  0.75  -0.57735026919   0.718148569175
DHODH   0.0   0.0   nan              nan

--nonsynonymous --calpha

To consider all nonsynonymous variants for the C-alpha test, instead of only the medium and high impact variants, add the --nonsynonymous flag.