Identify eQTL hotspots (version 5.0.0)
A tabular file with cM and bp positions, as well as the number of eQTL and genes for each interval
A tabular file with the gene and eQTL frequency per sliding window interval
A tabular file with the total number of genes, eQTLs, cM and number of permutations.
A tabular classification file with all eQTL
A tabular classification file with only cis eQTL
A tabular classification file with only trans eQTL
A tabular file mapping sliding window IDs to lookup table IDs

What it does

Identify the max number of eQTL expected by chance per cM using a permutation approach.

Eliminate differential gene density as an explanatory factor for eQTL hotspots, by performing a chi-squared test per bin.

Extract lists of eQTLs linked to each unbiased eQTL hotspot.

Generate genome-wide eQTL frequency plots.
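
The permutation step listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's own code: it assumes that each permutation drops every eQTL into a random bin with probability proportional to the bin length, records the highest eQTL/cM density seen in any bin, and takes a high percentile of those maxima as the threshold. The function name, the percentile and the seed are hypothetical.

```python
import random

def permutation_threshold(bin_lengths_cm, n_eqtl, n_perm=1000, percentile=0.95, seed=0):
    """Estimate the maximum eQTL/cM expected by chance (illustrative only)."""
    rng = random.Random(seed)
    bins = range(len(bin_lengths_cm))
    maxima = []
    for _ in range(n_perm):
        counts = [0] * len(bin_lengths_cm)
        # Assumption: eQTL fall into bins with probability proportional to bin length.
        for b in rng.choices(bins, weights=bin_lengths_cm, k=n_eqtl):
            counts[b] += 1
        # Keep the highest density observed in this permutation.
        maxima.append(max(c / length for c, length in zip(counts, bin_lengths_cm)))
    maxima.sort()
    # Assumption: the threshold is the chosen percentile of the per-permutation maxima.
    index = min(len(maxima) - 1, int(percentile * len(maxima)))
    return maxima[index]
```

A window whose observed eQTL/cM exceeds the returned value would then count as having more eQTLs than expected by chance.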


Example input files

Frequency per bin file: frequency of all, cis and trans eQTLs and genes. Each row corresponds to an interval of 2 cM or smaller (11 columns; only a part of the file is shown):

int.id  chr     marker  interval        cM      bp        length_cM    num.eQTL.all    num.eQTL.cis    num.eQTL.trans  num.genes
1       1       1       0.0001         0.0     2038278         2.0     94.0            2.0             64.0            12.0
2       1       1       0.0201         2.0     2466324         2.0     17.0            3.0             11.0            34.0
3       1       1       0.0401         4.0     2894370         2.0     8.0             2.0             5.0             29.0
4       1       1       0.0601         6.0     3322416         1.53    11.0            5.0             3.0             10.0
5       1       2       0.0754         7.53    3649871         2.0     27.0            6.0             19.0            18.0
6       1       2       0.0954         9.53    4095673         2.0     8.0             4.0             3.0             12.0
7       1       2       0.1154         11.53   4541476         2.0     4.0             2.0             2.0             17.0
8       1       2       0.1354         13.53   4987278         2.0     8.0             4.0             2.0             19.0
9       1       2       0.1554         15.53   5433081         2.0     5.0             0               3.0             15.0

Sliding frequency file: frequency of eQTLs and genes per sliding window interval (4 - 5.9 cM). Each row corresponds to a sliding window interval (11 columns; only a part of the file is shown):

sliding.id  chr  sliding.cM   sliding.all.eQTL  sliding.cis.eQTL  sliding.trans.eQTL  sliding.genes  sliding.all.eQTL/cM sliding.cis.eQTL/cM  sliding.trans.eQTL/cM   sliding.genes/cM
1              1       4.0             111.0           5.0             75.0            46.0            27.75                   1.25            18.75                   11.5
2              1       4.0             25.0            5.0             16.0            63.0            6.25                    1.25            4.0                     15.75
3              1       5.53            46.0            13.0            27.0            57.0            8.32                    2.35            4.88                    10.31
4              1       4.0             35.0            10.0            22.0            30.0            8.75                    2.5             5.5                     7.5
5              1       4.0             12.0            6.0             5.0             29.0            3.0                     1.5             1.25                    7.25
6              1       4.0             12.0            6.0             4.0             36.0            3.0                     1.5             1.0                     9.0
7              1       4.0             13.0            4.0             5.0             34.0            3.25                    1.0             1.25                    8.5
8              1       5.24            67.0            5.0             46.0            41.0            12.79                   0.95            8.78                    7.82
9              1       4.0             57.0            5.0             39.0            53.0            14.25                   1.25            9.75                    13.25
10             1       4.0             13.0            3.0             9.0             58.0            3.25                    0.75            2.25                    14.5
11             1       4.0             11.0            3.0             5.0             54.0            2.75                    0.75            1.25                    13.5
12             1       4.0             11.0            4.0             3.0             35.0            2.75                    1.0             0.75                    8.75

Frequency summary file:

Total number of eQTLs (all)             31549
Total number of cis-eQTLs               4863
Total number of trans-eQTLs             21428
Total number of genes                   31036
Total number of cM                      1861.57
Expected number of eQTL per cM (all)    16.95
Expected number of cis-eQTL per cM      2.61
Expected number of trans-eQTL per cM    11.51
Expected number of genes per cM         16.67
User specified number of permutations   1000
Number of intervals per sliding window  2
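
The "expected per cM" rows in this summary are consistent with dividing each genome-wide total by the total map length. A quick check using the values shown above (the variable names are illustrative):

```python
# Totals and map length copied from the frequency summary example.
totals = {"all": 31549, "cis": 4863, "trans": 21428, "genes": 31036}
total_cm = 1861.57

# Expected frequency per cM = genome-wide total / total map length.
expected_per_cm = {name: round(count / total_cm, 2) for name, count in totals.items()}
print(expected_per_cm)
```

Note that the cis and trans totals do not add up to the total for all eQTLs, which suggests that eQTLs without a cis or trans classification are counted only in the "all" row.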

eQTL full classification file: each row corresponds to an eQTL (16 columns; only a part of the file is shown). A classification column was added to the eQTL results file:

gene   index  chr  start_marker  start_int  end_marker  end_int  peak_marker   peak_int        peakLR          rsq             rtsq    parent_up_reg   classification   eQTL_bin    gene_bin
geneA  1       6       13      1.5139          15      1.6431          13      1.5539          12.7532485      0.1337606       0.3630217       parentA     trans       691             800
geneC  2       9       5       0.8106          6       0.9614          6       0.9214          20.344489       0.1559524       0.3123026       parentB     trans       902             700
geneC  3       9       8       1.2052          8       1.2452          8       1.2052          16.6822024      0.1244943       0.314542        parentA     cis         917             920
geneD  4       9       1       0.0001          2       0.2395          1       0.1201          19.531317       0.1753893       0.4300621       parentA     cis         860             862
geneH  5       1       1       0.0001          1       0.1001          1       0.0001          19.5727096      0.1373944       0.392982        parentB     trans       939             465
geneH  6       1       9       1.0268          11      1.2164          10      1.1261          13.5560176      0.095168        0.4823061       parentB     trans       1000            465
geneH  7       6       14      1.5977          15      1.8031          15      1.7231          19.8953622      0.3181244       0.3909106       parentB     no_result   904             904
geneI  8       9       7       1.0982          9       1.3079          8       1.2052          20.3966235      0.1305025       0.4233788       parentA     cis         977             969

eQTL cis classification file: each row corresponds to a cis eQTL (16 columns; only a part of the file is shown):

gene   index  chr  start_marker  start_int  end_marker  end_int  peak_marker   peak_int        peakLR          rsq             rtsq    parent_up_reg   classification   eQTL_bin    gene_bin
geneC  3       9       8       1.2052          8       1.2452          8       1.2052          16.6822024      0.1244943       0.314542        parentA     cis         917             920
geneD  4       9       1       0.0001          2       0.2395          1       0.1201          19.531317       0.1753893       0.4300621       parentA     cis         860             862
geneI  8       9       7       1.0982          9       1.3079          8       1.2052          20.3966235      0.1305025       0.4233788       parentA     cis         977             969

eQTL trans classification file: each row corresponds to a trans eQTL (16 columns; only a part of the file is shown):

gene   index  chr  start_marker  start_int  end_marker  end_int  peak_marker   peak_int        peakLR          rsq             rtsq    parent_up_reg   classification   eQTL_bin    gene_bin
geneA  1       6       13      1.5139          15      1.6431          13      1.5539          12.7532485      0.1337606       0.3630217       parentA     trans       691             800
geneC  2       9       5       0.8106          6       0.9614          6       0.9214          20.344489       0.1559524       0.3123026       parentB     trans       902             700
geneH  5       1       1       0.0001          1       0.1001          1       0.0001          19.5727096      0.1373944       0.392982        parentB     trans       939             465
geneH  6       1       9       1.0268          11      1.2164          10      1.1261          13.5560176      0.095168        0.4823061       parentB     trans       1000            465
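
The cis and trans files above contain the rows of the full classification file with the matching value in the classification column. A minimal sketch of that relationship, using a reduced copy of the example with only three of the 16 columns (the function name is hypothetical):

```python
import csv
import io

# Reduced copy of the full classification example: gene, index and classification only.
full_file = io.StringIO(
    "gene\tindex\tclassification\n"
    "geneA\t1\ttrans\n"
    "geneC\t2\ttrans\n"
    "geneC\t3\tcis\n"
    "geneD\t4\tcis\n"
    "geneH\t5\ttrans\n"
    "geneH\t6\ttrans\n"
    "geneH\t7\tno_result\n"
    "geneI\t8\tcis\n"
)

def split_by_classification(handle):
    """Group the rows of a tab-separated classification file by their class."""
    groups = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        groups.setdefault(row["classification"], []).append(row)
    return groups

groups = split_by_classification(full_file)
print([row["index"] for row in groups["cis"]])
print([row["index"] for row in groups["trans"]])
```

Rows classified as no_result appear only in the full file.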

Mapping file: sliding window IDs (first column) mapped to the lookup table IDs of the intervals they span (second column) (2 columns; only a part of the file is shown):

1      [1, 2]
2      [2, 3]
3      [3, 4, 5]
4      [5, 6]
5      [6, 7]
6      [7, 8]
7      [8, 9]
8      [9, 10, 11]
9      [11, 12]
10     [12, 13]
11     [13, 14]
12     [14, 15]
13     [15, 16]
14     [16, 17]
15     [17, 18]
16     [18, 19]
17     [19, 20, 21]
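
Summing the per-bin frequencies over the interval IDs listed for a window reproduces the corresponding row of the sliding frequency file. The sketch below shows this for the first windows of the example; the function names are hypothetical and only the first five bins of the frequency per bin example are included.

```python
import ast

# First five rows of the frequency per bin example:
# int.id -> (length_cM, num.eQTL.all, num.eQTL.cis, num.eQTL.trans, num.genes)
bins = {
    1: (2.0, 94, 2, 64, 12),
    2: (2.0, 17, 3, 11, 34),
    3: (2.0, 8, 2, 5, 29),
    4: (1.53, 11, 5, 3, 10),
    5: (2.0, 27, 6, 19, 18),
}

def parse_mapping_line(line):
    """Parse a line such as '3      [3, 4, 5]' into (3, [3, 4, 5])."""
    sliding_id, int_ids = line.split(None, 1)
    return int(sliding_id), ast.literal_eval(int_ids.strip())

def sliding_row(int_ids):
    """Sum the bins of one window and derive the per-cM frequencies."""
    length, n_all, n_cis, n_trans, n_genes = (
        sum(bins[i][k] for i in int_ids) for k in range(5)
    )
    per_cm = [round(n / length, 2) for n in (n_all, n_cis, n_trans, n_genes)]
    return [round(length, 2), n_all, n_cis, n_trans, n_genes] + per_cm

sliding_id, int_ids = parse_mapping_line("3      [3, 4, 5]")
print(sliding_id, sliding_row(int_ids))
```

For window 3 this gives a length of 5.53 cM with 46 eQTLs and 57 genes, matching the third row of the sliding frequency example.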

Example output files

Significant hotspots output file: rows correspond to the sliding intervals (10 columns; only a part of the file is shown). A 1 in the “threshold.result” column means the window holds more eQTLs than the maximum expected by chance. A 1 in the “chisq.result” column means the window has an excess of eQTLs compared with its number of genes. Three output files like this are generated, for all, cis and trans eQTLs respectively:

sliding.id  chr  sliding.cM  sliding.all.eQTL  sliding.genes  sliding.all.eQTL/cM  threshold.result  chisq.result  significant  hotspot.number
1           1    4.0         111.0             46.0           27.75                0                 1
2           1    4.0         25.0              63.0           6.25                 0                 -1
3           1    5.53        46.0              57.0           8.32                 0                 0
4           1    4.0         35.0              30.0           8.75                 0                 0
5           1    4.0         12.0              29.0           3.0                  0                 0
6           1    4.0         12.0              36.0           3.0                  0                 0
7           1    4.0         13.0              34.0           3.25                 0                 0
8           1    5.24        67.0              41.0           12.79                0                 0
9           1    4.0         57.0              53.0           14.25                0                 0
10          1    4.0         13.0              58.0           3.25                 0                 -1
11          1    4.0         11.0              54.0           2.75                 0                 -1
12          1    4.0         11.0              35.0           2.75                 0                 0
13          1    4.0         8.0               22.0           2.0                  0                 0
14          1    4.0         10.0              26.0           2.5                  0                 0
15          1    4.0         17.0              26.0           4.25                 0                 0
16          1    4.0         24.0              25.0           6.0                  0                 0
17          1    5.94        226.0             59.0           38.05                1                 1             *            1
18          1    4.0         205.0             28.0           51.25                1                 1             *            1
19          1    5.91        89.0              55.0           15.06                0                 0
20          1    4.0         47.0              64.0           11.75                0                 0
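
The two flag columns can be read as a pair of tests per sliding window. The sketch below is one possible reading, not the tool's own code. It assumes a one-degree-of-freedom chi-squared test of the window's eQTL and gene counts against the genome-wide eQTL proportion. The significance level `alpha` is a placeholder, because the level used by the tool (and any multiple-testing correction) is not stated here. The meaning of -1 (fewer eQTLs than expected) and the rule that a window is significant when both flags are 1 are inferred from the example rows. Function names are hypothetical.

```python
import math

def chisq_result(n_eqtl, n_genes, p_eqtl, alpha):
    """Return 1 (excess eQTLs), -1 (fewer eQTLs than expected) or 0 (not significant)."""
    total = n_eqtl + n_genes
    exp_eqtl = total * p_eqtl
    exp_genes = total * (1.0 - p_eqtl)
    chi2 = (n_eqtl - exp_eqtl) ** 2 / exp_eqtl + (n_genes - exp_genes) ** 2 / exp_genes
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    if p_value >= alpha:
        return 0
    return 1 if n_eqtl > exp_eqtl else -1

def is_significant(eqtl_per_cm, permutation_threshold, chisq):
    """Assumed rule: a window is a hotspot when it passes both tests."""
    threshold_result = 1 if eqtl_per_cm > permutation_threshold else 0
    return threshold_result == 1 and chisq == 1

# Window 17 of the example: 226 eQTLs, 59 genes, 38.05 eQTL/cM.
# 0.504 and 33.55 are the eQTL proportion and threshold from the full summary example.
flag = chisq_result(226, 59, p_eqtl=0.504, alpha=0.05)
print(flag, is_significant(38.05, 33.55, flag))
```

Window 1 of the example shows why both tests matter: it has an excess of eQTLs relative to genes, but its 27.75 eQTL/cM stays below the permutation threshold, so it is not flagged.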

eQTL/gene lists extracted for significant hotspots (only the first few eQTLs linked to hotspots 1 and 2 are shown). The chromosome, the lists of sliding window and interval IDs, and the number of eQTLs in each hotspot are given in the header. Three output files like this are generated, for all, cis and trans eQTLs respectively:

= = = = =   Hotspot 1    chr 1    sliding.ids: [17, 18]    int.ids: [19, 20, 21, 22]    nr.eQTL: 257   = = = = =
geneA    639     1       3       0.2878  4       0.3872  3       0.3478  13.7958496      0.110934        0.487661       parentB      no_result   19      NA
geneL   800     1       3       0.2478  4       0.4072  3       0.3478  24.2128991      0.2848178       0.4639009       parentB      no_result   19      NA
geneB   382     1       3       0.2878  4       0.3872  3       0.3478  13.7048724      0.1522281       0.3023807       parentB      trans       19      757
geneD   457     1       3       0.2678  4       0.4072  3       0.3478  16.2210425      0.1537186       0.3527068       parentA      trans       19      722
geneE   381     1       3       0.2678  4       0.4272  3       0.3478  19.2398655      0.1747831       0.4636225       parentA      cis         19      16

…

= = = = =   Hotspot 2    chr 1    sliding.ids: [36, 37]    int.ids: [43, 44, 45, 46]    nr.eQTL: 268   = = = = =
geneW    146     1       8       0.6998  9       0.9588  8       0.7798  17.6058658      0.168243        0.3517602       parentA      cis        43      41
geneP    510     1       8       0.6998  8       0.7998  8       0.7798  48.9321454      0.6530789       0.7453719       parentB      trans      43      566
geneF    231     1       8       0.7598  8       0.7998  8       0.7798  13.2268263      0.1715268       0.4169803       parentB      trans      43      491
geneY    480     1       7       0.6922  8       0.7998  8       0.7798  71.8820179      0.7463132       0.8353116       parentB      no_result  43      NA
geneG    652     1       8       0.7798  8       0.7798  8       0.7798  11.5596194      0.1168083       0.3429812       parentB      trans      43      753
geneJ    760     1       8       0.6998  9       0.9188  8       0.7798  22.0870242      0.2083328       0.396835        parentA      cis        43      49

…
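
The hotspot headers above suggest that consecutive significant sliding windows on the same chromosome are merged into one hotspot, and that the hotspot's int.ids are the union of the intervals covered by those windows. A minimal sketch of that grouping follows; the function names are hypothetical, and the interval list for window 18 is an assumption because it is not part of the mapping excerpt.

```python
def group_hotspots(significant):
    """Merge consecutive significant windows per chromosome.

    significant: iterable of (chr, sliding_id) pairs for flagged windows.
    Returns a list of (chr, [sliding ids]) tuples, one per hotspot.
    """
    hotspots = []
    for chrom, sid in sorted(significant):
        last = hotspots[-1] if hotspots else None
        if last and last[0] == chrom and last[1][-1] == sid - 1:
            last[1].append(sid)  # extends the current hotspot
        else:
            hotspots.append((chrom, [sid]))  # starts a new hotspot
    return hotspots

def hotspot_int_ids(sliding_ids, window_to_ints):
    """Union of the interval IDs covered by the windows of one hotspot."""
    return sorted({i for sid in sliding_ids for i in window_to_ints[sid]})

# Window 17 comes from the mapping example; window 18 is assumed.
window_to_ints = {17: [19, 20, 21], 18: [21, 22]}
hotspots = group_hotspots([(1, 17), (1, 18), (1, 36), (1, 37)])
print(hotspots)
print(hotspot_int_ids(hotspots[0][1], window_to_ints))
```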

Full summary file (2 columns; 20 rows):

Total number of eQTLs (all)                            31549
Total number of cis-eQTLs                              4863
Total number of trans-eQTLs                            21428
Total number of genes                                  31036
Total number of cM                                     1861.57
Expected number of eQTL per cM (all)                   16.95
Expected number of cis-eQTL per cM                     2.61
Expected number of trans-eQTL per cM                   11.51
Expected number of genes per cM                        16.67
User specified number of permutations                  10
Number of intervals per sliding window                 2
Calculated permutation threshold for all eQTL (eQTL/cM)                         33.55
Chi-squared test population estimate for all eQTL (genes:eQTL)                  0.496:0.504
Chi-squared test assumption for all eQTL: min number of [eQTL + genes] in bin   10.08
Calculated permutation threshold for cis-eQTL (eQTL/cM)                         10.55
Chi-squared test population estimate for cis-eQTL (genes:eQTL)                  0.865:0.135
Chi-squared test assumption for cis-eQTL: min number of [eQTL + genes] in bin   36.91
Calculated permutation threshold for trans-eQTL (eQTL/cM)                       27.55
Chi-squared test population estimate for trans-eQTL (genes:eQTL)                0.592:0.408
Chi-squared test assumption for trans-eQTL: min number of [eQTL + genes] in bin 12.24
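
The chi-squared rows of this summary appear to follow from the genome-wide totals. The population estimate matches the share of genes and eQTLs in their combined total, and the minimum bin size matches the smallest total for which the rarer category still has an expected count of 5. The rule of 5 is an inference that happens to reproduce the three values shown, not something the tool documents here; the function name is hypothetical.

```python
def chisq_population(n_genes, n_eqtl, min_expected=5):
    """Return (genes share, eQTL share, minimum [eQTL + genes] per bin)."""
    total = n_genes + n_eqtl
    p_genes = n_genes / total
    p_eqtl = n_eqtl / total
    # Assumption: every expected count must reach min_expected,
    # so the rarer category sets the minimum bin total.
    min_bin_total = min_expected / min(p_genes, p_eqtl)
    return round(p_genes, 3), round(p_eqtl, 3), round(min_bin_total, 2)

# Totals copied from the summary example (31036 genes).
print(chisq_population(31036, 31549))  # all eQTL
print(chisq_population(31036, 4863))   # cis-eQTL
print(chisq_population(31036, 21428))  # trans-eQTL
```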

Frequency plots of the eQTLs, with significant hotspots marked in red. The plots are generated using R and saved in PDF format.

  1. Hotspot genome-wide plot [three plots are generated: all eQTL, cis eQTL and trans eQTL]
  2. Hotspot chr plots - eQTL / cM [three plots are generated: all eQTL, cis eQTL and trans eQTL]