What it does
Identify the maximum number of eQTLs per cM expected by chance, using a permutation approach.
Eliminate differential gene density as an explanatory factor for eQTL hotspots by performing a chi-squared test per bin.
Extract lists of the eQTLs linked to each unbiased eQTL hotspot.
Generate genome-wide eQTL frequency plots.
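A minimal sketch of how a permutation threshold of this kind can be computed is shown below. It illustrates the idea under stated assumptions and is not the tool's own procedure. It assumes that each permutation scatters the observed number of eQTLs over the intervals at random, with probability proportional to interval length in cM. It also assumes that the highest eQTL/cM among the sliding windows is recorded, and that an upper quantile (95% here) of those maxima is taken as the threshold. The function `permutation_threshold` and its parameters are hypothetical.

```python
import random

def permutation_threshold(lengths_cm, windows, n_eqtl, n_perm=1000,
                          quantile=0.95, seed=None):
    """Hypothetical sketch of a permutation threshold (eQTL/cM).

    lengths_cm: {int.id: interval length in cM}
    windows:    {sliding.id: [int.id, ...]}
    Assumption: each permutation drops n_eqtl eQTLs onto intervals at random,
    weighted by interval length, and keeps the densest window's eQTL/cM.
    """
    rng = random.Random(seed)
    ids = list(lengths_cm)
    weights = [lengths_cm[i] for i in ids]
    window_cm = {w: sum(lengths_cm[i] for i in members)
                 for w, members in windows.items()}
    maxima = []
    for _ in range(n_perm):
        counts = dict.fromkeys(ids, 0)
        for i in rng.choices(ids, weights=weights, k=n_eqtl):
            counts[i] += 1
        maxima.append(max(sum(counts[i] for i in members) / window_cm[w]
                          for w, members in windows.items()))
    maxima.sort()
    # Empirical upper quantile of the per-permutation maxima
    return maxima[min(int(quantile * n_perm), n_perm - 1)]

# First intervals and windows from the example files below
lengths = {1: 2.0, 2: 2.0, 3: 2.0, 4: 1.53, 5: 2.0, 6: 2.0, 7: 2.0, 8: 2.0, 9: 2.0}
windows = {1: [1, 2], 2: [2, 3], 3: [3, 4, 5], 4: [5, 6], 5: [6, 7],
           6: [7, 8], 7: [8, 9]}
print(permutation_threshold(lengths, windows, n_eqtl=182, n_perm=1000, seed=1))
```

The usage line feeds the sketch the first nine intervals of the frequency per bin example and the first seven windows of the sliding window map. The value 182 is the sum of those intervals' num.eQTL.all values. The result depends on the seed and on `n_perm`.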
Example input files
Frequency per bin file: frequencies of all, cis and trans eQTLs and of genes. Each row corresponds to an interval of 2 cM or less (11 columns; only part of the file is shown):
int.id chr marker interval cM bp length_cM num.eQTL.all num.eQTL.cis num.eQTL.trans num.genes
1 1 1 0.0001 0.0 2038278 2.0 94.0 2.0 64.0 12.0
2 1 1 0.0201 2.0 2466324 2.0 17.0 3.0 11.0 34.0
3 1 1 0.0401 4.0 2894370 2.0 8.0 2.0 5.0 29.0
4 1 1 0.0601 6.0 3322416 1.53 11.0 5.0 3.0 10.0
5 1 2 0.0754 7.53 3649871 2.0 27.0 6.0 19.0 18.0
6 1 2 0.0954 9.53 4095673 2.0 8.0 4.0 3.0 12.0
7 1 2 0.1154 11.53 4541476 2.0 4.0 2.0 2.0 17.0
8 1 2 0.1354 13.53 4987278 2.0 8.0 4.0 2.0 19.0
9 1 2 0.1554 15.53 5433081 2.0 5.0 0 3.0 15.0
Sliding frequency file (output): frequencies of eQTLs and genes per sliding window interval (between 4 and 6 cM). Each row corresponds to a sliding window interval (11 columns; only part of the file is shown):
sliding.id chr sliding.cM sliding.all.eQTL sliding.cis.eQTL sliding.trans.eQTL sliding.genes sliding.all.eQTL/cM sliding.cis.eQTL/cM sliding.trans.eQTL/cM sliding.genes/cM
1 1 4.0 111.0 5.0 75.0 46.0 27.75 1.25 18.75 11.5
2 1 4.0 25.0 5.0 16.0 63.0 6.25 1.25 4.0 15.75
3 1 5.53 46.0 13.0 27.0 57.0 8.32 2.35 4.88 10.31
4 1 4.0 35.0 10.0 22.0 30.0 8.75 2.5 5.5 7.5
5 1 4.0 12.0 6.0 5.0 29.0 3.0 1.5 1.25 7.25
6 1 4.0 12.0 6.0 4.0 36.0 3.0 1.5 1.0 9.0
7 1 4.0 13.0 4.0 5.0 34.0 3.25 1.0 1.25 8.5
8 1 5.24 67.0 5.0 46.0 41.0 12.79 0.95 8.78 7.82
9 1 4.0 57.0 5.0 39.0 53.0 14.25 1.25 9.75 13.25
10 1 4.0 13.0 3.0 9.0 58.0 3.25 0.75 2.25 14.5
11 1 4.0 11.0 3.0 5.0 54.0 2.75 0.75 1.25 13.5
12 1 4.0 11.0 4.0 3.0 35.0 2.75 1.0 0.75 8.75
Frequency summary file:
Total number of eQTLs (all)  31549
Total number of cis-eQTLs  4863
Total number of trans-eQTLs  21428
Total number of genes  31036
Total number of cM  1861.57
Expected number of eQTL per cM (all)  16.95
Expected number of cis-eQTL per cM  2.61
Expected number of trans-eQTL per cM  11.51
Expected number of genes per cM  16.67
User specified number of permutations  1000
Number of intervals per sliding window  2
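The expected values in this file are consistent with each total divided by the total map length. For example, 31549 / 1861.57 ≈ 16.95 eQTLs per cM. The short check below uses totals copied from the example above. The helper name `expected_per_cm` is hypothetical.

```python
# Totals copied from the frequency summary file example
totals = {"all": 31549, "cis": 4863, "trans": 21428, "genes": 31036}
total_cm = 1861.57

def expected_per_cm(count, map_length_cm):
    """Count per cM if the features were spread evenly over the genetic map."""
    return count / map_length_cm

expected = {name: round(expected_per_cm(n, total_cm), 2) for name, n in totals.items()}
print(expected)  # {'all': 16.95, 'cis': 2.61, 'trans': 11.51, 'genes': 16.67}
```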
eQTL full classification file: each row corresponds to an eQTL (16 columns; only part of the file is shown). A classification column was added to the eQTL results file:
gene index chr start_marker start_int end_marker end_int peak_marker peak_int peakLR rsq rtsq parent_up_reg classification eQTL_bin gene_bin
geneA 1 6 13 1.5139 15 1.6431 13 1.5539 12.7532485 0.1337606 0.3630217 parentA trans 691 800
geneC 2 9 5 0.8106 6 0.9614 6 0.9214 20.344489 0.1559524 0.3123026 parentB trans 902 700
geneC 3 9 8 1.2052 8 1.2452 8 1.2052 16.6822024 0.1244943 0.314542 parentA cis 917 920
geneD 4 9 1 0.0001 2 0.2395 1 0.1201 19.531317 0.1753893 0.4300621 parentA cis 860 862
geneH 5 1 1 0.0001 1 0.1001 1 0.0001 19.5727096 0.1373944 0.392982 parentB trans 939 465
geneH 6 1 9 1.0268 11 1.2164 10 1.1261 13.5560176 0.095168 0.4823061 parentB trans 1000 465
geneH 7 6 14 1.5977 15 1.8031 15 1.7231 19.8953622 0.3181244 0.3909106 parentB no_result 904 904
geneI 8 9 7 1.0982 9 1.3079 8 1.2052 20.3966235 0.1305025 0.4233788 parentA cis 977 969
eQTL cis classification file: each row corresponds to a cis-eQTL (16 columns; only part of the file is shown):
gene index chr start_marker start_int end_marker end_int peak_marker peak_int peakLR rsq rtsq parent_up_reg classification eQTL_bin gene_bin
geneC 3 9 8 1.2052 8 1.2452 8 1.2052 16.6822024 0.1244943 0.314542 parentA cis 917 920
geneD 4 9 1 0.0001 2 0.2395 1 0.1201 19.531317 0.1753893 0.4300621 parentA cis 860 862
geneI 8 9 7 1.0982 9 1.3079 8 1.2052 20.3966235 0.1305025 0.4233788 parentA cis 977 969
eQTL trans classification file: each row corresponds to a trans-eQTL (16 columns; only part of the file is shown):
gene index chr start_marker start_int end_marker end_int peak_marker peak_int peakLR rsq rtsq parent_up_reg classification eQTL_bin gene_bin
geneA 1 6 13 1.5139 15 1.6431 13 1.5539 12.7532485 0.1337606 0.3630217 parentA trans 691 800
geneC 2 9 5 0.8106 6 0.9614 6 0.9214 20.344489 0.1559524 0.3123026 parentB trans 902 700
geneH 5 1 1 0.0001 1 0.1001 1 0.0001 19.5727096 0.1373944 0.392982 parentB trans 939 465
geneH 6 1 9 1.0268 11 1.2164 10 1.1261 13.5560176 0.095168 0.4823061 parentB trans 1000 465
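The cis and trans files contain the rows of the full classification file whose classification column is cis or trans, respectively. A minimal sketch of that split follows. It assumes whitespace-delimited input with a header line. The helpers `split_by_classification` and `to_text` are hypothetical.

```python
FULL = """\
gene index chr start_marker start_int end_marker end_int peak_marker peak_int peakLR rsq rtsq parent_up_reg classification eQTL_bin gene_bin
geneA 1 6 13 1.5139 15 1.6431 13 1.5539 12.7532485 0.1337606 0.3630217 parentA trans 691 800
geneC 2 9 5 0.8106 6 0.9614 6 0.9214 20.344489 0.1559524 0.3123026 parentB trans 902 700
geneC 3 9 8 1.2052 8 1.2452 8 1.2052 16.6822024 0.1244943 0.314542 parentA cis 917 920
geneD 4 9 1 0.0001 2 0.2395 1 0.1201 19.531317 0.1753893 0.4300621 parentA cis 860 862
geneH 5 1 1 0.0001 1 0.1001 1 0.0001 19.5727096 0.1373944 0.392982 parentB trans 939 465
geneH 6 1 9 1.0268 11 1.2164 10 1.1261 13.5560176 0.095168 0.4823061 parentB trans 1000 465
geneH 7 6 14 1.5977 15 1.8031 15 1.7231 19.8953622 0.3181244 0.3909106 parentB no_result 904 904
geneI 8 9 7 1.0982 9 1.3079 8 1.2052 20.3966235 0.1305025 0.4233788 parentA cis 977 969
"""

def split_by_classification(text):
    """Group the rows of a whitespace-delimited eQTL file by classification."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    groups = {}
    for fields in rows:
        record = dict(zip(header, fields))
        groups.setdefault(record["classification"], []).append(record)
    return header, groups

def to_text(header, records):
    """Render records back into the file layout (header line first)."""
    body = [" ".join(r[col] for col in header) for r in records]
    return "\n".join([" ".join(header)] + body)

header, groups = split_by_classification(FULL)
print(to_text(header, groups["cis"]))  # header plus the geneC, geneD and geneI rows
```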
Sliding window map file: maps each sliding window ID (first column) to the IDs of the intervals (int.id) that the window spans (second column) (2 columns; only part of the file is shown):
1 [1, 2]
2 [2, 3]
3 [3, 4, 5]
4 [5, 6]
5 [6, 7]
6 [7, 8]
7 [8, 9]
8 [9, 10, 11]
9 [11, 12]
10 [12, 13]
11 [13, 14]
12 [14, 15]
13 [15, 16]
14 [16, 17]
15 [17, 18]
16 [18, 19]
17 [19, 20, 21]
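The map and the frequency per bin file together account for the values in the sliding frequency file. A window's counts are the sums over its intervals, and its per-cM values are those sums divided by the summed interval lengths. For example, window 3 spans intervals 3, 4 and 5, including the 1.53 cM interval 4, so it covers 5.53 cM. The sketch below reproduces the first seven windows from the example rows. The dictionaries are transcribed from the examples, and `sliding_row` is a hypothetical helper.

```python
# Intervals 1-9 of the frequency per bin example:
# int.id -> (length_cM, num.eQTL.all, num.eQTL.cis, num.eQTL.trans, num.genes)
bins = {
    1: (2.0, 94, 2, 64, 12), 2: (2.0, 17, 3, 11, 34), 3: (2.0, 8, 2, 5, 29),
    4: (1.53, 11, 5, 3, 10), 5: (2.0, 27, 6, 19, 18), 6: (2.0, 8, 4, 3, 12),
    7: (2.0, 4, 2, 2, 17), 8: (2.0, 8, 4, 2, 19), 9: (2.0, 5, 0, 3, 15),
}
# Sliding windows 1-7 of the map example: sliding.id -> int.ids
windows = {1: [1, 2], 2: [2, 3], 3: [3, 4, 5], 4: [5, 6], 5: [6, 7],
           6: [7, 8], 7: [8, 9]}

def sliding_row(members):
    """Return sliding.cM, the four summed counts and the four per-cM values."""
    length = sum(bins[i][0] for i in members)
    sums = [sum(bins[i][k] for i in members) for k in range(1, 5)]
    return [round(length, 2)] + sums + [round(s / length, 2) for s in sums]

for sid, members in windows.items():
    print(sid, *sliding_row(members))
```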
Example output files
Significant hotspots output file: each row corresponds to a sliding window (10 columns; only part of the file is shown). A 1 in the “threshold.result” column means the window has more eQTLs per cM than the maximum expected by chance. A 1 in the “chisq.result” column means the window has an excess of eQTLs relative to its number of genes. Windows marked “*” in the “significant” column are numbered in the “hotspot.number” column. Three output files like this are generated, for all, cis and trans eQTLs respectively:
sliding.id chr sliding.cM sliding.all.eQTL sliding.genes sliding.all.eQTL/cM threshold.result chisq.result significant hotspot.number
1 1 4.0 111.0 46.0 27.75 0 1
2 1 4.0 25.0 63.0 6.25 0 -1
3 1 5.53 46.0 57.0 8.32 0 0
4 1 4.0 35.0 30.0 8.75 0 0
5 1 4.0 12.0 29.0 3.0 0 0
6 1 4.0 12.0 36.0 3.0 0 0
7 1 4.0 13.0 34.0 3.25 0 0
8 1 5.24 67.0 41.0 12.79 0 0
9 1 4.0 57.0 53.0 14.25 0 0
10 1 4.0 13.0 58.0 3.25 0 -1
11 1 4.0 11.0 54.0 2.75 0 -1
12 1 4.0 11.0 35.0 2.75 0 0
13 1 4.0 8.0 22.0 2.0 0 0
14 1 4.0 10.0 26.0 2.5 0 0
15 1 4.0 17.0 26.0 4.25 0 0
16 1 4.0 24.0 25.0 6.0 0 0
17 1 5.94 226.0 59.0 38.05 1 1 * 1
18 1 4.0 205.0 28.0 51.25 1 1 * 1
19 1 5.91 89.0 55.0 15.06 0 0
20 1 4.0 47.0 64.0 11.75 0 0
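The two flags can be sketched as follows. The chi-squared statistic compares each window's genes and eQTLs with the genome-wide genes:eQTL proportions reported in the full summary file. The critical value is a parameter because the significance level and any multiple-testing correction used by the tool are not stated here. The default of 3.841 is the 5% critical value for one degree of freedom. With that default, sliding.id 12 (11 eQTLs, 35 genes) would come out as -1 rather than the 0 shown above, so the tool presumably applies a stricter cutoff. Two further points are assumptions drawn from the example rows: returning -1 for a significant deficit of eQTLs, and marking a window significant only when both flags are 1. All function names are hypothetical.

```python
def threshold_result(eqtl_per_cm, permutation_threshold):
    """1 if the window's eQTL density exceeds the permutation threshold."""
    return 1 if eqtl_per_cm > permutation_threshold else 0

def chisq_result(n_eqtl, n_genes, p_genes, p_eqtl, critical=3.841):
    """Goodness-of-fit test (1 df) of a window's genes:eQTL split against the
    genome-wide proportions. Returns 1 for a significant excess of eQTLs,
    -1 for a significant deficit (assumed meaning of -1), otherwise 0."""
    n = n_genes + n_eqtl
    exp_genes, exp_eqtl = n * p_genes, n * p_eqtl
    chi2 = ((n_genes - exp_genes) ** 2 / exp_genes
            + (n_eqtl - exp_eqtl) ** 2 / exp_eqtl)
    if chi2 <= critical:
        return 0
    return 1 if n_eqtl > exp_eqtl else -1

def is_significant(threshold_flag, chisq_flag):
    """Assumed rule: a window gets a * when both tests are positive."""
    return threshold_flag == 1 and chisq_flag == 1

# sliding.id 17, with the "all eQTL" threshold and proportions from the full summary file
t = threshold_result(38.05, 33.55)
c = chisq_result(226, 59, p_genes=0.496, p_eqtl=0.504)
print(t, c, is_significant(t, c))  # 1 1 True
```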
eQTL/gene lists extracted for significant hotspots (only the first few eQTLs linked to hotspots 1 and 2 are shown). The header of each hotspot gives its chromosome, its sliding window and interval IDs, and its number of eQTLs. Three output files like this are generated, for all, cis and trans eQTLs respectively:
= = = = = Hotspot 1 chr 1 sliding.ids: [17, 18] int.ids: [19, 20, 21, 22] nr.eQTL: 257 = = = = =
geneA 639 1 3 0.2878 4 0.3872 3 0.3478 13.7958496 0.110934 0.487661 parentB no_result 19 NA
geneL 800 1 3 0.2478 4 0.4072 3 0.3478 24.2128991 0.2848178 0.4639009 parentB no_result 19 NA
geneB 382 1 3 0.2878 4 0.3872 3 0.3478 13.7048724 0.1522281 0.3023807 parentB trans 19 757
geneD 457 1 3 0.2678 4 0.4072 3 0.3478 16.2210425 0.1537186 0.3527068 parentA trans 19 722
geneE 381 1 3 0.2678 4 0.4272 3 0.3478 19.2398655 0.1747831 0.4636225 parentA cis 19 16
…
= = = = = Hotspot 2 chr 1 sliding.ids: [36, 37] int.ids: [43, 44, 45, 46] nr.eQTL: 268 = = = = =
geneW 146 1 8 0.6998 9 0.9588 8 0.7798 17.6058658 0.168243 0.3517602 parentA cis 43 41
geneP 510 1 8 0.6998 8 0.7998 8 0.7798 48.9321454 0.6530789 0.7453719 parentB trans 43 566
geneF 231 1 8 0.7598 8 0.7998 8 0.7798 13.2268263 0.1715268 0.4169803 parentB trans 43 491
geneY 480 1 7 0.6922 8 0.7998 8 0.7798 71.8820179 0.7463132 0.8353116 parentB no_result 43 NA
geneG 652 1 8 0.7798 8 0.7798 8 0.7798 11.5596194 0.1168083 0.3429812 parentB trans 43 753
geneJ 760 1 8 0.6998 9 0.9188 8 0.7798 22.0870242 0.2083328 0.396835 parentA cis 43 49
…
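The sketch below shows one way significant windows could be turned into hotspot headers like those above. Consecutive significant sliding.ids on the same chromosome are merged, the int.ids of their windows are pooled, and eQTLs are selected by eQTL_bin. Selecting by eQTL_bin is an assumption based on the examples: every eQTL listed under hotspot 1 has eQTL_bin 19, which is one of its int.ids. The map entry for sliding window 18 is inferred from the hotspot 1 header rather than taken from the map file. The function names are hypothetical.

```python
def group_hotspots(significant, windows, chrom):
    """Merge significant sliding windows with consecutive IDs on the same
    chromosome into hotspots, and pool the int.ids they cover."""
    hotspots = []
    for sid in sorted(significant):
        last = hotspots[-1]["sliding_ids"][-1] if hotspots else None
        if last is not None and sid == last + 1 and chrom[sid] == chrom[last]:
            hotspots[-1]["sliding_ids"].append(sid)
        else:
            hotspots.append({"chr": chrom[sid], "sliding_ids": [sid]})
    for number, h in enumerate(hotspots, start=1):
        h["number"] = number
        h["int_ids"] = sorted({i for s in h["sliding_ids"] for i in windows[s]})
    return hotspots

def eqtls_in_hotspot(eqtls, int_ids):
    """Keep eQTLs whose eQTL_bin is one of the hotspot's int.ids (assumption)."""
    wanted = set(int_ids)
    return [e for e in eqtls if e["eQTL_bin"] in wanted]

windows = {16: [18, 19], 17: [19, 20, 21], 18: [21, 22]}  # 18 inferred from header
chrom = {16: 1, 17: 1, 18: 1}
hotspots = group_hotspots({17, 18}, windows, chrom)
print(hotspots)
# [{'chr': 1, 'sliding_ids': [17, 18], 'number': 1, 'int_ids': [19, 20, 21, 22]}]
eqtls = [{"gene": "geneA", "eQTL_bin": 19}, {"gene": "geneW", "eQTL_bin": 43}]
print([e["gene"] for e in eqtls_in_hotspot(eqtls, hotspots[0]["int_ids"])])  # ['geneA']
```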
Full summary file (2 columns; 20 rows):
Total number of eQTLs (all)  31549
Total number of cis-eQTLs  4863
Total number of trans-eQTLs  21428
Total number of genes  31036
Total number of cM  1861.57
Expected number of eQTL per cM (all)  16.95
Expected number of cis-eQTL per cM  2.61
Expected number of trans-eQTL per cM  11.51
Expected number of genes per cM  16.67
User specified number of permutations  10
Number of intervals per sliding window  2
Calculated permutation threshold for all eQTL (eQTL/cM)  33.55
Chi-squared test population estimate for all eQTL (genes:eQTL)  0.496:0.504
Chi-squared test assumption for all eQTL: min number of [eQTL + genes] in bin  10.08
Calculated permutation threshold for cis-eQTL (eQTL/cM)  10.55
Chi-squared test population estimate for cis-eQTL (genes:eQTL)  0.865:0.135
Chi-squared test assumption for cis-eQTL: min number of [eQTL + genes] in bin  36.91
Calculated permutation threshold for trans-eQTL (eQTL/cM)  27.55
Chi-squared test population estimate for trans-eQTL (genes:eQTL)  0.592:0.408
Chi-squared test assumption for trans-eQTL: min number of [eQTL + genes] in bin  12.24
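The chi-squared rows follow from the totals. The population estimate is each count's share of genes + eQTLs, for example 31036 / (31036 + 31549) ≈ 0.496 for all eQTLs. The minimum bin size is the smallest [eQTL + genes] total at which both expected counts reach 5, the usual rule of thumb for the chi-squared test. That total is 5 divided by the smaller proportion. The sketch below reproduces all three sets of values. The name `chisq_parameters` is hypothetical.

```python
def chisq_parameters(n_genes, n_eqtl, min_expected=5):
    """Genome-wide genes:eQTL proportions and the smallest bin total for which
    every expected count in the 2-cell chi-squared test is >= min_expected."""
    total = n_genes + n_eqtl
    p_genes, p_eqtl = n_genes / total, n_eqtl / total
    min_bin = min_expected / min(p_genes, p_eqtl)
    return round(p_genes, 3), round(p_eqtl, 3), round(min_bin, 2)

# Totals from the full summary file: genes vs all, cis and trans eQTLs
for label, n_eqtl in (("all", 31549), ("cis", 4863), ("trans", 21428)):
    print(label, chisq_parameters(31036, n_eqtl))
```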
Frequency plots of eQTLs, with significant hotspots marked in red. The plots are generated with R and saved in PDF format.