Dataset formats
The input dataset must be in lped format (a pedigree dataset made up of a .ped file and a .map file), and the output is tabular.
What it does
GPASS (Genome-wide Poisson Approximation for Statistical Significance) detects significant single-SNP associations in case-control studies at a user-specified FDR. Unlike previous methods, it can accurately approximate the genome-wide significance and FDR of SNP associations while adjusting for millions of multiple comparisons, and it does so within seconds or minutes.
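To make the idea behind the tool's name concrete, here is a minimal sketch of the textbook Poisson approximation to genome-wide significance, assuming independent tests. Under the global null, the number of SNPs whose nominal p-value is at or below p is roughly Poisson with mean n_tests * p, so the chance that at least one of them reaches that level is about 1 - exp(-n_tests * p). The function name `poisson_adjusted_pvalue` is hypothetical, and this is not GPASS's method, which is designed to account for correlation between SNPs.

```python
import math


def poisson_adjusted_pvalue(p: float, n_tests: int) -> float:
    """Approximate genome-wide adjusted p-value for a nominal p-value p.

    Assumes n_tests independent SNP tests (no LD). Under the global null,
    the number of SNPs with nominal p-value <= p is then approximately
    Poisson with mean lam = n_tests * p, so
    P(at least one such SNP) ~= 1 - exp(-lam).
    """
    lam = n_tests * p
    return 1.0 - math.exp(-lam)


p, m = 1e-7, 1_000_000
poisson_value = poisson_adjusted_pvalue(p, m)
exact_independent = 1.0 - (1.0 - p) ** m  # exact value under independence
print(f"Poisson: {poisson_value:.6f}  exact (independent): {exact_independent:.6f}")
```

With a million tests at p = 1e-7, the Poisson value and the exact independent-test value agree to about six decimal places. When SNPs are positively correlated, both typically overstate the adjusted p-value, which is why a correlation-aware approximation is needed.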
The program has two main functions:
1. Detect significant single-SNP associations at a user-specified false discovery rate (FDR). The conventional definition is
   FDR = E(# of false-positive SNPs / # of significant SNPs)
   This definition, however, is poorly suited to association mapping: SNPs are highly correlated, so a single associated locus typically yields a cluster of significant SNPs, and counting SNPs rather than loci distorts the error rate. Our FDR is defined differently to account for SNP correlation, so that it is a proper FDR in terms of the proportion of false-positive loci.
2. Approximate the significance of a list of candidate SNPs, adjusting for multiple comparisons. If you have isolated a few SNPs of interest and want to know their significance in a GWAS, you can supply the GWAS data and let the program test those SNPs specifically.
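The difference between counting SNPs and counting loci can be illustrated with a small simulation-style example in which the true status of each significant SNP is known. This sketch assumes a hypothetical rule that merges significant SNPs on the same chromosome into one locus when consecutive positions are at most `window_bp` apart, and calls a locus false only if none of its SNPs is truly associated; it does not reproduce GPASS's own locus definition. The function names and the 100 kb window are illustrative. The values computed are realized proportions for one dataset; the FDR is their expectation.

```python
from itertools import groupby

# Each significant SNP: (chromosome, base-pair position, truly associated?).
# In real data the third field is unknown; here it mimics a simulation.


def group_into_loci(snps, window_bp=100_000):
    """Merge SNPs on the same chromosome into loci when consecutive
    positions are at most window_bp apart (hypothetical locus rule)."""
    loci = []
    ordered = sorted(snps, key=lambda s: (s[0], s[1]))
    for _, same_chrom in groupby(ordered, key=lambda s: s[0]):
        current = []
        for snp in same_chrom:
            if current and snp[1] - current[-1][1] > window_bp:
                loci.append(current)
                current = []
            current.append(snp)
        loci.append(current)
    return loci


def false_discovery_proportions(snps, window_bp=100_000):
    """Return (SNP-level, locus-level) false discovery proportions."""
    if not snps:
        return 0.0, 0.0  # no discoveries: proportion taken as 0
    snp_level = sum(not is_true for _, _, is_true in snps) / len(snps)
    loci = group_into_loci(snps, window_bp)
    # A locus is a false discovery only if none of its SNPs is truly associated.
    false_loci = sum(not any(is_true for _, _, is_true in locus) for locus in loci)
    return snp_level, false_loci / len(loci)


significant = [
    ("chr1", 1_000_000, True),   # causal SNP
    ("chr1", 1_020_000, False),  # correlated neighbor of the causal SNP
    ("chr1", 1_050_000, False),  # correlated neighbor of the causal SNP
    ("chr2", 5_000_000, False),  # isolated false signal
]
print(false_discovery_proportions(significant))  # (0.75, 0.5)
```

Three of the four significant SNPs are non-causal, giving a SNP-level proportion of 0.75, yet only one of the two loci is false, giving 0.5.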
Also note: for the approximation to be accurate, the SNPs in a study must not be both too few and too tightly clustered in one local region. A few hundred SNPs, or a few tens of SNPs spread across different regions, will be fine. The sample size must not be too small either; around 100 or more individuals (cases and controls combined) will be fine. Otherwise, use a permutation test.
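When the sample is small, or the SNPs are few and clustered, the note above recommends permutation instead. Below is a minimal sketch of one standard permutation approach, the single-step max-statistic (maxT) adjustment. It assumes genotypes already coded as allele counts 0/1/2 and labels coded 1 for a case and 0 for a control; the absolute allele-frequency difference used as the per-SNP statistic and all function names are hypothetical stand-ins, not GPASS's statistic.

```python
import random


def allele_freq_diff(genotypes, labels, j):
    """|case - control| frequency of the counted allele at SNP j.

    Hypothetical stand-in statistic; genotypes[i][j] is the count (0, 1, 2)
    of that allele in individual i, labels[i] is 1 for a case, 0 for a control.
    """
    cases = [g[j] for g, y in zip(genotypes, labels) if y == 1]
    controls = [g[j] for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / (2 * len(cases)) - sum(controls) / (2 * len(controls)))


def permutation_adjusted_pvalues(genotypes, labels, n_perm=1000, seed=0):
    """Single-step max-statistic (maxT) permutation adjustment.

    For each SNP, the adjusted p-value is the fraction of label permutations
    whose largest statistic over all SNPs reaches that SNP's observed value.
    """
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    observed = [allele_freq_diff(genotypes, labels, j) for j in range(n_snps)]
    exceed = [0] * n_snps
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # break any genotype-phenotype association
        perm_max = max(allele_freq_diff(genotypes, perm, j) for j in range(n_snps))
        for j, obs in enumerate(observed):
            if perm_max >= obs:
                exceed[j] += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Toy data: SNP 0 separates cases from controls perfectly, SNP 1 carries no signal.
genotypes = [[2, 1]] * 20 + [[0, 1]] * 20
labels = [1] * 20 + [0] * 20
print(permutation_adjusted_pvalues(genotypes, labels, n_perm=200))
```

On the toy data, SNP 1 gets an adjusted p-value of 1.0 and SNP 0 a value near 1/(n_perm + 1). The cost grows with n_perm times the number of SNPs, which is why a fast analytic approximation is attractive for genome-wide data.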
Example
input map file:
1 rs0 0 738547
1 rs1 0 5597094
1 rs2 0 9424115
...
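A minimal sketch for reading such a map file, assuming the standard PLINK .map column order (chromosome, SNP identifier, genetic distance, base-pair position) and whitespace-separated fields; `parse_map` and `MapRecord` are hypothetical names.

```python
from typing import NamedTuple


class MapRecord(NamedTuple):
    chrom: str           # chromosome code
    snp_id: str          # SNP identifier, e.g. rs0
    genetic_dist: float  # genetic distance
    bp_position: int     # base-pair position


def parse_map(lines):
    """Parse whitespace-separated map lines, skipping blank lines."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        chrom, snp_id, dist, pos = fields[:4]
        records.append(MapRecord(chrom, snp_id, float(dist), int(pos)))
    return records


map_text = """1 rs0 0 738547
1 rs1 0 5597094
1 rs2 0 9424115"""
for rec in parse_map(map_text.splitlines()):
    print(rec.snp_id, rec.chrom, rec.bp_position)
```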
input ped file:
1 1 0 0 1 1 G G A A A A A A A A A G A A G G G G A A G G G G G G A A A A A G A A G G A G A G A A G G A A G G A A G G A G A A G G A A G G A A A G A G G G A G G G G G A A A G A A G G G G G G G G A G A A A A A A A A
1 1 0 0 1 1 G G A G G G A A A A A G A A G G G G G G A A G G A G A G G G G G A G G G A G A A G G A G G G A A G G G G A G A G G G A G A A A A G G G G A G A G G G A G A A A A A G G G A G G G A G G G G G A A G G A G
...
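Each ped record begins with six columns (in the standard PLINK .ped layout: family ID, individual ID, paternal ID, maternal ID, sex, phenotype), followed by two alleles per SNP in the same order as the map file. The sketch below, with the hypothetical name `parse_ped`, converts the alleles into per-individual counts of a reference allele. It takes as reference the first allele observed at each SNP and treats `0` as a missing allele, so the counted allele is not necessarily the minor allele. The example uses only the first three SNPs of the two records shown.

```python
def parse_ped(lines):
    """Parse ped lines into (phenotypes, allele_counts).

    allele_counts[i][j] is the number of copies (0, 1 or 2) of SNP j's
    reference allele carried by individual i, or None if either allele is
    missing ("0"). The reference allele is the first non-missing allele seen
    at that SNP, so it is not necessarily the minor allele.
    """
    rows = [line.split() for line in lines if line.strip()]
    phenotypes = [int(row[5]) for row in rows]  # sixth column
    n_snps = (len(rows[0]) - 6) // 2 if rows else 0
    counts = [[None] * n_snps for _ in rows]
    for j in range(n_snps):
        ref = None
        for i, row in enumerate(rows):
            a1, a2 = row[6 + 2 * j], row[7 + 2 * j]
            if a1 == "0" or a2 == "0":
                continue  # missing genotype stays None
            if ref is None:
                ref = a1
            counts[i][j] = (a1 == ref) + (a2 == ref)
    return phenotypes, counts


# First three SNPs of the two example records.
ped_text = """1 1 0 0 1 1 G G A A A A
1 1 0 0 1 1 G G A G G G"""
phenotypes, counts = parse_ped(ped_text.splitlines())
print(phenotypes, counts)  # [1, 1] [[2, 2, 2], [2, 1, 0]]
```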
output dataset, showing significant SNPs with their statistics, adjusted p-values and FDR:
#ID chr position Statistics adj-Pvalue FDR
rs35 chr1 136606952 4.890849 0.991562 0.682138
rs36 chr1 137748344 4.931934 0.991562 0.795827
rs44 chr2 14423047 7.712832 0.665086 0.218776
...
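The tabular output can be post-processed with a few lines of code. This sketch, with the hypothetical function name `read_gpass_output`, skips the `#` header line, splits each row on whitespace in the column order shown, and keeps SNPs whose FDR is at or below an illustrative cutoff of 0.5.

```python
def read_gpass_output(lines):
    """Parse output rows (ID, chr, position, statistic, adj. p-value, FDR),
    skipping blank lines and the '#' header."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        snp_id, chrom, pos, stat, adj_p, fdr = line.split()[:6]
        rows.append({
            "id": snp_id,
            "chr": chrom,
            "position": int(pos),
            "statistic": float(stat),
            "adj_pvalue": float(adj_p),
            "fdr": float(fdr),
        })
    return rows


output_text = """#ID chr position Statistics adj-Pvalue FDR
rs35 chr1 136606952 4.890849 0.991562 0.682138
rs36 chr1 137748344 4.931934 0.991562 0.795827
rs44 chr2 14423047 7.712832 0.665086 0.218776"""

rows = read_gpass_output(output_text.splitlines())
fdr_cutoff = 0.5  # illustrative cutoff, not a recommended value
print([r["id"] for r in rows if r["fdr"] <= fdr_cutoff])  # ['rs44']
```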
Reference
Zhang Y, Liu JS. (2010) Fast and accurate significance approximation for genome-wide association studies. Submitted.