What it does
The stats tool computes one of the following useful variant statistics for a GEMINI database:
Genotype counts tabulated by sample:
This mode uses the gemini stats --summarize option to produce a table with one row per sample, tabulating the number of sites at which that sample shows a:
You can choose to calculate the table based on all variants in your database, or to filter the variants before the calculation using GEMINI genotype filter expressions and/or WHERE clauses of GEMINI queries.
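To make the shape of the per-sample table concrete, here is a minimal sketch in Python of how genotype classes could be tallied per sample. It does not call GEMINI; the input structure, the function name summarize_by_sample, and the genotype class labels ("het", "hom_alt", "hom_ref") are assumptions chosen for illustration, not GEMINI's own column names.

```python
from collections import Counter

# Hypothetical input: one dict per variant site, mapping a sample name to a
# genotype class. The class labels are illustrative, not GEMINI's own.
sites = [
    {"M10475": "het", "M10478": "hom_ref", "M10500": "hom_alt"},
    {"M10475": "het", "M10478": "het", "M10500": "hom_ref"},
    {"M10475": "hom_alt", "M10478": "hom_ref", "M10500": "hom_ref"},
]


def summarize_by_sample(sites):
    """Return {sample: Counter(genotype class -> number of sites)}."""
    table = {}
    for site in sites:
        for sample, gt_class in site.items():
            table.setdefault(sample, Counter())[gt_class] += 1
    return table


summary = summarize_by_sample(sites)

# One row per sample, as in the table described above.
for sample in sorted(summary):
    print(sample, dict(summary[sample]))
```

Filtering the variants before the calculation corresponds, in this sketch, to dropping entries from the sites list before calling the function.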
Counts of SNPs by nucleotide change:
This runs gemini stats with the --snp-count option. The result is a simple table listing the number of occurrences of each observed REF->ALT change in your database, e.g.:
type    count
A->G    2
C->T    1
G->A    1
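The counting behind such a table can be sketched in a few lines of Python. This is an illustration of the idea only, not GEMINI's implementation; the snps list and the function name count_changes are hypothetical.

```python
from collections import Counter

# Hypothetical (REF, ALT) pairs for four SNPs.
snps = [("A", "G"), ("C", "T"), ("A", "G"), ("G", "A")]


def count_changes(snps):
    """Count how often each REF->ALT change occurs."""
    return Counter(f"{ref}->{alt}" for ref, alt in snps)


change_counts = count_changes(snps)

# Print a two-column table, sorted by change type.
print("type\tcount")
for change, n in sorted(change_counts.items()):
    print(f"{change}\t{n}")
```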
Transition / transversion statistics
This mode uses gemini stats with the --tstv, --tstv-coding, or --tstv-noncoding option to compute the transition/transversion ratio for all SNPs, for SNPs in coding regions, or for SNPs in non-coding regions, respectively.
The result is presented in a 1x3 table listing the number of transitions (ts column), transversions (tv column) and the ratio of the two (ts/tv column), e.g.:
ts     tv    ts/tv
126    39    3.2307
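For readers unfamiliar with the terms: a transition exchanges two purines (A, G) or two pyrimidines (C, T), and every other substitution is a transversion. The following Python sketch derives ts, tv and their ratio from a table of REF->ALT counts under that standard definition. The input counts and the function names are hypothetical, and the code is not taken from GEMINI.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if both bases are purines or both are pyrimidines."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv(change_counts):
    """Return (transitions, transversions, ratio) from REF->ALT counts."""
    ts = tv = 0
    for change, n in change_counts.items():
        ref, alt = change.split("->")
        if is_transition(ref, alt):
            ts += n
        else:
            tv += n
    # The ratio is undefined when no transversions were observed.
    ratio = ts / tv if tv else float("nan")
    return ts, tv, ratio


# Hypothetical counts: four transitions and two transversions.
counts = {"A->G": 2, "C->T": 1, "G->A": 1, "A->C": 2}
ts, tv, ratio = ts_tv(counts)
print(ts, tv, ratio)
```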
Alternate allele frequency spectrum
Runs gemini stats --sfs to produce binned alternate allele frequency counts in a table like:
aaf      count
0.125    2
0.375    1
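As a rough illustration of what such a spectrum represents, the Python sketch below computes the alternate allele frequency of each variant as the number of ALT alleles divided by the number of called alleles, then counts how many variants share each frequency. It assumes diploid samples and a hypothetical encoding of genotypes as ALT allele counts (0, 1 or 2, with None for a missing call). How GEMINI itself bins the frequencies is not reproduced here.

```python
from collections import Counter


def alt_allele_frequency(alt_counts):
    """ALT alleles divided by called alleles, assuming diploid samples."""
    called = [c for c in alt_counts if c is not None]
    if not called:
        raise ValueError("no called genotypes for this variant")
    return sum(called) / (2 * len(called))


# Hypothetical genotypes for three variants in four samples.
variants = [
    [1, 0, 0, 0],  # 1 ALT allele out of 8
    [0, 0, 1, 0],  # 1 ALT allele out of 8
    [2, 1, 0, 0],  # 3 ALT alleles out of 8
]

spectrum = Counter(alt_allele_frequency(v) for v in variants)

print("aaf\tcount")
for aaf, n in sorted(spectrum.items()):
    print(f"{aaf}\t{n}")
```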
Pairwise genetic distances
Runs gemini stats --mds and tabulates all pairwise genetic distances for the samples in your database. An example could look like this:
sample1    sample2    distance
M10500     M10500     0.0
M10475     M10478     1.25
M10500     M10475     2.0
M10500     M10478     0.5714
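Because the table lists each pair only once, downstream analysis often needs a lookup that ignores the order of the two samples. The Python sketch below shows one way to build it from the example rows. It assumes the output is tab-separated with the header shown above; the function names load_distances and distance are hypothetical.

```python
import csv
import io

# The example rows from above, assumed here to be tab-separated.
raw = (
    "sample1\tsample2\tdistance\n"
    "M10500\tM10500\t0.0\n"
    "M10475\tM10478\t1.25\n"
    "M10500\tM10475\t2.0\n"
    "M10500\tM10478\t0.5714\n"
)


def load_distances(handle):
    """Map each unordered sample pair to its distance."""
    distances = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        pair = frozenset((row["sample1"], row["sample2"]))
        distances[pair] = float(row["distance"])
    return distances


def distance(distances, a, b):
    """Look up the distance between a and b, in either order."""
    return distances[frozenset((a, b))]


distances = load_distances(io.StringIO(raw))
print(distance(distances, "M10475", "M10500"))
```

Keying on a frozenset means that a pair and its reverse resolve to the same entry, and a sample paired with itself collapses to a single-element key.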