The view command of bcftools calls variants, tests Hardy-Weinberg
equilibrium (HWE), tests allele balances and estimates allele frequency.

This command calls a site as a potential variant if P(ref|D,F) is below
0.9 (controlled by the -p option), where D is data and F is the prior
allele frequency spectrum (AFS).

The view command performs two types of allele balance tests, both based
on Fisher's exact test for 2x2 contingency tables with the row variable
being reference allele or not. In the first table, the column variable
is strand, and the two-tailed P-value is taken: we test if variant
bases tend to come from one strand. In the second table, the column
variable is whether a base appears in the first or the last 11bp of the
read, and the one-tailed P-value is taken: we test if variant bases
tend to occur towards the end of reads, which is usually an indication
of misalignment.

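As an illustration of the two tests above, here is a minimal pure-Python
sketch of Fisher's exact test on a 2x2 table. The function name
fisher_exact_2x2 and the table layout (rows: reference / non-reference;
columns: forward / reverse strand, or read end / read interior) are
assumptions made for this example and are not taken from the bcftools
source.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (two_tailed, left_tail, right_tail) P-values for [[a, b], [c, d]].

    Hypothetical layout: row 1 = reference bases, row 2 = non-reference
    bases; the columns are the two levels of the tested variable.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Two-tailed: sum over all tables no more probable than the observed one.
    two = sum(prob(x) for x in range(lo, hi + 1)
              if prob(x) <= p_obs * (1 + 1e-7))
    left = sum(prob(x) for x in range(lo, a + 1))    # P(top-left <= a)
    right = sum(prob(x) for x in range(a, hi + 1))   # P(top-left >= a)
    return min(two, 1.0), min(left, 1.0), min(right, 1.0)

# Strand table: 8 ref forward, 2 ref reverse, 1 alt forward, 5 alt reverse.
two_tailed, left_tail, right_tail = fisher_exact_2x2(8, 2, 1, 5)
```

For the strand test the two-tailed value would be used. For the
end-distance test, which one-tailed value applies depends on how the
table is laid out; with reference bases in the first row and read-end
bases in the first column, the left tail measures whether variant bases
are over-represented near read ends.
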
Site allele frequency is estimated in two ways. In the first way, the
frequency is estimated as \argmax_f P(D|f) under the assumption of
HWE. Prior AFS is not used. In the second way, the frequency is
estimated as the posterior expectation of allele counts \sum_k
kP(k|D,F), divided by the total number of haplotypes. HWE is not
assumed, but the estimate depends on the prior AFS. The two estimates
largely agree when the signal is strong, but may differ greatly on weak
sites, because in that case the prior plays an important role.

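The two estimators can be sketched as follows. This is only an
illustration under stated assumptions: the text does not say how the
maximisation is carried out, so the first function uses a standard EM
iteration over per-sample genotype likelihoods, and the second takes the
per-site likelihoods P(D|k) as given rather than deriving them from the
reads. All function and variable names are hypothetical.

```python
def hwe_genotype_prior(f):
    """Genotype probabilities for 0, 1, 2 alternate alleles under HWE."""
    return [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]

def mle_allele_freq(gls, f=0.5, n_iter=200, tol=1e-10):
    """EM estimate of argmax_f P(D|f) for diploid samples under HWE.

    gls[i][g] is P(D_i | genotype with g alternate alleles), g = 0, 1, 2.
    Assumes every sample has a positive likelihood for some genotype.
    """
    for _ in range(n_iter):
        prior = hwe_genotype_prior(f)
        total = 0.0
        for gl in gls:
            w = [gl[g] * prior[g] for g in range(3)]
            # Expected number of alternate alleles carried by this sample.
            total += (w[1] + 2 * w[2]) / sum(w)
        f_new = total / (2 * len(gls))
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f

def posterior_mean_freq(lik_k, prior_afs):
    """Posterior expectation sum_k k P(k|D,F), divided by haplotype count.

    lik_k[k] is P(D|k) and prior_afs[k] is F(k), for k = 0..M alternate
    alleles among M haplotypes.
    """
    post = [l * p for l, p in zip(lik_k, prior_afs)]
    z = sum(post)
    m = len(lik_k) - 1
    return sum(k * p for k, p in enumerate(post)) / (z * m)

# Four diploid samples with unambiguous genotypes 0, 1, 2, 0.
f_mle = mle_allele_freq([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
# One diploid sample (two haplotypes) with a flat prior AFS.
f_post = posterior_mean_freq([0.1, 0.6, 0.3], [1 / 3, 1 / 3, 1 / 3])
```

With weak likelihoods, changing prior_afs moves the second estimate but
leaves the first untouched, which is the disagreement on weak sites
described above.
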
To test HWE, we calculate the posterior distribution of genotypes
(ref-hom, het and alt-hom). A chi-square test is then performed. It is
worth noting that the model used here is prior dependent and assumes
HWE, which is different from both models for allele frequency
estimation. The new model actually yields a third estimate of site
allele frequency.

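For orientation, a classical chi-square test of HWE on genotype counts
looks like the sketch below. It is an assumption that this form (one
degree of freedom, expected counts derived from the allele frequency
implied by the counts) resembles what the view command does; the exact
statistic is not specified here. The counts may be fractional, for
example posterior genotype probabilities summed over samples. The
function name is hypothetical.

```python
from math import erfc, sqrt

def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Return (chi2, p_value) for a 1-df chi-square test of HWE."""
    n = n_ref_hom + n_het + n_alt_hom
    # Alternate allele frequency implied by the genotype counts.
    f = (n_het + 2 * n_alt_hom) / (2 * n)
    expected = [n * (1 - f) ** 2, 2 * n * f * (1 - f), n * f ** 2]
    observed = [n_ref_hom, n_het, n_alt_hom]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return chi2, erfc(sqrt(chi2 / 2))

# Counts in exact HWE proportions versus a complete lack of heterozygotes.
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)
```
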
The estimated allele frequency spectrum is printed to stderr every 64k
sites. The estimate is in fact only the first round of an EM
procedure. The second model (not the model for HWE testing) is used to
estimate the AFS.

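One round of such an EM update can be sketched as below: the new AFS is
the average over sites of the posterior P(k|D,F) computed with the
current prior. The per-site likelihoods P(D|k) are taken as inputs, and
the function name and data layout are hypothetical rather than taken
from the bcftools source.

```python
def afs_em_round(site_liks, prior_afs):
    """Return the AFS after one EM round.

    site_liks[s][k] is P(D_s | k) for site s and allele count k;
    prior_afs[k] is the current estimate F(k).
    """
    m = len(prior_afs)
    acc = [0.0] * m
    for lik in site_liks:
        post = [l * p for l, p in zip(lik, prior_afs)]
        z = sum(post)
        for k in range(m):
            acc[k] += post[k] / z   # accumulate P(k | D_s, F)
    return [a / len(site_liks) for a in acc]

# Four sites, two haplotypes, starting from a flat prior.
flat = [1 / 3, 1 / 3, 1 / 3]
new_afs = afs_em_round([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]], flat)
```

Feeding new_afs back in as prior_afs and repeating would continue the EM
procedure; as noted above, the printed estimate corresponds to the first
round only.
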