annotate PsiCLASS-1.0.2/samtools-0.1.19/bcftools/README @ 0:903fc43d6227 draft default tip

Uploaded
author lsong10
date Fri, 26 Mar 2021 16:52:45 +0000

The view command of bcftools calls variants, tests Hardy-Weinberg
equilibrium (HWE), tests allele balance and estimates allele frequency.

This command calls a site as a potential variant if P(ref|D,F) is below
0.9 (controlled by the -p option), where D is the data and F is the
prior allele frequency spectrum (AFS).
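A minimal Python sketch of this calling rule, under stated assumptions:
each diploid sample supplies genotype likelihoods P(D_i|g) for g = 0, 1, 2
alternate alleles; P(D|k) for k alternate alleles among M = 2n haplotypes
is obtained with a dynamic program over samples (our assumed formulation,
not taken from this README); and the prior F is passed in as a list. The
function names are hypothetical and are not part of bcftools.

```python
from math import comb


def site_likelihoods(gls):
    """Return P(D|k) for k = 0..M alternate alleles, M = 2 * len(gls).

    gls[i] = (P(D_i|g=0), P(D_i|g=1), P(D_i|g=2)) for diploid sample i,
    where g counts alternate alleles (assumed input format).
    """
    z = [1.0]  # z[l]: weighted sum over genotype configs with l alt alleles
    for l0, l1, l2 in gls:
        nz = [0.0] * (len(z) + 2)
        for l, zl in enumerate(z):
            nz[l] += zl * l0          # sample is ref-hom
            nz[l + 1] += 2 * zl * l1  # het: 2 ways to place the alt allele
            nz[l + 2] += zl * l2      # sample is alt-hom
        z = nz
    m = len(z) - 1
    # Normalise by the number of ways to place k alt alleles on M haplotypes.
    return [z[k] / comb(m, k) for k in range(m + 1)]


def posterior_counts(lik, prior):
    """Return P(k|D,F) from P(D|k) and the prior AFS F (both length M+1)."""
    w = [p * l for p, l in zip(prior, lik)]
    total = sum(w)
    return [x / total for x in w]


def is_potential_variant(gls, prior, threshold=0.9):
    """Call the site if P(ref|D,F) = P(k=0|D,F) falls below the threshold."""
    post = posterior_counts(site_likelihoods(gls), prior)
    return post[0] < threshold
```

For example, with a flat prior over k = 0..4 (chosen only for
illustration), two samples that both look alt-hom give P(ref|D,F) = 0,
so is_potential_variant([(0, 0, 1), (0, 0, 1)], [0.2] * 5) returns True.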

The view command performs two types of allele balance tests, both based
on Fisher's exact test for 2x2 contingency tables whose row variable is
whether a base is the reference allele. In the first table, the column
variable is strand, and the two-tailed P-value is taken: we test whether
variant bases tend to come from one strand. In the second table, the
column variable is whether a base appears in the first or the last 11bp
of the read, and the one-tailed P-value is taken: we test whether
variant bases tend to occur towards the ends of reads, which usually
indicates misalignment.
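Both tests can be sketched with a small, self-contained Fisher's exact
test built on hypergeometric probabilities. The table layout is an
illustrative assumption: rows are (ref, non-ref), and columns are
(forward, reverse) for the strand test or (in the first/last 11bp,
elsewhere) for the read-end test. The helper names are hypothetical, not
bcftools code.

```python
from math import comb


def _hypergeom_pmf(table):
    """Return (pmf over x, observed x), x being the count in cell [1][0]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_alt = c + d     # total non-ref bases
    col_first = a + c   # first column: forward strand, or near a read end
    lo = max(0, row_alt - (n - col_first))
    hi = min(row_alt, col_first)
    denom = comb(n, row_alt)
    pmf = {x: comb(col_first, x) * comb(n - col_first, row_alt - x) / denom
           for x in range(lo, hi + 1)}
    return pmf, c


def fisher_two_tailed(table):
    """Two-tailed P: sum over tables no more probable than the observed one."""
    pmf, obs = _hypergeom_pmf(table)
    cutoff = pmf[obs] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf.values() if p <= cutoff))


def fisher_one_tailed_greater(table):
    """One-tailed P: at least as many non-ref bases in column 0 as observed."""
    pmf, obs = _hypergeom_pmf(table)
    return min(1.0, sum(p for x, p in pmf.items() if x >= obs))
```

The strand test would call fisher_two_tailed on a (ref, non-ref) by
(forward, reverse) table; the read-end test would call
fisher_one_tailed_greater with the read-end column first, so a small
P-value flags an excess of variant bases near read ends.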

Site allele frequency is estimated in two ways. In the first, the
frequency is estimated as \argmax_f P(D|f) under the assumption of HWE;
the prior AFS is not used. In the second, the frequency is estimated as
the posterior expectation of the allele count, \sum_k k P(k|D,F),
divided by the total number of haplotypes; HWE is not assumed, but the
estimate depends on the prior AFS. The two estimates largely agree when
the signal is strong, but may differ greatly at weak sites because, in
that case, the prior plays an important role.
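A sketch of the two estimators, assuming the same per-sample genotype
likelihood triples (P(D_i|g) for g = 0, 1, 2 alt alleles). The first
maximises P(D|f) under HWE by EM iteration, which is our chosen
optimiser, not necessarily what bcftools uses; the second takes the
posterior P(k|D,F) as an already computed list. Function names are
hypothetical.

```python
def ml_freq_hwe(gls, iters=100, tol=1e-10):
    """Estimate argmax_f P(D|f) under HWE by EM; no prior AFS is used.

    Assumes every sample has at least one non-zero genotype likelihood.
    """
    f = 0.5
    for _ in range(iters):
        e_alt = 0.0  # expected number of alt alleles given the current f
        for l0, l1, l2 in gls:
            w0 = (1 - f) ** 2 * l0
            w1 = 2 * f * (1 - f) * l1
            w2 = f * f * l2
            e_alt += (w1 + 2 * w2) / (w0 + w1 + w2)
        new_f = e_alt / (2 * len(gls))  # M-step: alt alleles / haplotypes
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


def posterior_mean_freq(post):
    """sum_k k P(k|D,F) divided by the number of haplotypes M."""
    m = len(post) - 1
    return sum(k * p for k, p in enumerate(post)) / m
```

With one certain het and one certain ref-hom sample, ml_freq_hwe returns
0.25 regardless of any prior, whereas posterior_mean_freq would shift
with whatever prior produced its input.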

To test HWE, we calculate the posterior distribution of genotypes
(ref-hom, het and alt-hom) and perform a chi-square test. Note that the
model used here depends on the prior and assumes HWE, which makes it
different from both models used for allele frequency estimation. This
model in fact yields a third estimate of site allele frequency.
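A hypothetical simplification of this test in Python: the genotype prior
is taken to be HWE at a supplied allele frequency f (0 < f < 1), which
stands in for the prior-dependent model described above and is our
assumption. Posterior genotype probabilities are summed into expected
counts, the allele frequency implied by those counts plays the role of
the third estimate, and a 1-degree-of-freedom chi-square P-value is
computed with math.erfc.

```python
from math import erfc, sqrt


def hwe_chi2_test(gls, f):
    """Return (allele freq from expected counts, chi-square, P-value)."""
    prior = ((1 - f) ** 2, 2 * f * (1 - f), f * f)  # assumed HWE prior
    counts = [0.0, 0.0, 0.0]  # expected ref-hom, het, alt-hom counts
    for gl in gls:
        w = [p * l for p, l in zip(prior, gl)]
        s = sum(w)
        for g in range(3):
            counts[g] += w[g] / s
    n = sum(counts)
    q = (counts[1] + 2 * counts[2]) / (2 * n)  # frequency implied by counts
    expected = (n * (1 - q) ** 2, n * 2 * q * (1 - q), n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)
    p_value = erfc(sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return q, chi2, p_value
```

For two clear ref-hom and two clear alt-hom samples with no hets, the
sketch gives a chi-square of 4 and a P-value of about 0.0455, flagging a
het deficit.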

The estimated allele frequency spectrum is printed to stderr every 64k
sites. This estimate is in fact only the first round of an EM
procedure. The second model (not the model used for HWE testing) is
used to estimate the AFS.
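A minimal sketch of one EM round for the AFS, assuming each site is
given as a list of likelihoods P(D_s|k) for k = 0..M (the same M at every
site): the updated spectrum is the average over sites of the posterior
P(k|D_s,F). The default rounds=1 mirrors the single round described
above; em_afs is a hypothetical name.

```python
def em_afs(site_liks, prior, rounds=1):
    """Update the AFS from per-site P(D_s|k) vectors by EM (rounds steps)."""
    afs = list(prior)
    for _ in range(rounds):
        acc = [0.0] * len(afs)
        for lik in site_liks:
            w = [p * l for p, l in zip(afs, lik)]
            s = sum(w)
            for k in range(len(afs)):
                acc[k] += w[k] / s  # posterior P(k|D_s, F) at this site
        afs = [a / len(site_liks) for a in acc]
    return afs
```

Running more rounds would continue the EM towards a maximum-likelihood
AFS; a single round keeps the result tied to the starting prior.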