comparison pyPRADA_1.2/tools/samtools-0.1.16/bcftools/README @ 0:acc2ca1a3ba4
Uploaded
| author | siyuan |
|---|---|
| date | Thu, 20 Feb 2014 00:44:58 -0500 |
| -1:000000000000 | 0:acc2ca1a3ba4 |
|---|---|
| 1 The view command of bcftools calls variants, tests Hardy-Weinberg | |
| 2 equilibrium (HWE), tests allele balances and estimates allele frequency. | |
| 3 | |
| 4 This command calls a site as a potential variant if P(ref|D,F) is below | |
| 5 0.9 (controlled by the -p option), where D is data and F is the prior | |
| 6 allele frequency spectrum (AFS). | |
| 7 | |
| 8 The view command performs two types of allele balance tests, both based | |
| 9 on Fisher's exact test for 2x2 contingency tables with the row variable | |
| 10 being reference allele or not. In the first table, the column variable | |
| 11 is strand. The two-tailed P-value is taken. We test if variant bases tend to | |
| 12 come from one strand. In the second table, the column variable is | |
| 13 whether a base appears in the first or the last 11bp of the read. | |
| 14 The one-tailed P-value is taken. We test if variant bases tend to occur | |
| 15 towards the end of reads, which is usually an indication of | |
| 16 misalignment. | |
| 17 | |
| 18 Site allele frequency is estimated in two ways. In the first way, the | |
| 19 frequency is estimated as \argmax_f P(D|f) under the assumption of | |
| 20 HWE. Prior AFS is not used. In the second way, the frequency is | |
| 21 estimated as the posterior expectation of allele counts \sum_k | |
| 22 kP(k|D,F), divided by the total number of haplotypes. HWE is not | |
| 23 assumed, but the estimate depends on the prior AFS. The two estimates | |
| 24 largely agree when the signal is strong, but may differ greatly at weak | |
| 25 sites, where the prior plays an important role. | |
| 26 | |
| 27 To test HWE, we calculate the posterior distribution of genotypes | |
| 28 (ref-hom, het and alt-hom). A chi-square test is performed. It is worth | |
| 29 noting that the model used here is prior dependent and assumes HWE, | |
| 30 which differs from both models used for allele frequency estimation. The | |
| 31 new model actually yields a third estimate of site allele frequency. | |
| 32 | |
| 33 The estimated allele frequency spectrum is printed to stderr every 64k | |
| 34 sites. The estimate is in fact only the first round of an EM | |
| 35 procedure. The second model (not the model for HWE testing) is used to | |
| 36 estimate the AFS. |
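
The two allele balance tests described in the README above both reduce to Fisher's exact test on a 2x2 contingency table. Below is a minimal sketch of that test using only the Python standard library. It is not the bcftools implementation: the function names and the example counts are hypothetical, and putting the non-reference row first, so that the one-tailed test looks for an excess in the top-left cell, is an assumption made for illustration.

```python
from math import comb


def hypergeom_pmf(a, row1, row2, col1):
    """P(top-left cell == a) when all table margins are held fixed."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Returns (two_tailed, one_tailed_greater), where the one-tailed value
    is the probability of a top-left cell at least as large as observed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo = max(0, col1 - row2)   # smallest feasible top-left cell
    hi = min(row1, col1)       # largest feasible top-left cell
    probs = {x: hypergeom_pmf(x, row1, row2, col1) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Two-tailed: sum over all tables no more probable than the observed one
    # (a small relative tolerance absorbs floating-point ties).
    two_tailed = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    # One-tailed: sum over tables with a top-left cell >= the observed one.
    greater = sum(p for x, p in probs.items() if x >= a)
    return min(1.0, two_tailed), min(1.0, greater)


if __name__ == "__main__":
    # Strand test (hypothetical counts): rows = non-ref / ref,
    # columns = forward / reverse strand. The two-tailed value is used.
    strand_p, _ = fisher_exact_2x2(9, 0, 20, 18)
    print("strand bias, two-tailed P:", strand_p)

    # End-distance test (hypothetical counts): rows = non-ref / ref,
    # columns = within 11bp of a read end / elsewhere. The one-tailed
    # value asks whether non-ref bases are enriched near read ends.
    _, end_p = fisher_exact_2x2(7, 2, 10, 28)
    print("end-distance bias, one-tailed P:", end_p)
```

A small P-value in either test suggests that the variant bases are not distributed like the reference bases, which is the signal the README associates with strand artifacts or misalignment.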

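
The two site allele frequency estimates described in the README can be illustrated with a short sketch. It assumes diploid samples, takes `gls` to be a hypothetical list of per-sample genotype likelihoods P(D_i|g) for g = 0, 1, 2 non-reference alleles, and takes `prior_afs` to be the prior F(k) over non-reference allele counts k = 0..2n. The EM iteration and the allele-count recursion used here are standard techniques chosen for illustration; they are not taken from the bcftools source, and all function names are made up.

```python
from math import comb


def mle_freq_hwe(gls, iters=50, f0=0.5):
    """EM estimate of argmax_f P(D|f) under HWE; the prior AFS is not used."""
    f = f0
    for _ in range(iters):
        # Genotype prior implied by HWE at the current frequency.
        hwe = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        expected_alt = 0.0
        for gl in gls:
            w = [gl[g] * hwe[g] for g in range(3)]
            # Expected number of non-ref alleles carried by this sample.
            expected_alt += (w[1] + 2 * w[2]) / sum(w)
        f = expected_alt / (2 * len(gls))
    return f


def allele_count_likelihoods(gls):
    """P(D|k) for k = 0..2n non-ref alleles, by dynamic programming."""
    m = 2 * len(gls)
    z = [1.0] + [0.0] * m
    for j, gl in enumerate(gls, start=1):
        new = [0.0] * (m + 1)
        for l in range(2 * j + 1):
            v = z[l] * gl[0]
            if l >= 1:
                v += 2 * z[l - 1] * gl[1]   # two ways to place one alt allele
            if l >= 2:
                v += z[l - 2] * gl[2]
            new[l] = v
        z = new
    return [z[k] / comb(m, k) for k in range(m + 1)]


def posterior_mean_freq(gls, prior_afs):
    """sum_k k P(k|D,F) divided by the number of haplotypes; HWE not assumed."""
    lik = allele_count_likelihoods(gls)
    if len(prior_afs) != len(lik):
        raise ValueError("prior_afs must have 2n + 1 entries")
    post = [p * l for p, l in zip(prior_afs, lik)]   # unnormalised P(k|D,F)
    norm = sum(post)
    return sum(k * p for k, p in enumerate(post)) / norm / (len(lik) - 1)


if __name__ == "__main__":
    weak = [(0.5, 0.4, 0.1)] * 3                  # weak evidence at a site
    flat = [1 / 7] * 7                            # flat prior over k = 0..6
    skewed = [1 / (k + 1) for k in range(7)]      # prior favouring small k
    print("MLE under HWE:           ", mle_freq_hwe(weak))
    print("posterior mean (flat):   ", posterior_mean_freq(weak, flat))
    print("posterior mean (skewed): ", posterior_mean_freq(weak, skewed))
```

On a weak site such as the one in the usage example, changing the prior moves the second estimate while leaving the first untouched, which is the disagreement the README warns about.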