Purpose
Beagle is a program for phasing and imputing missing genotypes. Sporadic missing genotypes are imputed during phasing. If a reference panel of phased genotypes is specified with the ref argument, ungenotyped markers that are present in the reference panel can also be imputed.
Beagle version 5.2 provides significantly faster genotype phasing than version 5.1. Recent versions of Beagle do not infer genotypes from genotype likelihood input data, but Beagle versions 4.0 and 4.1 have this capability.
HapMap genetic maps
HapMap genetic maps in PLINK format for GRCh36, GRCh37, and GRCh38 are available at this link.
Input files
Beagle uses Variant Call Format (VCF) 4.3 for input and output genotype data. Pseudoautosomal and non-pseudoautosomal X-chromosome genotypes must be in separate input files and analysed separately unless male haploid genotypes are coded as homozygous diploid genotypes.
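As an illustration of the recoding mentioned above, the following is a minimal Python sketch that turns a haploid GT value into a homozygous diploid one. The function name and the choice of "/" as the default separator are assumptions made for this example; they are not part of Beagle.

```python
def haploid_to_homozygous(gt: str, sep: str = "/") -> str:
    """Recode a haploid GT value such as "1" as the homozygous diploid "1/1".

    Hypothetical helper for illustration only; `sep` is an assumed default.
    """
    # A value that already contains an allele separator is left unchanged.
    if "/" in gt or "|" in gt:
        return gt
    # Duplicate the single allele (a missing "." becomes "./.").
    return f"{gt}{sep}{gt}"
```

For example, haploid_to_homozygous("1") gives "1/1", while a diploid value such as "0|1" is returned unchanged.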
If any heterozygote genotype in a marker window of the VCF file is unphased (with the "/" allele separator), Beagle will consider all heterozygote genotypes in that window to be unphased, regardless of the allele separator used ("|" or "/"). Beagle assumes that any VCF file with a name ending in ".gz" is compressed with gzip or bgzip, and that any reference VCF file with a name ending in ".bref3" is compressed with bref version 3.
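The two rules above can be sketched in Python as follows. This is only an illustration of the stated behaviour, not Beagle's own code: the function names are hypothetical, and the list of genotypes passed in stands for whatever Beagle treats as one marker window.

```python
import re


def window_is_unphased(genotypes):
    """Return True if any heterozygote in the window uses the "/" separator."""
    for gt in genotypes:
        alleles = re.split(r"[/|]", gt)
        # A heterozygote has two called alleles that differ.
        is_het = (
            len(alleles) == 2
            and "." not in alleles
            and alleles[0] != alleles[1]
        )
        if is_het and "/" in gt:
            return True
    return False


def assumed_compression(filename):
    """Map a file name to the compression assumed from its suffix."""
    if filename.endswith(".bref3"):
        return "bref3"
    if filename.endswith(".gz"):
        return "gzip or bgzip"
    return "uncompressed"
```

Under this sketch, a window such as ["0|1", "0/1"] counts as unphased, while ["0|1", "0/0"] does not, because the only "/" genotype is homozygous.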
Output files
There are two output files. The log file gives a summary of the analysis that includes the Beagle version, the command line arguments, and compute time.
The vcf.gz file is a bgzip-compressed VCF file that contains phased, non-missing genotypes for all non-reference samples. The output vcf.gz file can be uncompressed with the Unix gunzip utility.
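Because bgzip output is gzip-compatible, the output file can also be read without uncompressing it first. The following is a minimal sketch using Python's standard gzip module; the function name is hypothetical, and the sketch assumes that GT is the first FORMAT subfield, as the VCF specification requires when GT is present.

```python
import gzip


def read_vcf_genotypes(path):
    """Yield (chrom, pos, genotypes) for each data line of a gzip or bgzip VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines and the column header line.
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # Columns 0-8 are CHROM..FORMAT; sample columns start at index 9.
            genotypes = [sample.split(":")[0] for sample in fields[9:]]
            yield fields[0], int(fields[1]), genotypes
```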
If a reference panel is specified and ungenotyped markers are imputed, the VCF INFO field will contain:
- A "DR2" subfield with the estimated squared correlation between the estimated allele dose and the true allele dose.
- An "AF" subfield with the estimated alternate allele frequencies in the target samples.
- The "IMP" flag if the marker is imputed.
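The INFO subfields listed above can be read with a few lines of Python. This is a sketch under stated assumptions: the function names and the 0.8 threshold are hypothetical choices for illustration, and DR2 is treated as possibly comma-separated in case a marker has more than one alternate allele.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags such as IMP map to True."""
    result = {}
    if info == ".":
        return result
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def passes_dr2(info, threshold=0.8):
    """Keep genotyped markers, and imputed markers whose DR2 meets the threshold."""
    fields = parse_info(info)
    if "IMP" not in fields:
        return True  # not imputed, so no DR2 filter is applied
    # Use the smallest DR2 value if several alternate alleles are reported.
    dr2 = min(float(value) for value in fields["DR2"].split(","))
    return dr2 >= threshold
```

For example, parse_info("DR2=0.95;AF=0.12;IMP") returns the DR2 and AF values as strings and IMP as True. An appropriate threshold depends on the analysis, so the default here should not be read as a recommendation.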