Dataset formats
The input dataset is VCF format. The output dataset is pgSnp. (Dataset missing?)
What it does
This converts a VCF dataset to pgSnp with the frequency counts being chromosome counts. If there is more than one column of SNP data it will either accumulate all columns as a population or convert the column indicated to pgSnp.
Examples
input:
1 13327 rs144762171 G C 100 PASS VT=SNP;SNPSOURCE=LOWCOV GT:DS:GL 0|0:0.000:-0.03,-1.11,-5.00 0|1:1.000:-1.97,-0.01,-2.51 0|0:0.050:-0.01,-1.69,-5.00 0|0:0.100:-0.48,-0.48,-0.48 1 13980 rs151276478 T C 100 PASS VT=SNP;SNPSOURCE=LOWCOV GT:DS:GL 0|0:0.100:-0.48,-0.48,-0.48 0|1:0.950:-0.48,-0.48,-0.48 0|0:0.050:-0.48,-0.48,-0.48 0|0:0.050:-0.48,-0.48,-0.48 1 30923 rs140337953 G T 100 PASS VT=SNP;SNPSOURCE=LOWCOV GT:DS:GL 1|1:1.950:-5.00,-0.61,-0.12 0|0:0.450:-0.10,-0.69,-2.81 0|0:0.450:-0.11,-0.64,-3.49 1|1:1.500:-0.48,-0.48,-0.48 etc.
output as a population:
chr1 13326 13327 G/C 2 7,1 0,0 chr1 13979 13980 T/C 2 7,1 0,0 chr1 30922 30923 G/T 2 4,4 0,0 etc.
output for each column separately:
chr1 13326 13327 G 1 2 0 G/C 2 1,1 0,0 G 1 2 0 G 1 2 0 chr1 13979 13980 T 1 2 0 T/C 2 1,1 0,0 T 1 2 0 T 1 2 0 chr1 30922 30923 T 1 2 0 G 1 2 0 G 1 2 0 T 1 2 0 etc.