Dataset formats
The input dataset is in the MasterVar format provided by the Complete Genomics analysis process. Galaxy treats this as a tabular dataset, but it must contain the columns specified for MasterVar. The output dataset is a gd_snp table. (Dataset missing?)
What it does
This tool converts a Complete Genomics MasterVar file to gd_snp format, so that it can be used with the genome diversity tools. It can either start a new dataset or append to an existing one. When appending, SNPs that appear only in the MasterVar file can either be skipped or backfilled with "-1" (unknown) for the individuals/groups already in the gd_snp dataset. Positions that are homozygous for the reference are skipped.
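The append behaviour described above can be sketched in Python. This is only an illustration, not the tool's actual code. The names `is_hom_ref` and `append_individual`, the dictionary layout, and the choice of four columns per individual are assumptions made for the example. The field positions used for zygosity and variant type are read off the example rows in this help text. How the tool treats an existing SNP that is absent from the new file is not stated here, so the sketch assumes it is marked unknown.

```python
UNKNOWN = "-1"
COLS_PER_INDIVIDUAL = 4  # assumption for this sketch, not taken from the tool


def is_hom_ref(mastervar_line):
    """Return True for rows such as '935 2 chr1 41981 42198 hom ref = = = ...'."""
    fields = mastervar_line.split()
    # Assumption: zygosity and variant type are the 6th and 7th fields,
    # as they are in the example rows of this help text.
    return len(fields) > 6 and fields[5] == "hom" and fields[6] == "ref"


def append_individual(existing, new_calls, backfill=True):
    """Return a new table with one more individual appended.

    existing:  {(chrom, pos): [columns_for_individual_1, ...]}
    new_calls: {(chrom, pos): columns_for_new_individual}
    """
    unknown = (UNKNOWN,) * COLS_PER_INDIVIDUAL
    # Number of individuals already in the table (0 for an empty table).
    n_previous = len(next(iter(existing.values()), []))
    merged = {}
    for key, columns in existing.items():
        # Assumption: an existing SNP missing from the new file is unknown.
        merged[key] = columns + [new_calls.get(key, unknown)]
    for key, call in new_calls.items():
        if key in existing:
            continue
        if backfill:
            # SNP seen only in the new file: pad earlier individuals with -1.
            merged[key] = [unknown] * n_previous + [call]
        # With backfill=False the new SNP is skipped.
    return merged


existing = {("chr1", 41980): [("0", "1", "0", "76")]}
new_calls = {("chr1", 53205): ("30", "7", "1", "93")}
backfilled = append_individual(existing, new_calls, backfill=True)
skipped = append_individual(existing, new_calls, backfill=False)
```

With backfilling, the SNP at chr1 53205 gains a block of "-1" values for the earlier individual. With skipping, it is left out of the result.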
Examples
input MasterVar file:
934 2 chr1 41980 41981 hom snp A G G 76 97 dbsnp.86:rs806721 425 1 1 1 2 -170 ERVL-E-int:ERVL:47.4 2 1.17 N
935 2 chr1 41981 42198 hom ref = = = -170 1.17 N
1102 2 chr1 53205 53206 het-ref snp G C G 93 127 dbsnp.100:rs2854676 477 7 30 0 37 -127 2 1.17 N
etc.
output:
chr1 41980 A G -1 0 1 0 76
chr1 53205 G C -1 30 7 1 93
etc.