Galaxy | Tool Preview

Convert (version 1.0.0)

Dataset formats

The input dataset is formatted as VCF, FSTAT, Genepop, or CSV, and has the Galaxy datatype text. The name of the focus species (from which the SNPs in the VCF file were obtained) and the name of a reference species are also required. The output dataset is in gd_genotype or gd_snp format.

For input datasets in Genepop, FSTAT, or CSV format, the program ignores population structure and any alleles other than those encoded as 0, 1, and 2. For FSTAT input the program accepts up to 9 digits, and for Genepop input only 2 digits. The chromosome and position of each SNP must be separated by a space or a tab. Ancestral alleles must be encoded as 1, derived alleles as 2, and missing data as 0. In all of these cases the ancestral and derived SNPs are returned as N. Alternatively, a dataset in CSV format can contain nucleotides, in which case the ancestral nucleotide is defined as the most common allele.
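The "most common allele" rule for CSV input with nucleotides can be illustrated with a minimal Python sketch. This is not the tool's own code: the function names ancestral_allele and encode_alleles, the set of symbols treated as missing, and the tie-breaking behavior (first allele seen wins) are all assumptions made for illustration. The 1/2/0 coding simply reuses the scheme described above.

```python
from collections import Counter

# Assumption: symbols treated as missing data (not specified by the tool help).
MISSING = {"N", "0", "", "."}


def ancestral_allele(alleles):
    """Return the most common non-missing nucleotide at one SNP, or None.

    Ties are broken in favor of the allele seen first (an assumption).
    """
    counts = Counter(a.upper() for a in alleles if a.upper() not in MISSING)
    if not counts:
        return None
    # max() returns the first key with the highest count, in insertion order.
    return max(counts, key=counts.get)


def encode_alleles(alleles):
    """Map nucleotides to 1 (ancestral), 2 (derived), or 0 (missing)."""
    ancestral = ancestral_allele(alleles)
    encoded = []
    for allele in alleles:
        allele = allele.upper()
        if allele in MISSING:
            encoded.append(0)
        elif allele == ancestral:
            encoded.append(1)
        else:
            encoded.append(2)
    return encoded


if __name__ == "__main__":
    observed = ["A", "G", "N", "A"]
    print(ancestral_allele(observed), encode_alleles(observed))
```

Here "A" occurs twice and "G" once, so "A" is treated as ancestral and "G" as derived.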


What it does

This tool produces a gd_genotype dataset from VCF-formatted files or from three other conventional population genetics formats (i.e., FSTAT, Genepop, and CSV). VCF files that include the allelic depth, genotype quality, and genotype fields ("AD", "GQ", and "GT", respectively, in the "FORMAT" field) can be converted into a gd_snp file.
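Whether a VCF file carries the three keys needed for gd_snp output can be checked before running the tool. The sketch below is a hypothetical pre-check, not part of the tool: the function name can_convert_to_gd_snp is invented here, and it inspects only the first data record, assuming every record in the file uses the same FORMAT keys. It relies on the standard VCF layout, in which FORMAT is the ninth tab-separated column.

```python
# Keys named in the tool help as required for gd_snp output.
REQUIRED_KEYS = {"AD", "GQ", "GT"}


def can_convert_to_gd_snp(vcf_lines):
    """Return True if the first VCF data record lists AD, GQ, and GT in FORMAT.

    Assumption: all records share the FORMAT keys of the first record.
    """
    for line in vcf_lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            # No FORMAT column, so there is no per-sample genotype data.
            return False
        format_keys = set(fields[8].split(":"))
        return REQUIRED_KEYS <= format_keys
    # No data records were found.
    return False


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:GQ\t0/1:5,4:99\n",
    ]
    print(can_convert_to_gd_snp(example))
```

A file that fails this check can still be converted to gd_genotype, as described above.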


Examples