Galaxy | Tool Preview

Structure (version 2.3.4+galaxy1)
Note that the runs are executed sequentially. If the job takes too long, launch separate runs instead.

Introduction

The program structure implements a model-based clustering method for inferring population structure using genotype data consisting of unlinked markers. The method was introduced in a paper by Pritchard, Stephens and Donnelly (2000a) and extended in sequels by Falush, Stephens and Pritchard (2003a, 2007). Applications of our method include demonstrating the presence of population structure, identifying distinct genetic populations, assigning individuals to populations, and identifying migrants and admixed individuals.

Briefly, we assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. It is assumed that, within populations, the loci are at Hardy-Weinberg equilibrium and linkage equilibrium. Loosely speaking, individuals are assigned to populations in such a way as to achieve these equilibria.
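
The assignment idea can be illustrated with a minimal sketch. It assumes the simplest case only: no admixture, allele frequencies that are already known, and a uniform prior over the K populations. The program itself estimates frequencies and assignments jointly, so this is not its algorithm. The function name assignment_posterior and the example frequencies are hypothetical. Under Hardy-Weinberg and linkage equilibrium, the allele copies of an individual are treated as independent draws from the frequencies of its population, so the likelihood is a product over loci and allele copies.

```python
import math

def assignment_posterior(genotype, freqs, missing=-9):
    """Posterior probability that one individual comes from each population.

    genotype: list of (allele_a, allele_b) tuples, one per locus.
    freqs: freqs[k][locus] is a dict mapping allele -> frequency in population k.
    Assumes every observed allele has a strictly positive frequency in every
    population (a zero or absent frequency raises an error).
    """
    log_liks = []
    for pop in freqs:
        total = 0.0
        for locus, alleles in enumerate(genotype):
            for allele in alleles:
                if allele == missing:
                    continue  # a missing allele copy contributes no information
                total += math.log(pop[locus][allele])
        log_liks.append(total)
    # Normalise under a uniform prior; subtract the maximum for numerical stability.
    peak = max(log_liks)
    weights = [math.exp(value - peak) for value in log_liks]
    norm = sum(weights)
    return [w / norm for w in weights]

# Hypothetical allele frequencies for K = 2 populations at two loci.
freqs = [
    [{142: 0.8, 145: 0.2}, {64: 0.5, 66: 0.5}],
    [{142: 0.2, 145: 0.8}, {64: 0.5, 66: 0.5}],
]
posterior = assignment_posterior([(142, 142), (64, 66)], freqs)
```

In this example the first locus favours the first population, while the second locus is uninformative because both populations share the same frequencies there.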

Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers including microsatellites, SNPs and RFLPs. The model assumes that markers are not in linkage disequilibrium (LD) within subpopulations, so we can’t handle markers that are extremely close together. Starting with version 2.0, we can also deal with weakly linked markers.

While the computational approaches implemented here are fairly powerful, some care is needed in running the program in order to ensure sensible answers. For example, it is not possible to determine suitable run-lengths theoretically, and this requires some experimentation on the part of the user. This document describes the use and interpretation of the software and supplements the published papers, which provide more formal descriptions and evaluations of the methods.

Documentation

Please see the full Structure documentation

Upstream

Inputs can be produced from:

Input

             
George 1 -9 145 66 0 92
George 1 -9 -9 64 0 94
Paula 1 106 142 68 1 92
Paula 1 106 148 64 0 94
Matthew 2 110 145 -9 0 92
Matthew 2 110 148 66 1 -9
Bob 2 108 142 64 1 94
Bob 2 -9 142 -9 0 94
Anja 1 112 142 -9 1 -9
Anja 1 114 142 66 1 94
Peter 1 -9 145 66 0 -9
Peter 1 110 145 -9 1 -9
Carsten 2 108 145 62 0 -9
Carsten 2 110 145 64 1 92
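
A minimal sketch of how a file laid out like the sample above could be read. It assumes, based only on the layout of the sample, that the first column is an individual label, the second is a population identifier, the remaining columns are marker values, each individual occupies two rows (one per allele copy), and -9 marks missing data. The function name parse_structure is hypothetical, and the sketch also assumes labels are unique per individual.

```python
MISSING = -9  # missing-data code as it appears in the sample above

def parse_structure(text, missing=MISSING):
    """Group the two rows of each individual into one (allele, allele) pair per locus."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label, pop = fields[0], int(fields[1])
        alleles = [int(value) for value in fields[2:]]
        rows.setdefault((label, pop), []).append(alleles)

    individuals = []
    for (label, pop), copies in rows.items():
        if len(copies) != 2:
            raise ValueError(f"{label}: expected 2 rows, found {len(copies)}")
        first, second = copies
        if len(first) != len(second):
            raise ValueError(f"{label}: rows have different numbers of markers")
        # Replace the missing-data code with None so it cannot be mistaken for an allele.
        genotype = [
            tuple(None if allele == missing else allele for allele in pair)
            for pair in zip(first, second)
        ]
        individuals.append({"label": label, "pop": pop, "genotype": genotype})
    return individuals

sample = """\
George 1 -9 145 66 0 92
George 1 -9 -9 64 0 94
Paula 1 106 142 68 1 92
Paula 1 106 148 64 0 94
"""
individuals = parse_structure(sample)
```

Each entry of individuals then holds one tuple per marker, for example (145, None) for George's second marker, where one allele copy is missing.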

You will find other sample data sets: here

Downstream