annotate average_fst.xml @ 18:f04f40a36cc8

Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
author Richard Burhans <burhans@bx.psu.edu>
date Tue, 23 Oct 2012 12:41:52 -0400
parents 8ae67e9fb6ff
children d6b961721037
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
1 <tool id="gd_average_fst" name="Overall FST" version="1.1.0">
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
2 <description>: Estimate the relative fixation index between two populations</description>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
3
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
4 <command interpreter="python">
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
5 average_fst.py "$input" "$p1_input" "$p2_input" "$data_source.ds_choice" "$data_source.min_value" "$discard_fixed" "$output"
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
6 #if $use_randomization.ur_choice == '1'
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
7 "$use_randomization.shuffles" "$use_randomization.p0_input"
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
8 #else
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
9 "0" "/dev/null"
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
10 #end if
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
11 #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns)
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
12 #set $arg = '%s:%s' % ($individual_col, $individual)
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
13 "$arg"
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
14 #end for
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
15 </command>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
16
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
17 <inputs>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
18 <param name="input" type="data" format="gd_snp" label="SNP table" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
19 <param name="p1_input" type="data" format="gd_indivs" label="Population 1 individuals" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
20 <param name="p2_input" type="data" format="gd_indivs" label="Population 2 individuals" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
21
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
22 <conditional name="data_source">
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
23 <param name="ds_choice" type="select" format="integer" label="Data source">
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
24 <option value="0" selected="true">sequence coverage and ..</option>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
25 <option value="1">estimated genotype and ..</option>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
26 </param>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
27 <when value="0">
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
28 <param name="min_value" type="integer" min="1" value="1" label="Minimum total read count for a population" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
29 </when>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
30 <when value="1">
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
31 <param name="min_value" type="integer" min="1" value="1" label="Minimum individual genotype quality" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
32 </when>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
33 </conditional>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
34
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
35 <param name="discard_fixed" type="select" label="Apparently fixed SNPs">
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
36 <option value="0">Retain SNPs that appear fixed in the two populations</option>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
37 <option value="1" selected="true">Delete SNPs that appear fixed in the two populations</option>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
38 </param>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
39
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
40 <conditional name="use_randomization">
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
41 <param name="ur_choice" type="select" format="integer" label="Use randomization">
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
42 <option value="0" selected="true">No</option>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
43 <option value="1">Yes</option>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
44 </param>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
45 <when value="0" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
46 <when value="1">
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
47 <param name="shuffles" type="integer" min="0" value="0" label="Shuffles" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
48 <param name="p0_input" type="data" format="gd_indivs" label="Individuals for randomization" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
49 </when>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
50 </conditional>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
51 </inputs>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
52
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
53 <outputs>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
54 <data name="output" format="txt" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
55 </outputs>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
56
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
57 <tests>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
58 <test>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
59 <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
60 <param name="p1_input" value="test_in/a.gd_indivs" ftype="gd_indivs" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
61 <param name="p2_input" value="test_in/b.gd_indivs" ftype="gd_indivs" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
62 <param name="ds_choice" value="0" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
63 <param name="min_value" value="3" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
64 <param name="discard_fixed" value="1" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
65 <param name="ur_choice" value="0" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
66 <output name="output" file="test_out/average_fst/average_fst.txt" />
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
67 </test>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
68 </tests>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
69
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
70 <help>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
71
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
72 **What it does**
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
73
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
74 The user specifies a SNP table and two "populations" of individuals, both previously defined using the Galaxy tool to specify individuals from a SNP table. No individual can be in both populations. Other choices are as follows.
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
75
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
76 Data source. The allele frequencies of a SNP in the two populations can be estimated either by the total number of reads of each allele, or by adding the frequencies inferred from genotypes of individuals in the populations.
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
77
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
78 After specifying the data source, the user sets lower bounds on amount of data required at a SNP. For estimating the FST using read counts, the bound is the minimum count of reads of the two alleles in a population. For estimations based on genotype, the bound is the minimum reported genotype quality per individual. SMPs not meeting these lower bounds are ignored.
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
79
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
80 The user specifies whether SNPs where both populations appear to be fixed for the same allele should be retained or discarded.
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
81
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
82 Finally, the user decides whether to use randomizations. If so, then the user specifies how many randomly generated population pairs (retaining the numbers of individuals of the originals) to generate, as well as the "population" of additional individuals (not in the first two populations) that can be used in the randomization process.
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
83
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
84 The program prints the following measures of FST for the two populations.
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
85 1. The formulation by Sewall Wright (average over FSTs for all SNPs).
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
86 2. The Weir-Cockerham estimator (average over FSTs for all SNPs).
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
87 3. The Reich-Patterson estimator (average over FSTs for all SNPs).
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
88 4. The population-based Reich-Patterson estimator.
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
89
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
90 If randomizations were requested, it prints a summary for each of the four definitions of FST that includes the maximum and average value, and the highest-scoring population pair (if any scored higher than the two user-specified populations).
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
91
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
92 References:
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
93
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
94 Sewall Wright (1951) The genetical structure of populations. Ann Eugen 15:323-354.
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
95
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
96 B. S. Weir and C. Clark Cockerham (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
97
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
98 Weir, B.S. 1996. Population substructure. Genetic data analysis II, pp. 161-173. Sinauer Associates, Sundand, MA.
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
99
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
100 David Reich, Kumarasamy Thangaraj, Nick Patterson, Alkes L. Price, and Lalji Singh (2009) Reconstructing Indian population history. Nature 461:489-494, especially Supplement 2.
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
101
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
102 Their effectiveness for computing FSTs when there are many SNPs but few individuals is discussed in the followoing paper.
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
103
18
f04f40a36cc8 Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
Richard Burhans <burhans@bx.psu.edu>
parents: 14
diff changeset
104 Eva-Maria Willing, Christine Dreyer, Cock van Oosterhout (2012) Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS One 7:e42649.
14
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
105 </help>
8ae67e9fb6ff Uploaded Miller Lab Devshed version a51c894f5bed again [possible toolshed.g2 bug]
miller-lab
parents:
diff changeset
106 </tool>