comparison prepare_population_structure.xml @ 18:f04f40a36cc8

Latest changes from Belinda and Cathy. Webb's updates to the Fst tools.
author Richard Burhans <burhans@bx.psu.edu>
date Tue, 23 Oct 2012 12:41:52 -0400
parents 8ae67e9fb6ff
children 248b06e86022
17:a3af29edcce2 18:f04f40a36cc8
17 #end for 17 #end for
18 </command> 18 </command>
19 19
20 <inputs> 20 <inputs>
21 <param name="input" type="data" format="gd_snp" label="SNP dataset" /> 21 <param name="input" type="data" format="gd_snp" label="SNP dataset" />
22 <param name="min_reads" type="integer" min="0" value="0" label="Minimum reads covering a SNP, per individual" />
23 <param name="min_qual" type="integer" min="0" value="0" label="Minimum quality value, per individual" />
24 <param name="min_spacing" type="integer" min="0" value="0" label="Minimum spacing between SNPs on the same scaffold" />
25 <conditional name="individuals"> 22 <conditional name="individuals">
26 <param name="choice" type="select" label="Individuals"> 23 <param name="choice" type="select" label="Individuals">
27 <option value="0" selected="true">All</option> 24 <option value="0" selected="true">All individuals</option>
28 <option value="1">Choose</option> 25 <option value="1">Specified populations</option>
29 </param> 26 </param>
30 <when value="0" /> 27 <when value="0" />
31 <when value="1"> 28 <when value="1">
32 <repeat name="populations" title="Population" min="1"> 29 <repeat name="populations" title="Population" min="1">
33 <param name="p_input" type="data" format="gd_indivs" label="Individuals" /> 30 <param name="p_input" type="data" format="gd_indivs" label="Individuals" />
34 </repeat> 31 </repeat>
35 </when> 32 </when>
36 </conditional> 33 </conditional>
34 <param name="min_reads" type="integer" min="0" value="0" label="Minimum SNP coverage" />
35 <param name="min_qual" type="integer" min="0" value="0" label="Minimum SNP quality" />
36 <param name="min_spacing" type="integer" min="0" value="0" label="Minimum spacing between SNPs" />
37 </inputs> 37 </inputs>
38 38
39 <outputs> 39 <outputs>
40 <data name="output" format="gd_ped"> 40 <data name="output" format="gd_ped">
41 <actions> 41 <actions>
60 60
61 <help> 61 <help>
62 62
63 **Dataset formats** 63 **Dataset formats**
64 64
65 The input datasets are in gd_snp_ and gd_indivs_ formats. It is important 65 The input datasets are in gd_snp_ and gd_indivs_ formats.
66 for the Individuals datasets to have unique names; rename them if 66 The output dataset is in gd_ped_ format. (`Dataset missing?`_)
67 necessary to make them unique. These names are used by the later tools in
68 the graphical displays.
69 The output dataset is gd_ped_. (`Dataset missing?`_)
70 67
71 .. _gd_snp: ./static/formatHelp.html#gd_snp 68 .. _gd_snp: ./static/formatHelp.html#gd_snp
72 .. _gd_indivs: ./static/formatHelp.html#gd_indivs 69 .. _gd_indivs: ./static/formatHelp.html#gd_indivs
73 .. _gd_ped: ./static/formatHelp.html#gd_ped 70 .. _gd_ped: ./static/formatHelp.html#gd_ped
74 .. _Dataset missing?: ./static/formatHelp.html 71 .. _Dataset missing?: ./static/formatHelp.html
75 72
76 ----- 73 -----
77 74
78 **What it does** 75 **What it does**
79 76
80 The tool converts a gd_snp dataset into two tables, called "admix.map" and 77 This tool converts a gd_snp dataset into the format needed for estimating
81 "admix.ped", needed for estimating the population structure. The user 78 the population structure. You can select the individuals to be included,
82 can read or download those files, or simply pass this tool's output on to 79 by using "population" datasets created via the Specify Individuals tool.
83 other programs. The user imposes conditions on which SNPs to consider, 80 (It is important for these population datasets to have distinguishable names,
84 such as the minimum coverage and/or quality value for every individual, 81 since they will be stored in the output's metadata so that subsequent tools
85 or the distance to the closest SNP in the same contig (as named in the 82 can use them as labels. If necessary, rename the datasets to give them
86 first column of the SNP table). A useful piece of information produced 83 distinct and meaningful names before running this tool.)
87 by the tool is the number of SNPs meeting those conditions, which can 84
88 be found by clicking on the eye icon in the history panel after the program 85 You can also filter the SNPs, based on criteria such as minimum coverage
89 runs. 86 (a qualifying SNP must have at least this many reads for every included
87 individual), minimum quality score (for every included individual), and/or
88 minimum spacing (SNPs that are too close together on the same chromosome or
89 scaffold are discarded). In addition to producing the filtered and formatted
90 .map and .ped files for subsequent analysis, the tool reports the number of
91 SNPs meeting these conditions, which can be seen by clicking on the eye icon
92 in the history panel after the program runs.
90 93
91 ----- 94 -----
92 95
93 **Example** 96 **Example**
94 97
95 - input:: 98 - input::
96 99
97 Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0 100 Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0
98 Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0 101 Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0
99 Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0 102 Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0
100 etc. 103 etc.
101 104
102 - output map file:: 105 - output cover page::
103 106
104 1 snp1 0 2 107 Prepare to look for population structure Galaxy Composite Dataset
105 1 snp3 0 4 108 Output completed: 2012-10-01 04:09:36 PM
106 1 snp4 0 5
107 1 snp5 0 6
108 1 snp6 0 7
109 1 snp7 0 8
110 1 snp8 0 9
111 1 snp9 0 10
112 109
113 - output ped file:: 110 Outputs
111 * admix.ped (link)
112 * admix.map (link)
113 * Using 222 of 400 SNPs
114 114
115 PB1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 115 Inputs
116 * Minimum reads covering a SNP, per individual: 6
117 * Minimum quality value, per individual: 0
118 * Minimum spacing between SNPs on the same scaffold: 0
119
120 Populations
121 * Pop. A
122 1. PB1
123 2. PB2
124 * Pop. B
125 1. PB3
126 2. PB4
127 * Pop. C
128 1. PB6
129 2. PB8
116 130
117 </help> 131 </help>
118 </tool> 132 </tool>
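
The revised help text describes three SNP filters: minimum coverage and minimum quality (each applied to every included individual) and minimum spacing on the same scaffold. The following is a minimal Python sketch of that logic, not the tool's actual implementation. It makes several assumptions that this changeset does not confirm: that per-individual data starts at the ninth column in groups of four (reads for allele 1, reads for allele 2, genotype, quality), that coverage is the sum of the two read counts, that the first two columns are the scaffold name and the position on it, and that spacing is enforced greedily by keeping the first SNP and dropping later ones that fall too close. The names `filter_snps` and `passes_individual_filters` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

# Assumptions (not confirmed by the changeset): 0-based column where the
# per-individual groups begin, and the number of fields in each group.
FIRST_INDIVIDUAL_COL = 8
FIELDS_PER_INDIVIDUAL = 4  # reads allele 1, reads allele 2, genotype, quality


def passes_individual_filters(fields: List[str], n_individuals: int,
                              min_reads: int, min_qual: int) -> bool:
    """Return True only if every individual meets both thresholds."""
    for i in range(n_individuals):
        start = FIRST_INDIVIDUAL_COL + i * FIELDS_PER_INDIVIDUAL
        group = fields[start:start + FIELDS_PER_INDIVIDUAL]
        reads_a, reads_b, _genotype, qual = (int(x) for x in group)
        # Assumption: coverage is the total number of reads for both alleles.
        if reads_a + reads_b < min_reads or qual < min_qual:
            return False
    return True


def filter_snps(lines: Iterable[str], n_individuals: int, min_reads: int = 0,
                min_qual: int = 0, min_spacing: int = 0) -> Iterator[str]:
    """Yield the SNP rows that satisfy all three filters."""
    last_kept: Dict[str, int] = {}  # scaffold name -> position of last kept SNP
    for line in lines:
        fields = line.split()
        if not passes_individual_filters(fields, n_individuals,
                                         min_reads, min_qual):
            continue
        scaffold, pos = fields[0], int(fields[1])
        prev = last_kept.get(scaffold)
        # Assumption: greedy rule, drop a SNP closer than min_spacing
        # to the previously kept SNP on the same scaffold.
        if prev is not None and abs(pos - prev) < min_spacing:
            continue
        last_kept[scaffold] = pos
        yield line


# The three example rows from the help text (six individuals each).
SAMPLE_ROWS = [
    "Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C "
    "6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0",
    "Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A "
    "1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0",
    "Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C "
    "4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0",
]

if __name__ == "__main__":
    kept = list(filter_snps(SAMPLE_ROWS, n_individuals=6, min_reads=4))
    print("Using %d of %d SNPs" % (len(kept), len(SAMPLE_ROWS)))
```

Under these assumptions, a threshold is applied per individual rather than to a total across individuals, so a single low-coverage individual is enough to exclude a SNP. That matches the help text's wording ("at least this many reads for every included individual").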