Mercurial > repos > miller-lab > genome_diversity
annotate aggregate_gd_indivs.xml @ 38:9d0b1fa77047
Changed simplejson to json
author | Richard Burhans <burhans@bx.psu.edu> |
---|---|
date | Fri, 28 Feb 2014 12:15:32 -0500 |
parents | a631c2f6d913 |
children |
rev | line source |
---|---|
26
91e835060ad2
Updates to Admixture, Aggregate Individuals, and Restore Attributes to support gd_genotype
Richard Burhans <burhans@bx.psu.edu>
parents:
22
diff
changeset
|
1 <tool id="gd_sum_gd_snp" name="Aggregate Individuals" version="1.1.0"> |
13 | 2 <description>: Append summary columns for a population</description> |
3 | |
4 <command interpreter="python"> | |
27
8997f2ca8c7a
Update to Miller Lab devshed revision bae0d3306d3b
Richard Burhans <burhans@bx.psu.edu>
parents:
26
diff
changeset
|
5 #import json |
6 #import base64 |
7 #import zlib |
8 #set $ind_names = $input.dataset.metadata.individual_names |
9 #set $ind_colms = $input.dataset.metadata.individual_columns |
10 #set $ind_dict = dict(zip($ind_names, $ind_colms)) |
11 #set $ind_json = json.dumps($ind_dict, separators=(',',':')) |
12 #set $ind_comp = zlib.compress($ind_json, 9) |
13 #set $ind_arg = base64.b64encode($ind_comp) |
14 aggregate_gd_indivs.py '$input' '$p1_input' '$output' |
26
15 #if $input_type.choice == '0' |
27
16 'gd_snp' |
26
17 #else if $input_type.choice == '1' |
27
18 'gd_genotype' |
26
19 #end if |
27
20 '$ind_arg' |
13 | 21 </command> |
22 | |
23 <inputs> | |
26
24 |
25 <conditional name="input_type"> |
26 <param name="choice" type="select" format="integer" label="Input format"> |
27 <option value="0" selected="true">gd_snp</option> |
28 <option value="1">gd_genotype</option> |
29 </param> |
30 |
31 <when value="0"> |
32 <param name="input" type="data" format="gd_snp" label="SNP dataset" /> |
33 </when> |
34 <when value="1"> |
35 <param name="input" type="data" format="gd_genotype" label="Genotype dataset" /> |
36 </when> |
37 </conditional> |
38 |
13 | 39 <param name="p1_input" type="data" format="gd_indivs" label="Population individuals" /> |
40 </inputs> | |
41 | |
42 <outputs> | |
26
43 <data name="output" format="input" format_source="input" metadata_source="input" /> |
13 | 44 </outputs> |
45 | |
31
a631c2f6d913
Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
27
diff
changeset
|
46 <requirements> |
47 <requirement type="package" version="0.1">gd_c_tools</requirement> |
48 </requirements> |
49 |
13 | 50 <tests> |
51 <test> | |
52 <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" /> | |
53 <param name="p1_input" value="test_in/a.gd_indivs" ftype="gd_indivs" /> | |
54 <output name="output" file="test_out/modify_snp_table/modify.gd_snp" /> | |
55 </test> | |
56 </tests> | |
57 | |
58 <help> | |
59 | |
60 **Dataset formats** | |
61 | |
26
62 The input datasets are in gd_snp_, gd_genotype_, and gd_indivs_ formats. |
63 The output dataset is in gd_snp_ or gd_genotype_ format. (`Dataset missing?`_) |
13 | 64 |
65 .. _gd_snp: ./static/formatHelp.html#gd_snp | |
26
66 .. _gd_genotype: ./static/formatHelp.html#gd_genotype |
13 | 67 .. _gd_indivs: ./static/formatHelp.html#gd_indivs |
68 .. _Dataset missing?: ./static/formatHelp.html | |
69 | |
70 ----- | |
71 | |
72 **What it does** | |
73 | |
26
74 The user specifies that some of the individuals in a gd_snp or gd_genotype |
75 dataset form a "population", by supplying a list that has been previously |
76 created using the Specify Individuals tool. The program appends a new |
77 "entity" (set of four columns for a gd_snp table, or one column for a |
78 gd_genotype table), analogous to the column(s) for an individual but |
79 containing summary data for the population as a group. For a gd_snp |
80 table, these four columns give the total counts for the two alleles, |
81 the "genotype" for the population, and the maximum quality value, taken |
82 over all individuals in the population. If all defined genotypes in |
83 the population are 2 (agree with the reference), then the population's |
84 genotype is 2, and similarly for 0; otherwise the genotype is 1 (unless |
85 all individuals have undefined genotype, in which case it is -1). |
86 For a gd_genotype file, only the aggregate genotype is appended. |
13 | 87 |
88 ----- | |
89 | |
90 **Example** | |
91 | |
92 - input gd_snp:: | |
93 | |
94 Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0 | |
95 Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0 | |
96 Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0 | |
97 etc. | |
98 | |
99 - input individuals:: | |
100 | |
101 9 PB1 | |
102 13 PB2 | |
103 17 PB3 | |
104 | |
105 - output:: | |
106 | |
107 Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0 29 0 2 72 | |
108 Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0 3 0 2 30 | |
109 Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0 13 0 2 42 | |
110 etc. | |
111 | |
112 </help> | |
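
The summary rule described in the help text can be sketched in Python. This is only an illustration, not the code of aggregate_gd_indivs.py, which is not shown on this page. The function names are hypothetical. The sketch assumes that -1 marks an undefined genotype, count, or quality, and that each individual occupies four consecutive columns (two allele counts, genotype, quality) starting at the 1-based column listed in the gd_indivs file.

```python
def population_genotype(genotypes):
    """Combine individual genotypes into one population genotype.

    Follows the rule in the help text: 2 if every defined genotype is 2,
    0 if every defined genotype is 0, -1 if none is defined, otherwise 1.
    """
    defined = [g for g in genotypes if g != -1]  # assumption: -1 means undefined
    if not defined:
        return -1
    if all(g == 2 for g in defined):
        return 2
    if all(g == 0 for g in defined):
        return 0
    return 1


def aggregate_snp_row(fields, start_columns):
    """Return the four summary values for one gd_snp row.

    fields: the row split into columns.
    start_columns: 1-based first column of each individual in the population.
    """
    entities = [[int(v) for v in fields[c - 1:c + 3]] for c in start_columns]
    # Assumption: negative counts and qualities are undefined and are skipped.
    count1 = sum(e[0] for e in entities if e[0] >= 0)
    count2 = sum(e[1] for e in entities if e[1] >= 0)
    genotype = population_genotype([e[2] for e in entities])
    quality = max((e[3] for e in entities if e[3] >= 0), default=-1)
    return count1, count2, genotype, quality


if __name__ == "__main__":
    # First row of the example input, with the population PB1, PB2, PB3.
    row = ("Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C "
           "6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 "
           "Y 54 0.323 0").split()
    print(aggregate_snp_row(row, [9, 13, 17]))  # (29, 0, 2, 72)
```

Appending the returned values to the row gives the first line of the example output. For a gd_genotype table, only the result of the genotype rule would be appended.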
113 </tool> |
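
The command template above packs the individual name to column mapping into a single command-line argument: JSON, then zlib at level 9, then base64. A minimal Python 3 sketch of that round trip follows. The function names are hypothetical, the decoding side is an assumption about what the receiving script does (it is not shown here), and the explicit UTF-8 encoding step is a Python 3 detail that the original template does not need.

```python
import base64
import json
import zlib


def pack_individuals(names, columns):
    """Mirror the template: dict -> compact JSON -> zlib -> base64 text."""
    ind_dict = dict(zip(names, columns))
    ind_json = json.dumps(ind_dict, separators=(",", ":"))
    ind_comp = zlib.compress(ind_json.encode("utf-8"), 9)
    return base64.b64encode(ind_comp).decode("ascii")


def unpack_individuals(ind_arg):
    """Reverse the packing (assumed to happen inside the called script)."""
    ind_json = zlib.decompress(base64.b64decode(ind_arg)).decode("utf-8")
    return json.loads(ind_json)


if __name__ == "__main__":
    # Names and columns taken from the example individuals in the help text.
    arg = pack_individuals(["PB1", "PB2", "PB3"], [9, 13, 17])
    print(unpack_individuals(arg))
```

Base64 output contains only letters, digits, "+", "/", and "=", so the packed value can sit inside single quotes on the command line without further escaping.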