annotate offspring_heterozygosity_pedigree.xml @ 32:03c22b722882

remove BeautifulSoup dependency
author Richard Burhans <burhans@bx.psu.edu>
date Fri, 20 Sep 2013 13:54:23 -0400
parents a631c2f6d913
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
31
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
1 <tool id="gd_offspring_heterozygosity_pedigree" name="Founders sequenced" version="1.0.0">
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
2 <description>: Offspring estimated heterozygosity from a pedigree with sequenced founders</description>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
3
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
4 <command interpreter="python">
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
5 #import json
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
6 #import base64
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
7 #import zlib
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
8 #set $ind_names = $input.dataset.metadata.individual_names
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
9 #set $ind_colms = $input.dataset.metadata.individual_columns
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
10 #set $ind_dict = dict(zip($ind_names, $ind_colms))
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
11 #set $ind_json = json.dumps($ind_dict, separators=(',',':'))
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
12 #set $ind_comp = zlib.compress($ind_json, 9)
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
13 #set $ind_arg = base64.b64encode($ind_comp)
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
14 offspring_heterozygosity_pedigree.py '$input' '$input.ext' '$pedigree' '$ind_arg' '$founders' '$b1_input' '$b2_input' '$output'
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
15 </command>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
16
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
17 <inputs>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
18 <param name="input" type="data" format="gd_snp,gd_genotype" label="SNP dataset" />
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
19 <param name="pedigree" type="data" format="txt" label="Pedigree dataset" />
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
20 <param name="founders" type="data" format="gd_indivs" label="Founders dataset" />
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
21 <param name="b1_input" type="data" format="txt" label="First breeders dataset" />
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
22 <param name="b2_input" type="data" format="txt" label="Second breeders dataset" />
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
23 </inputs>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
24
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
25 <outputs>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
26 <data name="output" format="txt" />
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
27 </outputs>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
28
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
29 <requirements>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
30 <requirement type="package" version="0.1">gd_c_tools</requirement>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
31 </requirements>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
32
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
33 <!--
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
34 <tests>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
35 </tests>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
36 -->
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
37
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
38 <help>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
39
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
40 **Dataset formats**
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
41
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
42 The input datasets are in gd_snp_, gd_genotype_, text_, and gd_indivs_ formats.
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
43 The output dataset is in text_ format.
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
44
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
45 .. _gd_snp: ./static/formatHelp.html#gd_snp
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
46 .. _gd_genotype: ./static/formatHelp.html#gd_genotype
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
47 .. _gd_indivs: ./static/formatHelp.html#gd_indivs
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
48 .. _text: ./static/formatHelp.html#text
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
49
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
50 -----
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
51
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
52 **What it does**
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
53
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
54 The user provides a Galaxy SNP table (gd_snp or gd_genotype format) that
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
55 includes the founders of a pedigree, as well as two sets of individuals.
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
56 The pedigree is specified by a text file with one row per individual,
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
57 containing (1) the individual's name, (2) the name of one of the
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
58 individual's parents, which must have occurred at the start of a previous
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
59 line, and (3) the name of the individual's other parent, which occurred at
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
60 the start of a previous line. For a pedigree founder, both parent names
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
61 are replaced by &quot;-&quot;. The founders are specified by a table in
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
62 gd_indivs format, e.g., as produced by &quot;Specify individuals&quot;
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
63 tool. Every founder must have genotypes supplied in the SNP table,
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
64 and both parents need to be given as &quot;-&quot; in the pedigree.
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
65 Conversely, every pedigree individual whose parents are &quot;-&quot;
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
66 must be named as a founder.
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
67
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
68 The user also provides two files that specify a set of names of
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
69 individuals. The first word on each line names an individual (one
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
70 line per individual); any subsequent information on a line is ignored.
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
71 The name of each individual must appear at the start of a line in the
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
72 pedigree.
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
73
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
74 For each pair of individuals, one from each specified set, the program
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
75 computes the expected heterozygosity of any offspring of the pair,
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
76 i.e., the probability that the offspring has distinct nucleotides at
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
77 a randomly chosen autosomal SNP. A SNP is ignorned if one or both
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
78 potential parents have an ancestor with undefined genotype (designated
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
79 as -1 in the SNP table).
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
80 </help>
a631c2f6d913 Update to Miller Lab devshed revision 3c4110ffacc3
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
81 </tool>