comparison aggregate_gd_indivs.xml @ 13:fdb4240fb565

Uploaded Miller Lab Devshed version a51c894f5bed
author miller-lab
date Fri, 28 Sep 2012 11:34:31 -0400
parents
children f04f40a36cc8
12:4b6590dd7250 13:fdb4240fb565
<tool id="gd_sum_gd_snp" name="Aggregate Individuals" version="1.0.0">
  <description>: Append summary columns for a population</description>

  <command interpreter="python">
    modify_snp_table.py "$input" "$p1_input" "$output" "-1" "-1" "-1" "-1"
    #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns)
      #set $arg = '%s:%s' % ($individual_col, $individual)
      "$arg"
    #end for
  </command>

  <inputs>
    <param name="input" type="data" format="gd_snp" label="SNP dataset" />
    <param name="p1_input" type="data" format="gd_indivs" label="Population individuals" />
  </inputs>

  <outputs>
    <data name="output" format="gd_snp" metadata_source="input" />
  </outputs>

  <tests>
    <test>
      <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" />
      <param name="p1_input" value="test_in/a.gd_indivs" ftype="gd_indivs" />
      <param name="choice" value="1" />
      <param name="lo_coverage" value="0" />
      <param name="hi_coverage" value="1000" />
      <param name="low_ind_cov" value="3" />
      <param name="lo_quality" value="30" />
      <output name="output" file="test_out/modify_snp_table/modify.gd_snp" />
    </test>
  </tests>

  <help>

**Dataset formats**

The input datasets are in gd_snp_ and gd_indivs_ formats.
The output dataset is in gd_snp_ format. (`Dataset missing?`_)

.. _gd_snp: ./static/formatHelp.html#gd_snp
.. _gd_indivs: ./static/formatHelp.html#gd_indivs
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

The user specifies that some of the individuals in a gd_snp dataset form a
"population", by supplying a list that has been previously created using the
Specify Individuals tool. The program appends a
new "entity" (set of four columns) to the gd_snp table, analogous to the columns
for an individual but containing summary data for the population as a group.
These four columns give the total counts for the two alleles, the "genotype" for
the population, and the maximum quality value, taken over all individuals in the
population. If all defined genotypes in the population are 2 (agree with the
reference), then the population's genotype is 2, and similarly for 0; otherwise
the genotype is 1 (unless all individuals have undefined genotype, in which case
it is -1).

-----

**Example**

- input gd_snp::

    Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0
    Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0
    Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0
    etc.

- input individuals::

    9 PB1
    13 PB2
    17 PB3

- output::

    Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0 29 0 2 72
    Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0 3 0 2 30
    Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0 13 0 2 42
    etc.

  </help>
</tool>
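
The aggregation rule described under "What it does" can be illustrated with a short Python sketch. This is not the code in modify_snp_table.py, which is not shown here. The function names `population_columns` and `aggregate_population` are hypothetical. The sketch assumes that each individual occupies four consecutive columns (two allele counts, genotype, quality), that the first field of each gd_indivs line is the 1-based column of the individual's first allele count, and that -1 marks an undefined value that is left out of the totals. What the real tool does with undefined counts or qualities is not stated in the help text.

```python
def population_columns(indivs_text):
    """Parse gd_indivs-style lines ("<column> <name>") into 1-based column numbers."""
    return [int(line.split()[0]) for line in indivs_text.splitlines() if line.strip()]


def aggregate_population(fields, first_columns):
    """Return [allele1_total, allele2_total, genotype, max_quality] for one row.

    fields: the values of one gd_snp row, as strings.
    first_columns: 1-based column of each individual's first allele count.
    """
    a1_total = a2_total = 0
    genotypes, qualities = [], []
    for col in first_columns:
        # Each individual occupies four consecutive columns:
        # allele-1 count, allele-2 count, genotype, quality.
        a1, a2, genotype, quality = (int(v) for v in fields[col - 1:col + 3])
        # Assumption: -1 marks an undefined value and is left out of the summary.
        if a1 >= 0:
            a1_total += a1
        if a2 >= 0:
            a2_total += a2
        if genotype >= 0:
            genotypes.append(genotype)
        if quality >= 0:
            qualities.append(quality)

    if not genotypes:
        pop_genotype = -1          # every individual is undefined
    elif all(g == 2 for g in genotypes):
        pop_genotype = 2           # all defined genotypes agree with the reference
    elif all(g == 0 for g in genotypes):
        pop_genotype = 0
    else:
        pop_genotype = 1           # mixed genotypes

    pop_quality = max(qualities) if qualities else -1
    return [a1_total, a2_total, pop_genotype, pop_quality]


# First row of the example input, with the population PB1, PB2, PB3.
row = ("Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C "
       "6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0").split()
columns = population_columns("9 PB1\n13 PB2\n17 PB3\n")
print(aggregate_population(row, columns))  # [29, 0, 2, 72]
```

The printed values match the four columns appended to the first row of the example output: allele counts 6 + 8 + 15 = 29 and 0, genotype 2, and maximum quality 72.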