Mercurial > repos > miller-lab > genome_diversity
annotate filter_gd_snp.xml @ 22:95a05c1ef5d5
update to devshed revision aaece207bd01
author | Richard Burhans <burhans@bx.psu.edu> |
---|---|
date | Mon, 11 Mar 2013 11:28:06 -0400 |
parents | f04f40a36cc8 |
children | 8997f2ca8c7a |
rev | line source |
---|---|
22
95a05c1ef5d5
update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
18
diff
changeset
|
<tool id="gd_filter_gd_snp" name="Filter SNPs" version="1.1.0">
  <description>: Discard some SNPs based on coverage or quality</description>

  <command interpreter="python">
    filter_gd_snp.py "$input" "$p1_input" "$output" "$lo_coverage" "$hi_coverage" "$low_ind_cov" "$lo_quality"
    #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns)
      #set $arg = '%s:%s' % ($individual_col, $individual)
      "$arg"
    #end for
  </command>

  <inputs>
    <param name="input" type="data" format="gd_snp" label="SNP dataset" />
    <param name="p1_input" type="data" format="gd_indivs" label="Population individuals" />
    <param name="lo_coverage" type="text" value="0" label="Lower bound on total coverage">
      <sanitizer>
        <valid initial="string.digits">
          <!-- % is the percent (%) character -->
          <add value="%" />
        </valid>
      </sanitizer>
    </param>
    <param name="hi_coverage" type="text" value="1000" label="Upper bound on total coverage">
      <sanitizer>
        <valid initial="string.digits">
          <!-- % is the percent (%) character -->
          <add value="%" />
        </valid>
      </sanitizer>
    </param>
    <param name="low_ind_cov" type="integer" min="0" value="0" label="Lower bound on individual coverage" />
    <param name="lo_quality" type="integer" min="0" value="0" label="Lower bound on individual quality values" />
  </inputs>

  <outputs>
    <data name="output" format="gd_snp" metadata_source="input" />
  </outputs>

  <tests>
    <test>
      <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" />
      <param name="p1_input" value="test_in/a.gd_indivs" ftype="gd_indivs" />
      <param name="lo_coverage" value="0" />
      <param name="hi_coverage" value="1000" />
      <param name="low_ind_cov" value="3" />
      <param name="lo_quality" value="30" />
      <output name="output" file="test_out/modify_snp_table/modify.gd_snp" />
    </test>
  </tests>

  <help>

**Dataset formats**

The input datasets are in gd_snp_ and gd_indivs_ formats.
The output dataset is in gd_snp_ format. (`Dataset missing?`_)

.. _gd_snp: ./static/formatHelp.html#gd_snp
.. _gd_indivs: ./static/formatHelp.html#gd_indivs
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

The user specifies that some of the individuals in a gd_snp dataset form a
"population", by supplying a list that has been previously created using the
Specify Individuals tool. SNPs are then discarded if their total coverage
for the population is too low or too high, or if their coverage or quality
score for any individual in the population is too low.

The upper and lower bounds on total population coverage can be specified
either as read counts or as percentiles (e.g. "5%", with no decimal places).
For percentile bounds the SNPs are ranked by read count, so for example, a
lower bound of "10%" means that the least-covered 10% of the SNPs will be
discarded, while an upper bound of, say, "80%" will discard all SNPs above
the 80% mark, i.e. the top 20%. The threshold for the lower bound on
individual coverage can only be specified as a plain read count.

-----

**Example**

- input gd_snp::

    Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0
    Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0
    Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0
    etc.

- input individuals::

    9 PB1
    13 PB2
    17 PB3

- output when the lower bound on individual coverage is "3"::

    Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0
    Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0
    etc.

  </help>
</tool>
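The filtering rules described in the tool's help can be sketched in a few lines of Python. This is an illustration only, not the contents of filter_gd_snp.py, which is not shown here. The function names (`individual_stats`, `total_coverage_mask`, `filter_snps`) are hypothetical. The sketch assumes that each individual occupies four consecutive whitespace-separated fields (reads for allele 1, reads for allele 2, genotype, quality) starting at the 1-based column given in the gd_indivs file, that plain read-count bounds are inclusive, and that percentile bounds are turned into rank cutoffs by integer division with ties kept in input order. The real script may differ on any of these points.

```python
# Illustrative sketch only; field layout and percentile handling are assumptions.

SAMPLE = [
    "Contig161_chr1_4641264_4641879 115 C T 73.5 chr1 4641382 C 6 0 2 45 8 0 2 51 15 0 2 72 5 0 2 42 6 0 2 45 10 0 2 57 Y 54 0.323 0",
    "Contig48_chr1_10150253_10151311 11 A G 94.3 chr1 10150264 A 1 0 2 30 1 0 2 30 1 0 2 30 3 0 2 36 1 0 2 30 1 0 2 30 Y 22 +99. 0",
    "Contig20_chr1_21313469_21313570 66 C T 54.0 chr1 21313534 C 4 0 2 39 4 0 2 39 5 0 2 42 4 0 2 39 4 0 2 39 5 0 2 42 N 1 +99. 0",
]
POPULATION_COLUMNS = [9, 13, 17]  # PB1, PB2, PB3 from the gd_indivs example


def individual_stats(fields, column):
    """Return (coverage, quality) for the individual at a 1-based column.

    Assumed layout: allele-1 reads, allele-2 reads, genotype, quality.
    """
    first = column - 1
    coverage = int(fields[first]) + int(fields[first + 1])
    quality = int(fields[first + 3])
    return coverage, quality


def total_coverage_mask(totals, lo, hi):
    """Return one boolean per SNP: True if its total coverage is within bounds.

    A bound is either a plain read count ("25") or a percentile ("25%").
    """
    n = len(totals)
    # Rank SNPs by total coverage; sorted() is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: totals[i])
    rank = [0] * n
    for position, i in enumerate(order):
        rank[i] = position

    keep = []
    for i in range(n):
        if lo.endswith("%"):
            too_low = rank[i] < int(lo[:-1]) * n // 100
        else:
            too_low = totals[i] < int(lo)
        if hi.endswith("%"):
            too_high = rank[i] >= int(hi[:-1]) * n // 100
        else:
            too_high = totals[i] > int(hi)
        keep.append(not (too_low or too_high))
    return keep


def filter_snps(lines, columns, lo="0", hi="1000", low_ind_cov=0, lo_quality=0):
    """Return the input lines that survive all four filters."""
    rows = [line.split() for line in lines]
    stats = [[individual_stats(fields, c) for c in columns] for fields in rows]
    totals = [sum(cov for cov, _ in per_snp) for per_snp in stats]
    in_bounds = total_coverage_mask(totals, lo.strip(), hi.strip())

    kept = []
    for line, per_snp, ok in zip(lines, stats, in_bounds):
        # Every individual in the population must meet both per-individual bounds.
        if ok and all(cov >= low_ind_cov and q >= lo_quality for cov, q in per_snp):
            kept.append(line)
    return kept


if __name__ == "__main__":
    for line in filter_snps(SAMPLE, POPULATION_COLUMNS, low_ind_cov=3):
        print(line.split()[0])
```

With the sample rows from the help text and a lower bound of 3 on individual coverage, the sketch keeps the Contig161 and Contig20 SNPs and drops Contig48, whose three population members each have a single read. This matches the example output in the help section.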