comparison bigwig_outlier_bed.xml @ 6:eb17eb8a3658 draft
planemo upload commit 1baff96e75def9248afdcf21edec9bdc7ed42b1f-dirty
author | fubar |
---|---|
date | Tue, 23 Jul 2024 23:12:23 +0000 |
parents | 68cb8e7e266b |
children | c8e22efcaeda |
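
For orientation, here is a minimal sketch of the technique the script in this changeset relies on: threshold one contig's values at a quantile, then find contiguous runs with `np.diff` on a padded boolean mask (the method the source credits to gregoryzynda.com). The function names `contiguous_runs` and `outlier_runs` and the toy coverage array are illustrative assumptions, not part of the tool, and the sketch assumes the input contains no NaN values.

```python
import numpy as np


def contiguous_runs(mask):
    """Return half-open (start, end) index pairs for each run of True in mask."""
    # Pad with False on both sides so runs touching either end still produce
    # a rising and a falling edge.
    padded = np.r_[False, mask, False]
    # For boolean input np.diff marks every position where the value changes.
    edges = np.flatnonzero(np.diff(padded))
    # Edges alternate rising/falling, so pair them up as (start, end).
    return edges.reshape(-1, 2)


def outlier_runs(values, q, min_len, high=True):
    """Hypothetical helper: runs of at least min_len values beyond quantile q."""
    values = np.asarray(values, dtype=float)
    cut = np.quantile(values, q)
    mask = values >= cut if high else values <= cut
    runs = contiguous_runs(mask)
    keep = (runs[:, 1] - runs[:, 0]) >= min_len
    return cut, runs[keep]


# Toy contig: mostly depth 10, a 5 bp dropout and a 3 bp spike.
coverage = np.array([10] * 20 + [0] * 5 + [10] * 20 + [50] * 3 + [10] * 2)
cut_hi, hi_runs = outlier_runs(coverage, 0.95, min_len=3, high=True)
cut_lo, lo_runs = outlier_runs(coverage, 0.05, min_len=3, high=False)
print(hi_runs.tolist(), lo_runs.tolist())
```

Each returned pair maps directly onto the start and end columns of a bed line, since bed intervals are also zero-based and half-open.
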
comparison
5:68cb8e7e266b | 6:eb17eb8a3658 |
---|---|
1 <tool name="bigwig_outlier_bed" id="bigwig_outlier_bed" version="0.04" profile="22.05"> | 1 <tool name="Bigwig extremes to bed features" id="bigwig_outlier_bed" version="@TOOL_VERSION@" profile="22.05"> |
2 <!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay--> | 2 <description>Writes high and low bigwig runs as features in a bed file</description> |
3 <!--Created by toolfactory@galaxy.org at 30/06/2024 19:44:14 using the Galaxy Tool Factory.--> | 3 <macros> |
4 <description>Writes high and low bigwig regions as features in a bed file</description> | 4 <token name="@TOOL_VERSION@">0.2.0</token> |
5 <token name="@NUMPY_VERSION@">2.0.0</token> | |
6 <token name="@PYTHON_VERSION@">3.12.3</token> | |
7 </macros> | |
5 <edam_topics> | 8 <edam_topics> |
6 <edam_topic>topic_0157</edam_topic> | 9 <edam_topic>topic_0157</edam_topic> |
7 <edam_topic>topic_0092</edam_topic> | 10 <edam_topic>topic_0092</edam_topic> |
8 </edam_topics> | 11 </edam_topics> |
9 <edam_operations> | 12 <edam_operations> |
10 <edam_operation>operation_0337</edam_operation> | 13 <edam_operation>operation_0337</edam_operation> |
11 </edam_operations> | 14 </edam_operations> |
15 <xrefs> | |
16 <xref type="bio.tools">bigtools</xref> | |
17 </xrefs> | |
12 <requirements> | 18 <requirements> |
13 <requirement type="package" version="3.12.3">python</requirement> | 19 <requirement type="package" version="@PYTHON_VERSION@">python</requirement> |
14 <requirement type="package" version="2.0.0">numpy</requirement> | 20 <requirement type="package" version="@NUMPY_VERSION@">numpy</requirement> |
15 <requirement type="package" version="0.1.4">pybigtools</requirement> | 21 <requirement type="package" version="@TOOL_VERSION@">pybigtools</requirement> |
16 </requirements> | 22 </requirements> |
23 <required_files> | |
24 <include path="bigwig_outlier_bed.py"/> | |
25 </required_files> | |
17 <version_command><![CDATA[python -c "import pybigtools; from importlib.metadata import version; print(version('pybigtools'))"]]></version_command> | 26 <version_command><![CDATA[python -c "import pybigtools; from importlib.metadata import version; print(version('pybigtools'))"]]></version_command> |
18 <command><![CDATA[python | 27 <command><![CDATA[python '${__tool_directory__}/bigwig_outlier_bed.py' |
19 '$runme' | 28 --bigwig |
20 --bigwig | 29 #for bw in $bigwig: |
21 '$bigwig' | 30 '$bw' |
22 --bedouthilo | 31 #end for |
23 '$bedouthilo' | |
24 --minwin | |
25 '$minwin' | |
26 --qhi | |
27 '$qhi' | |
28 --qlo | |
29 '$qlo' | |
30 #if $tableout == "set" | |
31 --tableout | |
32 #end if | |
33 --bigwiglabels | 32 --bigwiglabels |
34 '$bigwiglabels']]></command> | 33 #for bw in $bigwig: |
35 <configfiles> | 34 '$bw.name' |
36 <configfile name="runme"><![CDATA[#raw | 35 #end for |
37 """ | 36 --outbeds '$outbeds' |
38 Bigwigs are great, but hard to reliably "see" small low coverage or small very high coverage regions. | 37 #if $outbeds in ['outhilo', 'outall']: |
39 Colouring in JB2 tracks will need a new plugin, so this code will find bigwig regions above and below a chosen percentile point. | 38 --bedouthilo '$bedouthilo' |
40 0.99 and 0.01 work well in testing with a minimum span of 10 bp. | 39 #end if |
41 Multiple bigwigs **with the same reference** can be combined - bed segments will be named appropriately | 40 #if $outbeds in ['outhi', 'outall', 'outlohi']: |
42 Combining multiple references works but is silly because display will rely on one reference so features mapped to other references will not appear. | 41 --bedouthi '$bedouthi' |
43 | 42 #end if |
44 Tricksy numpy method from http://gregoryzynda.com/python/numpy/contiguous/interval/2019/11/29/contiguous-regions.html | 43 #if $outbeds in ['outlo', 'outall', 'outlohi']: |
45 takes about 95 seconds for a 17MB test wiggle | 44 --bedoutlo '$bedoutlo' |
46 JBrowse2 bed normally displays ignore the score, so could provide separate low/high bed file outputs as an option. | 45 #end if |
47 Update june 30 2024: wrote a 'no-build' plugin for beds to display red/blue if >0/<0 so those are used for scores | 46 --minwin '$minwin' |
48 Bed interval naming must be short for JB2 but needs input bigwig name and (lo or hi). | 47 #if $qhi: |
49 """ | 48 --qhi '$qhi' |
50 | 49 #end if |
51 import argparse | 50 #if $qlo: |
52 import numpy as np | 51 --qlo '$qlo' |
53 import pybigtools | 52 #end if |
54 import sys | 53 #if $tableout == "create" or $outbeds == "outtab": |
55 from pathlib import Path | 54 --tableoutfile '$tableoutfile' |
56 | 55 #end if |
57 | 56 ]]></command> |
58 class findOut(): | |
59 def __init__(self, args): | |
60 self.bwnames=args.bigwig | |
61 self.bwlabels=args.bigwiglabels | |
62 self.bedwin=args.minwin | |
63 self.qlo=args.qlo | |
64 self.qhi=args.qhi | |
65 self.bedouthilo=args.bedouthilo | |
66 self.bedouthi=args.bedouthi | |
67 self.bedoutlo=args.bedoutlo | |
68 self.tableout = args.tableout | |
69 self.bedwin = args.minwin | |
70 self.qhi = args.qhi | |
71 self.qlo = args.qlo | |
72 self.makeBed() | |
73 | |
74 def processVals(self, bw, isTop): | |
75 # http://gregoryzynda.com/python/numpy/contiguous/interval/2019/11/29/contiguous-regions.html | |
76 if isTop: | |
77 bwex = np.r_[False, bw >= self.bwtop, False] # extend with 0s | |
78 else: | |
79 bwex = np.r_[False, bw <= self.bwbot, False] | |
80 bwexd = np.diff(bwex) | |
81 bwexdnz = bwexd.nonzero()[0] | |
82 bwregions = np.reshape(bwexdnz, (-1,2)) | |
83 return bwregions | |
84 | |
85 def writeBed(self, bed, bedfname): | |
86 """ | |
87 potentially multiple | |
88 """ | |
89 bed.sort() | |
90 beds = ['%s\t%d\t%d\t%s\t%d' % x for x in bed] | |
91 with open(bedfname, "w") as bedf: | |
92 bedf.write('\n'.join(beds)) | |
93 bedf.write('\n') | |
94 print('Wrote %d bed regions to %s' % (len(bed), bedfname)) | |
95 | |
96 def makeBed(self): | |
97 bedhi = [] | |
98 bedlo = [] | |
99 bwlabels = self.bwlabels | |
100 bwnames = self.bwnames | |
101 print('bwnames=', bwnames, "bwlabs=", bwlabels) | |
102 for i, bwname in enumerate(bwnames): | |
103 bwlabel = bwlabels[i].replace(" ",'') | |
104 p = Path('in.bw') | |
105 p.symlink_to( bwname ) # required by pybigtools (!) | |
106 bwf = pybigtools.open('in.bw') | |
107 chrlist = bwf.chroms() | |
108 chrs = list(chrlist.keys()) | |
109 chrs.sort() | |
110 restab = ["contig\tn\tmean\tstd\tmin\tmax\tqtop\tqbot"] | |
111 for chr in chrs: | |
112 bw = bwf.values(chr) | |
113 bw = bw[~np.isnan(bw)] # some have NaN if parts of a contig not covered | |
114 if self.qhi is not None: | |
115 self.bwtop = np.quantile(bw, self.qhi) | |
116 bwhi = self.processVals(bw, isTop=True) | |
117 for i, seg in enumerate(bwhi): | |
118 if seg[1] - seg[0] >= self.bedwin: | |
119 bedhi.append((chr, seg[0], seg[1], '%s_hi' % (bwlabel), 1)) | |
120 if self.qlo is not None: | |
121 self.bwbot = np.quantile(bw, self.qlo) | |
122 bwlo = self.processVals(bw, isTop=False) | |
123 for i, seg in enumerate(bwlo): | |
124 if seg[1] - seg[0] >= self.bedwin: | |
125 bedlo.append((chr, seg[0], seg[1], '%s_lo' % (bwlabel), -1)) | |
126 bwmean = np.mean(bw) | |
127 bwstd = np.std(bw) | |
128 bwmax = np.max(bw) | |
129 nrow = np.size(bw) | |
130 bwmin = np.min(bw) | |
131 restab.append('%s\t%d\t%f\t%f\t%f\t%f\t%f\t%f' % (chr,nrow,bwmean,bwstd,bwmin,bwmax,self.bwtop,self.bwbot)) | |
132 print('\n'.join(restab), '\n') | |
133 if self.tableout: | |
134 with open(self.tableout, "w") as t: | |
135 t.write('\n'.join(restab)) | |
136 t.write('\n') | |
137 if self.bedoutlo: | |
138 if self.qlo: | |
139 self.writeBed(bedlo, self.bedoutlo) | |
140 if self.bedouthi: | |
141 if self.qhi: | |
142 self.writeBed(bedhi, self.bedouthi) | |
143 if self.bedouthilo: | |
144 allbed = bedlo + bedhi | |
145 self.writeBed(allbed, self.bedouthilo) | |
146 return restab | |
147 | |
148 | |
149 if __name__ == "__main__": | |
150 parser = argparse.ArgumentParser() | |
151 a = parser.add_argument | |
152 a('-m', '--minwin',default=10, type=int) | |
153 a('-l', '--qlo',default=None, type=float) | |
154 a('-i', '--qhi',default=None, type=float) | |
155 a('-w', '--bigwig', nargs='+') | |
156 a('-n', '--bigwiglabels', nargs='+') | |
157 a('-o', '--bedouthilo', default=None, help="optional high and low combined bed") | |
158 a('-u', '--bedouthi', default=None, help="optional high only bed") | |
159 a('-b', '--bedoutlo', default=None, help="optional low only bed") | |
160 a('-t', '--tableout', default=None) | |
161 args = parser.parse_args() | |
162 print('args=', args) | |
163 if not (args.bedouthilo or args.bedouthi or args.bedoutlo): | |
164 sys.stderr.write("bigwig_outlier_bed.py cannot usefully run - need a bed output choice - must be one of low only, high only or both combined") | |
165 sys.exit(2) | |
166 if not (args.qlo or args.qhi): | |
167 sys.stderr.write("bigwig_outlier_bed.py cannot usefully run - need one or both of quantile cutpoints qhi and qlo") | |
168 sys.exit(2) | |
169 restab = findOut(args) | |
170 if args.tableout: | |
171 with open(args.tableout, 'w') as tout: | |
172 tout.write('\n'.join(restab)) | |
173 tout.write('\n') | |
174 #end raw]]></configfile> | |
175 </configfiles> | |
176 <inputs> | 57 <inputs> |
177 <param name="bigwig" type="data" optional="false" label="Bigwig file(s) to process. " help="If more than one, MUST all use the same reference sequence to be displayable. Feature names will include the bigwig label." format="bigwig" multiple="true"/> | 58 <param name="bigwig" type="data" optional="false" label="Choose one or more bigwig file(s) to return outlier regions as a bed file" |
178 <param name="minwin" type="integer" value="10" label="Minimum continuous bases to count as a high or low bed feature" help="Actual run length will be found and used for continuous features as long or longer."/> | 59 help="If more than one, MUST all use the same reference sequence to be displayable. Feature names will include the bigwig label." format="bigwig" multiple="true"/> |
179 <param name="qhi" type="float" value="0.99" label="Quantile cutoff for a high region - 0.99 will cut off at or above the 99th percentile" help=""/> | 60 <param name="minwin" type="integer" value="10" label="Minimum continuous bases to count as a high or low bed feature" |
180 <param name="qlo" type="float" value="0.01" label="Quantile cutoff for a low region - 0.01 will cut off at or below the 1st percentile." help=""/> | 61 help="Continuous features as long or longer than this window size will appear as bed features"/> |
181 <param name="tableout" type="select" label="Write a table showing contig statistics for each bigwig" help="" display="radio"> | 62 <param name="qhi" type="float" value="0.99" label="Quantile cutoff for a high region - 0.99 will cut off at or above the 99th percentile" help="Required" optional="false"/> |
182 <option value="notset">Do not set this flag</option> | 63 <param name="qlo" type="float" value="0.01" label="Quantile cutoff for a low region - 0.01 will cut off at or below the 1st percentile." help="Optional" optional="true"/> |
183 <option value="set">Set this flag</option> | 64 <param name="outbeds" type="select" label="Select the required bed file outputs" help="Any combination of the 3 different kinds of bed file output can be made"> |
65 <option value="outhilo" selected="true">Make 1 bed output with both low and high regions</option> | |
66 <option value="outhi">Make 1 bed output with high regions only</option> | |
67 <option value="outlo">Make 1 bed output with low regions only</option> | |
68 <option value="outall">Make 3 bed outputs with low and high together in one, high in one and low in the other</option> | |
69 <option value="outlohi">Make 2 bed outputs with high in one and low in the other</option> | |
70 <option value="outtab">NO bed outputs. Report bigwig value distribution only</option> | |
184 </param> | 71 </param> |
185 <param name="bigwiglabels" type="text" value="outbed" label="Label to use in bed feature names to indicate source bigwig contents - such as coverage" help=""/> | 72 <param name="tableout" type="select" label="Write a table showing contig statistics for each bigwig input" help=""> |
73 <option value="donotmake">Do not create this report</option> | |
74 <option value="create" selected="true">Create this report</option> | |
75 </param> | |
186 </inputs> | 76 </inputs> |
187 <outputs> | 77 <outputs> |
188 <data name="bedouthilo" format="bed" label="Both high and low contiguous regions as long or longer than window length into one bed " hidden="false"/> | 78 <data name="bedouthilo" format="bed" label="High_and_low_bed" hidden="false"> |
79 <filter>outbeds in ["outall", "outhilo"]</filter> | |
80 </data> | |
81 <data name="bedouthi" format="bed" label="High bed" hidden="false"> | |
82 <filter>outbeds in ["outall", "outlohi", "outhi"]</filter> | |
83 </data> | |
84 <data name="bedoutlo" format="bed" label="Low bed" hidden="false"> | |
85 <filter>outbeds in ["outall", "outlohi", "outlo"]</filter> | |
86 </data> | |
87 <data name="tableoutfile" format="tabular" label="Contig statistics" hidden="false"> | |
88 <filter>tableout == "create"</filter> | |
89 </data> | |
189 </outputs> | 90 </outputs> |
190 <tests> | 91 <tests> |
191 <test> | 92 <test expect_num_outputs="1"> |
192 <output name="bedouthilo" value="bedouthilo_sample" compare="diff" lines_diff="0"/> | 93 <output name="bedouthilo" value="bedouthilo_sample" compare="diff" lines_diff="0"/> |
94 <param name="outbeds" value="outhilo"/> | |
193 <param name="bigwig" value="bigwig_sample"/> | 95 <param name="bigwig" value="bigwig_sample"/> |
194 <param name="minwin" value="10"/> | 96 <param name="minwin" value="10"/> |
195 <param name="qhi" value="0.99"/> | 97 <param name="qhi" value="0.99"/> |
196 <param name="qlo" value="0.01"/> | 98 <param name="qlo" value="0.01"/> |
197 <param name="tableout" value="notset"/> | 99 <param name="tableout" value="donotmake"/> |
198 <param name="bigwiglabels" value="outbed"/> | 100 </test> |
101 <test expect_num_outputs="1"> | |
102 <output name="tableoutfile" value="table_only_sample" compare="diff" lines_diff="0"/> | |
103 <param name="outbeds" value="outtab"/> | |
104 <param name="bigwig" value="bigwig_sample,1.bigwig"/> | |
105 <param name="minwin" value="10"/> | |
106 <param name="qhi" value="0.99"/> | |
107 <param name="qlo" value="0.01"/> | |
108 <param name="tableout" value="create"/> | |
109 </test> | |
110 <test expect_num_outputs="2"> | |
111 <output name="bedouthilo" value="bedouthilo_sample" compare="diff" lines_diff="0"/> | |
112 <output name="tableoutfile" value="table_sample" compare="diff" lines_diff="0"/> | |
113 <param name="outbeds" value="outhilo"/> | |
114 <param name="bigwig" value="bigwig_sample"/> | |
115 <param name="minwin" value="10"/> | |
116 <param name="qhi" value="0.99"/> | |
117 <param name="qlo" value="0.01"/> | |
118 <param name="tableout" value="create"/> | |
119 </test> | |
120 <test expect_num_outputs="2"> | |
121 <output name="bedouthi" value="bedouthi_qlo_notset_sample" compare="diff" lines_diff="0"/> | |
122 <output name="tableoutfile" value="table_qlo_notset_sample" compare="diff" lines_diff="0"/> | |
123 <param name="outbeds" value="outhi"/> | |
124 <param name="bigwig" value="bigwig_sample"/> | |
125 <param name="minwin" value="10"/> | |
126 <param name="qhi" value="0.99"/> | |
127 <param name="qlo" value=""/> | |
128 <param name="tableout" value="create"/> | |
129 </test> | |
130 <test expect_num_outputs="3"> | |
131 <output name="bedouthi" value="bedouthi_sample" compare="diff" lines_diff="0"/> | |
132 <output name="bedoutlo" value="bedoutlo_sample" compare="diff" lines_diff="0"/> | |
133 <output name="tableoutfile" value="table3_sample" compare="diff" lines_diff="0"/> | |
134 <param name="outbeds" value="outlohi"/> | |
135 <param name="bigwig" value="bigwig_sample"/> | |
136 <param name="minwin" value="1"/> | |
137 <param name="qhi" value="0.9"/> | |
138 <param name="qlo" value="0.1"/> | |
139 <param name="tableout" value="create"/> | |
140 </test> | |
141 <test expect_num_outputs="4"> | |
142 <output name="bedouthilo" value="bedouthilo2_sample" compare="diff" lines_diff="0"/> | |
143 <output name="bedoutlo" value="bedoutlo2_sample" compare="diff" lines_diff="0"/> | |
144 <output name="bedouthi" value="bedouthi2_sample" compare="diff" lines_diff="0"/> | |
145 <output name="tableoutfile" value="table2_sample" compare="diff" lines_diff="0"/> | |
146 <param name="outbeds" value="outall"/> | |
147 <param name="bigwig" value="bigwig_sample,1.bigwig"/> | |
148 <param name="minwin" value="1"/> | |
149 <param name="qhi" value="0.9"/> | |
150 <param name="qlo" value="0.1"/> | |
151 <param name="tableout" value="create"/> | |
199 </test> | 152 </test> |
200 </tests> | 153 </tests> |
201 <help><![CDATA[ | 154 <help><![CDATA[ |
202 **What it Does** | 155 |
203 | 156 **Purpose** |
204 Takes one or more bigwigs mapped to the same reference and finds all the minimum window sized or greater contiguous regions above or below an upper and lower quantile cutoff. | 157 |
205 A window size of 10 works well, and quantiles set at 0.01 and 0.99 will generally work well. | 158 *Combine bigwig outlier regions into bed files* |
159 | |
160 Bigwigs allow quantitative tracks to be viewed in an interactive genome browser like JBrowse2. | |
161 Peaks are easy to see. Unusually low regions can be harder to spot, even if they are relatively large, unless the view is zoomed right in. | |
162 Automated methods for combining evidence from multiple bigwigs can be useful for constructing browseable *issues* or other kinds of summary bed format tracks. | |
163 For example, coverage outlier regions can be combined with the frequency of specific dinucleotide short tandem repeats | |
164 to evaluate sequencing technology effects on a genome assembly, as described at https://github.com/arangrhie/T2T-Polish | |
165 | |
166 **What does it produce?** | |
167 | |
168 Bed format results are output, containing each continuous segment of at least *minwin* base pairs above a cut point, or below another cut point. | |
169 These can be viewed as features on the reference genome using a genome browser tool like JBrowse2. | |
170 Three kinds of bed files can be created depending on the values included. | |
171 A single bed output containing both high and low regions is the default. JBrowse2 can display it with colour indicating high or low status, | |
172 which needs one less track and is a little easier to understand. High and low features can also be output as separate bed files. | |
173 | |
174 **How is it controlled?** | |
175 | |
176 The cut points are calculated using a user supplied quantile, from each chromosome's bigwig value distribution. | |
177 The defaults are 0.99 and 0.01 and the default *minwin* is 10. | |
178 If adjacent values were independent, the probability of 10 consecutive values at or below the 1st percentile purely by chance would be about 0.01**10, | |
179 so false positives should be rare, even in a 3GB genome. | |
180 This data driven and non-parametric method is preferred for the asymmetrical distributions found in typical bigwigs, such as depth of coverage | |
181 for genome sequencing reads. Coverage values are truncated at zero, and regions with very high values often form a long sparse right tail. | |
182 | |
183 **How do I choose the input data?** | |
184 | |
185 One or more bigwigs can be selected as inputs. | |
186 Multiple bigwigs will be combined in bed files, so they must share the same reference genome to be displayed | |
187 using JBrowse2. | |
188 | |
189 .. class:: warningmark | |
190 | |
191 **Lower quantile may not behave as expected in bigwigs with large fractions of zero values** | |
192 | |
193 The lower cut point may be problematic for integer values like coverage if many values are zero. For example, if 5% of bases have zero coverage, the 1st percentile is also zero, | |
194 so that cut point will include the entire 5% of bases *at or below 0*. | |
195 | |
206 | 196 |
207 ]]></help> | 197 ]]></help> |
208 <citations> | 198 <citations> |
209 <citation type="doi">10.1093/bioinformatics/btae350</citation> | 199 <citation type="doi">10.1093/bioinformatics/btae350</citation> |
210 </citations> | 200 </citations> |