comparison bigwig_outlier_bed.xml @ 0:2fbbc1be6655 draft
planemo upload for repository https://github.com/jackh726/bigtools commit ce6b9f638ebcebcad5a5b10219f252962f30e5cc-dirty
author | fubar |
---|---|
date | Mon, 01 Jul 2024 00:53:01 +0000 |
parents | |
children | 5c328fbb9418 |
-1:000000000000 | 0:2fbbc1be6655 |
---|---|
1 <tool name="bigwigoutlierbed" id="bigwigoutlierbed" version="0.01" profile="22.05"> | |
2 <!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay--> | |
3 <!--Created by toolfactory@galaxy.org at 30/06/2024 19:44:14 using the Galaxy Tool Factory.--> | |
4 <description>Writes high and low bigwig regions as features in a bed file</description> | |
5 <edam_topics> | |
6 <edam_topic>topic_0157</edam_topic> | |
7 <edam_topic>topic_0092</edam_topic> | |
8 </edam_topics> | |
9 <edam_operations> | |
10 <edam_operation>operation_0337</edam_operation> | |
11 </edam_operations> | |
12 <requirements> | |
13 <requirement version="3.12.4" type="package">python</requirement> | |
14 <requirement version="2.0.0" type="package">numpy</requirement> | |
15 <requirement version="0.1.4" type="package">pybigtools</requirement> | |
16 </requirements> | |
17 <version_command><![CDATA[pybigtools --version]]></version_command> | |
18 <command><![CDATA[python | |
19 '$runme' | |
20 --bigwig | |
21 '$bigwig' | |
22 --bedouthilo | |
23 '$bedouthilo' | |
24 --minwin | |
25 '$minwin' | |
26 --qhi | |
27 '$qhi' | |
28 --qlo | |
29 '$qlo' | |
30 #if $tableout == "set" | |
31 --tableout | |
32 #end if | |
33 --bigwiglabels | |
34 '$bigwiglabels']]></command> | |
35 <configfiles> | |
36 <configfile name="runme"><![CDATA[#raw | |
37 """ | |
38 Bigwigs are great, but it is hard to reliably "see" small regions of low or very high coverage in them. | |
39 Colouring in JB2 tracks will need a new plugin, so this code finds bigwig regions at or above an upper quantile cutoff and at or below a lower one. | |
40 Quantiles of 0.99 and 0.01 work well in testing with a minimum span of 10 bp. | |
41 Multiple bigwigs **with the same reference** can be combined - bed segments will be named appropriately. | |
42 Combining multiple references works but is pointless, because display relies on one reference, so features mapped to other references will not appear. | |
43 | |
44 Tricksy numpy method from http://gregoryzynda.com/python/numpy/contiguous/interval/2019/11/29/contiguous-regions.html | |
45 It takes about 95 seconds for a 17MB test wiggle. | |
46 JBrowse2 bed displays normally ignore the score, so separate low/high bed file outputs could be provided as an option. | |
47 Update June 30 2024: wrote a 'no-build' plugin for beds to display red/blue if >0/<0, so scores of 1 (high) and -1 (low) are written. | |
48 Bed interval naming must be short for JB2 but needs input bigwig name and (lo or hi). | |
49 """ | |
50 | |
51 import argparse | |
52 import numpy as np | |
53 import pybigtools | |
54 import sys | |
55 from pathlib import Path | |
56 | |
57 | |
58 class findOut(): | |
59 def __init__(self, args): | |
60 self.bwnames=args.bigwig | |
61 self.bwlabels=args.bigwiglabels | |
62 self.bedwin=args.minwin | |
63 self.qlo=args.qlo | |
64 self.qhi=args.qhi | |
65 self.bedouthilo=args.bedouthilo | |
66 self.bedouthi=args.bedouthi | |
67 self.bedoutlo=args.bedoutlo | |
68 self.tableout = args.tableout | |
69 # cutoffs are set per contig in makeBed; they stay NaN if that quantile was not requested | |
70 self.bwtop = float('nan') | |
71 self.bwbot = float('nan') | |
72 self.makeBed() | |
73 | |
74 def processVals(self, bw, isTop): | |
75 # http://gregoryzynda.com/python/numpy/contiguous/interval/2019/11/29/contiguous-regions.html | |
76 if isTop: | |
77 bwex = np.r_[False, bw >= self.bwtop, False] # extend with 0s | |
78 else: | |
79 bwex = np.r_[False, bw <= self.bwbot, False] | |
80 bwexd = np.diff(bwex) | |
81 bwexdnz = bwexd.nonzero()[0] | |
82 bwregions = np.reshape(bwexdnz, (-1,2)) | |
83 return bwregions | |
84 | |
85 def writeBed(self, bed, bedfname): | |
86 """ | |
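87 Sort the (contig, start, end, name, score) tuples and write them as a 5 column bed file; the list may combine rows from multiple bigwigs. | |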
88 """ | |
89 bed.sort() | |
90 beds = ['%s\t%d\t%d\t%s\t%d' % x for x in bed] | |
91 with open(bedfname, "w") as bedf: | |
92 bedf.write('\n'.join(beds)) | |
93 bedf.write('\n') | |
94 print('Wrote %d bed regions to %s' % (len(bed), bedfname)) | |
95 | |
96 def makeBed(self): | |
97 bedhi = [] | |
98 bedlo = [] | |
99 bwlabels = self.bwlabels | |
100 bwnames = self.bwnames | |
101 print('bwnames=', bwnames, "bwlabs=", bwlabels) | |
102 for i, bwname in enumerate(bwnames): | |
103 bwlabel = bwlabels[i].replace(" ",'') | |
104 p = Path('in%d.bw' % i) # one link per input, so a second bigwig does not raise FileExistsError | |
105 p.symlink_to( bwname ) # required by pybigtools (!) | |
106 bwf = pybigtools.open(str(p)) | |
107 chrlist = bwf.chroms() | |
108 chrs = list(chrlist.keys()) | |
109 chrs.sort() | |
110 restab = ["contig\tn\tmean\tstd\tmin\tmax\tqtop\tqbot"] | |
111 for chr in chrs: | |
112 bw = bwf.values(chr) | |
113 bw = bw[~np.isnan(bw)] # some have NaN if parts of a contig not covered | |
114 if self.qhi is not None: | |
115 self.bwtop = np.quantile(bw, self.qhi) | |
116 bwhi = self.processVals(bw, isTop=True) | |
117 for i, seg in enumerate(bwhi): | |
118 if seg[1] - seg[0] >= self.bedwin: | |
119 bedhi.append((chr, seg[0], seg[1], '%s_hi' % (bwlabel), 1)) | |
120 if self.qlo is not None: | |
121 self.bwbot = np.quantile(bw, self.qlo) | |
122 bwlo = self.processVals(bw, isTop=False) | |
123 for i, seg in enumerate(bwlo): | |
124 if seg[1] - seg[0] >= self.bedwin: | |
125 bedlo.append((chr, seg[0], seg[1], '%s_lo' % (bwlabel), -1)) | |
126 bwmean = np.mean(bw) | |
127 bwstd = np.std(bw) | |
128 bwmax = np.max(bw) | |
129 nrow = np.size(bw) | |
130 bwmin = np.min(bw) | |
131 restab.append('%s\t%d\t%f\t%f\t%f\t%f\t%f\t%f' % (chr,nrow,bwmean,bwstd,bwmin,bwmax,self.bwtop,self.bwbot)) | |
132 print('\n'.join(restab), '\n') | |
133 if self.tableout: | |
134 with open(self.tableout, 'w') as t: | |
135 t.write('\n'.join(restab)) | |
136 t.write('\n') | |
137 if self.bedoutlo: | |
138 if self.qlo: | |
139 self.writeBed(bedlo, self.bedoutlo) | |
140 if self.bedouthi: | |
141 if self.qhi: | |
142 self.writeBed(bedhi, self.bedouthi) | |
143 if self.bedouthilo: | |
144 allbed = bedlo + bedhi | |
145 self.writeBed(allbed, self.bedouthilo) | |
146 return restab | |
147 | |
148 | |
149 if __name__ == "__main__": | |
150 parser = argparse.ArgumentParser() | |
151 a = parser.add_argument | |
152 a('-m', '--minwin',default=10, type=int) | |
153 a('-l', '--qlo',default=None, type=float) | |
154 a('-i', '--qhi',default=None, type=float) | |
155 a('-w', '--bigwig', nargs='+') | |
156 a('-n', '--bigwiglabels', nargs='+') | |
157 a('-o', '--bedouthilo', default=None, help="optional high and low combined bed") | |
158 a('-u', '--bedouthi', default=None, help="optional high only bed") | |
159 a('-b', '--bedoutlo', default=None, help="optional low only bed") | |
160 a('-t', '--tableout', default=None) | |
161 args = parser.parse_args() | |
162 print('args=', args) | |
163 if not (args.bedouthilo or args.bedouthi or args.bedoutlo): | |
164 sys.stderr.write("bigwig_outlier_bed.py cannot usefully run - need a bed output choice - must be one of low only, high only or both combined") | |
165 sys.exit(2) | |
166 if not (args.qlo or args.qhi): | |
167 sys.stderr.write("bigwig_outlier_bed.py cannot usefully run - need one or both of quantile cutpoints qhi and qlo") | |
168 sys.exit(2) | |
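169 # findOut.__init__ calls makeBed(), which writes the bed output(s) and, | |
170 # when --tableout is given, the per-contig statistics table. | |
171 # The instance is not a list of rows, so joining it here would raise a TypeError; | |
172 # the table is therefore written once, inside makeBed(). | |
173 findOut(args) | |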
174 #end raw]]></configfile> | |
175 </configfiles> | |
176 <inputs> | |
177 <param name="bigwig" type="data" optional="false" label="Bigwig file(s) to process" help="If more than one is supplied, all MUST use the same reference sequence to be displayable. Feature names will include the bigwig label." format="bigwig" multiple="true"/> | |
178 <param name="minwin" type="integer" value="10" label="Minimum number of contiguous bases to count as a high or low bed feature" help="Runs at least this long are reported at their full length; shorter runs are dropped."/> | |
179 <param name="qhi" type="float" value="0.99" label="Quantile cutoff for a high region - 0.99 will cut off at or above the 99th percentile" help=""/> | |
180 <param name="qlo" type="float" value="0.01" label="Quantile cutoff for a low region - 0.01 will cut off at or below the 1st percentile." help=""/> | |
181 <param name="tableout" type="select" label="Write a table showing contig statistics for each bigwig" help="" display="radio"> | |
182 <option value="notset">Do not set this flag</option> | |
183 <option value="set">Set this flag</option> | |
184 </param> | |
185 <param name="bigwiglabels" type="text" value="outbed" label="Label to use in bed feature names to indicate source bigwig contents - such as coverage" help=""/> | |
186 </inputs> | |
187 <outputs> | |
188 <data name="bedouthilo" format="bed" label="High and low contiguous regions at least as long as the minimum window, combined in one bed" hidden="false"/> | |
189 </outputs> | |
190 <tests> | |
191 <test> | |
192 <output name="bedouthilo" value="bedouthilo_sample" compare="diff" lines_diff="0"/> | |
193 <param name="bigwig" value="bigwig_sample"/> | |
194 <param name="minwin" value="10"/> | |
195 <param name="qhi" value="0.99"/> | |
196 <param name="qlo" value="0.01"/> | |
197 <param name="tableout" value="notset"/> | |
198 <param name="bigwiglabels" value="outbed"/> | |
199 </test> | |
200 </tests> | |
201 <help><![CDATA[ | |
202 **What it Does** | |
203 | |
204 Takes one or more bigwigs mapped to the same reference and finds every contiguous region, at least the minimum window long, whose values are at or above the upper quantile cutoff or at or below the lower quantile cutoff. | |
205 Cutoffs are computed separately for each contig. A minimum window of 10 and quantiles of 0.01 and 0.99 work well in testing. | |
206 | |
207 ]]></help> | |
208 <citations> | |
209 <citation type="doi">10.1093/bioinformatics/btae350</citation> | |
210 </citations> | |
211 </tool> | |
212 |
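
The run-finding step in `processVals` can be tried on its own. The following is a minimal sketch of that padded-mask and `np.diff` technique, not the tool's code: the helper name `contiguous_regions`, the toy `values` array, and the 0.7/0.2 quantiles are assumptions chosen only to keep the example small.

```python
import numpy as np


def contiguous_regions(mask):
    """Return an (n, 2) array of [start, end) index pairs, one per run of True in mask."""
    # Pad with False on both sides so a run touching either end still has two edges.
    padded = np.r_[False, mask, False]
    # np.diff on booleans marks every position where the value flips.
    edges = np.diff(padded).nonzero()[0]
    # Edges alternate start, end, start, end, so each pair is a half-open interval.
    return edges.reshape(-1, 2)


# Toy coverage vector, invented for illustration only.
values = np.array([5, 5, 9, 9, 9, 5, 1, 1, 5, 9], dtype=float)
high_cut = np.quantile(values, 0.7)  # hypothetical upper cutoff
low_cut = np.quantile(values, 0.2)   # hypothetical lower cutoff
min_window = 2

high_runs = contiguous_regions(values >= high_cut)
low_runs = contiguous_regions(values <= low_cut)

# Keep only runs at least min_window long, as the tool does with --minwin.
high_kept = [(int(s), int(e)) for s, e in high_runs if e - s >= min_window]
low_kept = [(int(s), int(e)) for s, e in low_runs if e - s >= min_window]

print(high_kept, low_kept)  # [(2, 5)] [(6, 8)]
```

The single high value at index 9 forms a run of length 1 and is dropped by the window filter, which is the behaviour the `minwin` parameter controls.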