diff bigwig_outlier_bed.xml @ 6:eb17eb8a3658 draft
planemo upload commit 1baff96e75def9248afdcf21edec9bdc7ed42b1f-dirty
author | fubar |
---|---|
date | Tue, 23 Jul 2024 23:12:23 +0000 |
parents | 68cb8e7e266b |
children | c8e22efcaeda |
--- a/bigwig_outlier_bed.xml Mon Jul 01 05:02:05 2024 +0000 +++ b/bigwig_outlier_bed.xml Tue Jul 23 23:12:23 2024 +0000 @@ -1,208 +1,198 @@ -<tool name="bigwig_outlier_bed" id="bigwig_outlier_bed" version="0.04" profile="22.05"> - <!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay--> - <!--Created by toolfactory@galaxy.org at 30/06/2024 19:44:14 using the Galaxy Tool Factory.--> - <description>Writes high and low bigwig regions as features in a bed file</description> +<tool name="Bigwig extremes to bed features" id="bigwig_outlier_bed" version="@TOOL_VERSION@" profile="22.05"> + <description>Writes high and low bigwig runs as features in a bed file</description> + <macros> + <token name="@TOOL_VERSION@">0.2.0</token> + <token name="@NUMPY_VERSION@">2.0.0</token> + <token name="@PYTHON_VERSION@">3.12.3</token> + </macros> <edam_topics> <edam_topic>topic_0157</edam_topic> <edam_topic>topic_0092</edam_topic> </edam_topics> <edam_operations> <edam_operation>operation_0337</edam_operation> - </edam_operations> + </edam_operations> + <xrefs> + <xref type="bio.tools">bigtools</xref> + </xrefs> <requirements> - <requirement type="package" version="3.12.3">python</requirement> - <requirement type="package" version="2.0.0">numpy</requirement> - <requirement type="package" version="0.1.4">pybigtools</requirement> + <requirement type="package" version="@PYTHON_VERSION@">python</requirement> + <requirement type="package" version="@NUMPY_VERSION@">numpy</requirement> + <requirement type="package" version="@TOOL_VERSION@">pybigtools</requirement> </requirements> + <required_files> + <include path="bigwig_outlier_bed.py"/> + </required_files> <version_command><![CDATA[python -c "import pybigtools; from importlib.metadata import version; print(version('pybigtools'))"]]></version_command> - <command><![CDATA[python -'$runme' ---bigwig -'$bigwig' ---bedouthilo -'$bedouthilo' ---minwin -'$minwin' ---qhi -'$qhi' ---qlo -'$qlo' -#if $tableout == "set" - --tableout -#end 
if + <command><![CDATA[python '${__tool_directory__}/bigwig_outlier_bed.py' +--bigwig +#for bw in $bigwig: + '$bw' +#end for --bigwiglabels -'$bigwiglabels']]></command> - <configfiles> - <configfile name="runme"><![CDATA[#raw -""" -Bigwigs are great, but hard to reliably "see" small low coverage or small very high coverage regions. -Colouring in JB2 tracks will need a new plugin, so this code will find bigwig regions above and below a chosen percentile point. -0.99 and 0.01 work well in testing with a minimum span of 10 bp. -Multiple bigwigs **with the same reference** can be combined - bed segments will be named appropriately -Combining multiple references works but is silly because display will rely on one reference so features mapped to other references will not appear. - -Tricksy numpy method from http://gregoryzynda.com/python/numpy/contiguous/interval/2019/11/29/contiguous-regions.html -takes about 95 seconds for a 17MB test wiggle -JBrowse2 bed normally displays ignore the score, so could provide separate low/high bed file outputs as an option. -Update june 30 2024: wrote a 'no-build' plugin for beds to display red/blue if >0/<0 so those are used for scores -Bed interval naming must be short for JB2 but needs input bigwig name and (lo or hi). 
-""" - -import argparse -import numpy as np -import pybigtools -import sys -from pathlib import Path - - -class findOut(): - def __init__(self, args): - self.bwnames=args.bigwig - self.bwlabels=args.bigwiglabels - self.bedwin=args.minwin - self.qlo=args.qlo - self.qhi=args.qhi - self.bedouthilo=args.bedouthilo - self.bedouthi=args.bedouthi - self.bedoutlo=args.bedoutlo - self.tableout = args.tableout - self.bedwin = args.minwin - self.qhi = args.qhi - self.qlo = args.qlo - self.makeBed() - - def processVals(self, bw, isTop): - # http://gregoryzynda.com/python/numpy/contiguous/interval/2019/11/29/contiguous-regions.html - if isTop: - bwex = np.r_[False, bw >= self.bwtop, False] # extend with 0s - else: - bwex = np.r_[False, bw <= self.bwbot, False] - bwexd = np.diff(bwex) - bwexdnz = bwexd.nonzero()[0] - bwregions = np.reshape(bwexdnz, (-1,2)) - return bwregions - - def writeBed(self, bed, bedfname): - """ - potentially multiple - """ - bed.sort() - beds = ['%s\t%d\t%d\t%s\t%d' % x for x in bed] - with open(bedfname, "w") as bedf: - bedf.write('\n'.join(beds)) - bedf.write('\n') - print('Wrote %d bed regions to %s' % (len(bed), bedfname)) - - def makeBed(self): - bedhi = [] - bedlo = [] - bwlabels = self.bwlabels - bwnames = self.bwnames - print('bwnames=', bwnames, "bwlabs=", bwlabels) - for i, bwname in enumerate(bwnames): - bwlabel = bwlabels[i].replace(" ",'') - p = Path('in.bw') - p.symlink_to( bwname ) # required by pybigtools (!) 
- bwf = pybigtools.open('in.bw') - chrlist = bwf.chroms() - chrs = list(chrlist.keys()) - chrs.sort() - restab = ["contig\tn\tmean\tstd\tmin\tmax\tqtop\tqbot"] - for chr in chrs: - bw = bwf.values(chr) - bw = bw[~np.isnan(bw)] # some have NaN if parts of a contig not covered - if self.qhi is not None: - self.bwtop = np.quantile(bw, self.qhi) - bwhi = self.processVals(bw, isTop=True) - for i, seg in enumerate(bwhi): - if seg[1] - seg[0] >= self.bedwin: - bedhi.append((chr, seg[0], seg[1], '%s_hi' % (bwlabel), 1)) - if self.qlo is not None: - self.bwbot = np.quantile(bw, self.qlo) - bwlo = self.processVals(bw, isTop=False) - for i, seg in enumerate(bwlo): - if seg[1] - seg[0] >= self.bedwin: - bedlo.append((chr, seg[0], seg[1], '%s_lo' % (bwlabel), -1)) - bwmean = np.mean(bw) - bwstd = np.std(bw) - bwmax = np.max(bw) - nrow = np.size(bw) - bwmin = np.min(bw) - restab.append('%s\t%d\t%f\t%f\t%f\t%f\t%f\t%f' % (chr,nrow,bwmean,bwstd,bwmin,bwmax,self.bwtop,self.bwbot)) - print('\n'.join(restab), '\n') - if self.tableout: - with open(self.tableout) as t: - t.write('\n'.join(restab)) - t.write('\n') - if self.bedoutlo: - if self.qlo: - self.writeBed(bedlo, self.bedoutlo) - if self.bedouthi: - if self.qhi: - self.writeBed(bedhi, self.bedouthi) - if self.bedouthilo: - allbed = bedlo + bedhi - self.writeBed(allbed, self.bedouthilo) - return restab - - -if __name__ == "__main__": - parser = argparse.ArgumentParser() - a = parser.add_argument - a('-m', '--minwin',default=10, type=int) - a('-l', '--qlo',default=None, type=float) - a('-i', '--qhi',default=None, type=float) - a('-w', '--bigwig', nargs='+') - a('-n', '--bigwiglabels', nargs='+') - a('-o', '--bedouthilo', default=None, help="optional high and low combined bed") - a('-u', '--bedouthi', default=None, help="optional high only bed") - a('-b', '--bedoutlo', default=None, help="optional low only bed") - a('-t', '--tableout', default=None) - args = parser.parse_args() - print('args=', args) - if not (args.bedouthilo or 
args.bedouthi or args.bedoutlo): - sys.stderr.write("bigwig_outlier_bed.py cannot usefully run - need a bed output choice - must be one of low only, high only or both combined") - sys.exit(2) - if not (args.qlo or args.qhi): - sys.stderr.write("bigwig_outlier_bed.py cannot usefully run - need one or both of quantile cutpoints qhi and qlo") - sys.exit(2) - restab = findOut(args) - if args.tableout: - with open(args.tableout, 'w') as tout: - tout.write('\n'.join(restab)) - tout.write('\n') -#end raw]]></configfile> - </configfiles> +#for bw in $bigwig: + '$bw.name' +#end for +--outbeds '$outbeds' +#if $outbeds in ['outhilo', 'outall']: + --bedouthilo '$bedouthilo' +#end if +#if $outbeds in ['outhi', 'outall', 'outlohi']: + --bedouthi '$bedouthi' +#end if +#if $outbeds in ['outlo', 'outall', 'outlohi']: + --bedoutlo '$bedoutlo' +#end if +--minwin '$minwin' +#if $qhi: +--qhi '$qhi' +#end if +#if $qlo: +--qlo '$qlo' +#end if +#if $tableout == "create" or $outbeds == "outtab": + --tableoutfile '$tableoutfile' +#end if +]]></command> <inputs> - <param name="bigwig" type="data" optional="false" label="Bigwig file(s) to process. " help="If more than one, MUST all use the same reference sequence to be displayable. Feature names will include the bigwig label." format="bigwig" multiple="true"/> - <param name="minwin" type="integer" value="10" label="Minimum continuous bases to count as a high or low bed feature" help="Actual run length will be found and used for continuous features as long or longer."/> - <param name="qhi" type="float" value="0.99" label="Quantile cutoff for a high region - 0.99 will cut off at or above the 99th percentile" help=""/> - <param name="qlo" type="float" value="0.01" label="Quantile cutoff for a low region - 0.01 will cut off at or below the 1st percentile." 
help=""/> - <param name="tableout" type="select" label="Write a table showing contig statistics for each bigwig" help="" display="radio"> - <option value="notset">Do not set this flag</option> - <option value="set">Set this flag</option> + <param name="bigwig" type="data" optional="false" label="Choose one or more bigwig file(s) to return outlier regions as a bed file" + help="If more than one, MUST all use the same reference sequence to be displayable. Feature names will include the bigwig label." format="bigwig" multiple="true"/> + <param name="minwin" type="integer" value="10" label="Minimum continuous bases to count as a high or low bed feature" + help="Continuous features as long or longer than this window size will appear as bed features"/> + <param name="qhi" type="float" value="0.99" label="Quantile cutoff for a high region - 0.99 will cut off at or above the 99th percentile" help="Required" optional="false"/> + <param name="qlo" type="float" value="0.01" label="Quantile cutoff for a low region - 0.01 will cut off at or below the 1st percentile." help="Optional" optional="true"/> + <param name="outbeds" type="select" label="Select the required bed file outputs" help="Any combination of the 3 different kinds of bed file output can be made"> + <option value="outhilo" selected="true">Make 1 bed output with both low and high regions</option> + <option value="outhi">Make 1 bed output with high regions only</option> + <option value="outlo">Make 1 bed output with low regions only</option> + <option value="outall">Make 3 bed outputs with low and high together in one, high in one and low in the other</option> + <option value="outlohi">Make 2 bed outputs with high in one and low in the other</option> + <option value="outtab">NO bed outputs. 
Report bigwig value distribution only</option> </param> - <param name="bigwiglabels" type="text" value="outbed" label="Label to use in bed feature names to indicate source bigwig contents - such as coverage" help=""/> + <param name="tableout" type="select" label="Write a table showing contig statistics for each bigwig input" help=""> + <option value="donotmake">Do not create this report</option> + <option value="create" selected="true">Create this report</option> + </param> </inputs> <outputs> - <data name="bedouthilo" format="bed" label="Both high and low contiguous regions as long or longer than window length into one bed " hidden="false"/> + <data name="bedouthilo" format="bed" label="High_and_low_bed" hidden="false"> + <filter>outbeds in ["outall", "outhilo"]</filter> + </data> + <data name="bedouthi" format="bed" label="High bed" hidden="false"> + <filter>outbeds in ["outall", "outlohi", "outhi"]</filter> + </data> + <data name="bedoutlo" format="bed" label="Low bed" hidden="false"> + <filter>outbeds in ["outall", "outlohi", "outlo"]</filter> + </data> + <data name="tableoutfile" format="tabular" label="Contig statistics" hidden="false"> + <filter>tableout == "create"</filter> + </data> </outputs> <tests> - <test> + <test expect_num_outputs="1"> <output name="bedouthilo" value="bedouthilo_sample" compare="diff" lines_diff="0"/> + <param name="outbeds" value="outhilo"/> <param name="bigwig" value="bigwig_sample"/> <param name="minwin" value="10"/> <param name="qhi" value="0.99"/> <param name="qlo" value="0.01"/> - <param name="tableout" value="notset"/> - <param name="bigwiglabels" value="outbed"/> + <param name="tableout" value="donotmake"/> + </test> + <test expect_num_outputs="1"> + <output name="tableoutfile" value="table_only_sample" compare="diff" lines_diff="0"/> + <param name="outbeds" value="outtab"/> + <param name="bigwig" value="bigwig_sample,1.bigwig"/> + <param name="minwin" value="10"/> + <param name="qhi" value="0.99"/> + <param name="qlo" 
value="0.01"/> + <param name="tableout" value="create"/> + </test> + <test expect_num_outputs="2"> + <output name="bedouthilo" value="bedouthilo_sample" compare="diff" lines_diff="0"/> + <output name="tableoutfile" value="table_sample" compare="diff" lines_diff="0"/> + <param name="outbeds" value="outhilo"/> + <param name="bigwig" value="bigwig_sample"/> + <param name="minwin" value="10"/> + <param name="qhi" value="0.99"/> + <param name="qlo" value="0.01"/> + <param name="tableout" value="create"/> + </test> + <test expect_num_outputs="2"> + <output name="bedouthi" value="bedouthi_qlo_notset_sample" compare="diff" lines_diff="0"/> + <output name="tableoutfile" value="table_qlo_notset_sample" compare="diff" lines_diff="0"/> + <param name="outbeds" value="outhi"/> + <param name="bigwig" value="bigwig_sample"/> + <param name="minwin" value="10"/> + <param name="qhi" value="0.99"/> + <param name="qlo" value=""/> + <param name="tableout" value="create"/> + </test> + <test expect_num_outputs="3"> + <output name="bedouthi" value="bedouthi_sample" compare="diff" lines_diff="0"/> + <output name="bedoutlo" value="bedoutlo_sample" compare="diff" lines_diff="0"/> + <output name="tableoutfile" value="table3_sample" compare="diff" lines_diff="0"/> + <param name="outbeds" value="outlohi"/> + <param name="bigwig" value="bigwig_sample"/> + <param name="minwin" value="1"/> + <param name="qhi" value="0.9"/> + <param name="qlo" value="0.1"/> + <param name="tableout" value="create"/> + </test> + <test expect_num_outputs="4"> + <output name="bedouthilo" value="bedouthilo2_sample" compare="diff" lines_diff="0"/> + <output name="bedoutlo" value="bedoutlo2_sample" compare="diff" lines_diff="0"/> + <output name="bedouthi" value="bedouthi2_sample" compare="diff" lines_diff="0"/> + <output name="tableoutfile" value="table2_sample" compare="diff" lines_diff="0"/> + <param name="outbeds" value="outall"/> + <param name="bigwig" value="bigwig_sample,1.bigwig"/> + <param name="minwin" value="1"/> 
+ <param name="qhi" value="0.9"/> + <param name="qlo" value="0.1"/> + <param name="tableout" value="create"/> </test> </tests> <help><![CDATA[ - **What it Does** + + **Purpose** + + *Combine bigwig outlier regions into bed files* - Takes one or more bigwigs mapped to the same reference and finds all the minimum window sized or greater contiguous regions above or below an upper and lower quantile cutoff. - A window size of 10 works well, and quantiles set at 0.01 and 0.99 will generally work well. + Bigwigs allow quantitative tracks to be viewed in an interactive genome browser like JBrowse2. + Peaks are easy to see. Unusually low regions can be harder to spot, even if they are relatively large, unless the view is zoomed right in. + Automated methods for combining evidence from multiple bigwigs can be useful for constructing browseable *issues* or other kinds of summary bed format tracks. + For example, combining coverage outlier regions, with the frequency of specific dinucleotide short tandem repeats, + for evaluating technical sequencing technology effects in the evaluation of a genome assembly described at https://github.com/arangrhie/T2T-Polish + + **What does it produce?** + + Bed format results are output, containing each continuous segment of at least *minwin* base pairs above a cut point, or below another cut point. + These can be viewed as features on the reference genome using a genome browser tool like JBrowse2. + Three kinds of bed files can be created depending on the values included. + Both high and low regions in one bed output is the default. This can be displayed in JBrowse2 with colour indicating the high or low status, + one less track and a little easier to understand. High and low features can be output as separate bed files. + + **How is it controlled?** + + The cut points are calculated using a user supplied quantile, from each chromosome's bigwig value distribution. + The defaults are 0.99 and 0.01 and the default *minwin* is 10. 
+ The probability of 10 values at or below the 1st percentile purely by chance is about 0.01**10, so false positives should be + rare, even in a 3GB genome. + This data driven and non-parametric method is preferred for the asymmetrical distributions found in typical bigwigs, such as depth of coverage + for genome sequencing reads. Coverage values are truncated at zero, and regions with very high values often form a long sparse right tail. + + **How do I choose the input data?** + + One or more bigwigs can be selected as inputs. + Multiple bigwigs will be combined in bed files, so must share the reference genome to display + using JBrowse2. + + .. class:: warningmark + + **Lower quantile may not behave as expected in bigwigs with large fractions of zero values** + + The lower cut point may be problematic for integer values like coverage if many values are zero. For example, if 5% of bases have zero coverage, the 1st percentile is also zero, + but that cut point will include the entire 5% *at or below 0* + ]]></help> <citations>