Repository 'microsatbed'
hg clone https://toolshed.g2.bx.psu.edu/repos/iuc/microsatbed

Changeset 0:2b970db61912 (2024-07-21)
Next changeset 1:dddd7ef63469 (2024-07-25)
Commit message:
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/microsatbed commit 275acb787c01484c6e435c8864090d377c3fde75
added:
README.md
all_fasta.loc.sample
find_str.py
microsatbed.xml
test-data/all_fasta.loc
test-data/bed_sample
test-data/builtinnativetsv_sample
test-data/dibed_sample
test-data/dibed_wig_sample
test-data/humsamp.fa
test-data/mouse.fa
test-data/nativegff_sample
tool-data/all_fasta.loc.sample
tool_data_table_conf.xml.sample
tool_data_table_conf.xml.test
diff -r 000000000000 -r 2b970db61912 README.md
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.md Sun Jul 21 07:19:00 2024 +0000
@@ -0,0 +1,53 @@
+## microsatellites to bed features
+
+ **Convert short repetitive sequences to bed features**
+
+ Microsatellites are usually defined as repeated short DNA patterns in an unbroken sequence.
+ A microsatellite pattern or *motif* can be any combination of nucleotides, typically from 1 to 6nt in length.

+ This tool allows microsatellite and related features to be selected from a fasta sequence input file, and output into a single bed track, suitable for viewing in a genome browser such as JBrowse2.
+
+ All motifs of selected lengths can be reported as individual features in the output bed file, or specific motifs can be provided and all
+ others will be ignored. In all cases, a minimum required number of repeats can be specified. For example, requiring 2 or more repeats of the trimer *ACG* will report 
+ every sequence of *ACGACG* or *ACGACGACG* or *ACGACGACGACG* and so on, as individual bed features.  Similarly, requiring 3 repeats of any trimer will 
+ report every distinct 3 nucleotide pattern with 3 or more sequential repeats, including *ACGACGACG* as well as any other such pattern, for example *CTCCTCCTC*.
+
+ For other output formats, the pytrf native command line *findstr* can be used to produce a gff, csv or tsv output containing all exact short tandem repeats, as 
+ described at the end of https://pytrf.readthedocs.io/en/latest
+
+ A fasta file must be supplied for processing. A built in genome can be selected, or a fasta file of any kind can be selected from the current history. Note that all 
+ symbols are treated as valid nucleotides by pytrf, so extraneous characters such as *-* or *N* in the input fasta may appear as unexpected bed features. Lower case fasta symbols will be converted
+ to uppercase, to prevent them being reported as distinct motifs.
+
+
+ **Filter motifs by length**

+ The default tool form setting is to select all dimer motif patterns. 

+ Additional motif lengths from 1 to 6nt can be selected in the multiple-select drop-down list. All features will be returned in a single bed file. For each selected motif length, 
+ the minimum number of repeats required for reporting can be adjusted. **Tandem repeats** are defined as at least 2 of any pattern. This tool allows singleton motifs to be reported,
+ so it is not restricted to short tandem repeats (STR).
+
+ **Filter motifs by pattern**
+
+ This option allows a motif pattern to be specified as a text string such as *CG* or *ATC*. Multiple motifs can be specified as a comma separated string such as *CG,ATC*.
+ All features will be returned as a single bed file.
+
+ The minimum number of repeats for all motifs can be set to match specific requirements.
+
+ For example, technical sequencing read bias may be influenced by the density of specific dimers, whether or not they are repeated,
+ as in https://github.com/arangrhie/T2T-Polish/tree/master/pattern

+ **Run pytrf findstr to create a csv, tsv or gff format output with all perfect STR**
+
+This selection runs the pytrf *findstr* option to create gff/csv/tsv outputs as described at the end of https://pytrf.readthedocs.io/en/latest/. 
+
+Quoted here:
+
+   *A Tandem repeat (TR) in genomic sequence is a set of adjacent short DNA sequence repeated consecutively. The core sequence or repeat unit is generally called motif. 
+   According to the motif length, tandem repeats can be classified as microsatellites and minisatellites. Microsatellites are also known as simple sequence repeats (SSRs) 
+   or short tandem repeats (STRs) with motif length of 1-6 bp. Minisatellites are also sometimes referred to as variable number of tandem repeats (VNTRs) has longer motif length than microsatellites.
+   Pytrf is a lightweight Python C extension for identification of tandem repeats. The pytrf enables to fastly identify both exact or perfect SSRs.
+   It also can find generic tandem repeats with any size of motif, such as with maximum motif length of 100 bp. Additionally, it has capability of finding approximate or imperfect tandem repeats*
+

\ No newline at end of file
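
The README's description of reporting exact repeats of a specific motif with a minimum repeat count can be illustrated with a small pure-Python sketch. This is not how pytrf works internally, and it does not call pytrf. The function name `find_motif_repeats` is hypothetical, and the coordinates are Python-style 0-based half-open offsets, which may differ from the coordinates pytrf reports.

```python
import re


def find_motif_repeats(seq, motif, min_repeats=2):
    """Return (start, end, motif, repeats) for each maximal exact run of motif.

    Hypothetical helper for illustration only; start/end are 0-based,
    half-open offsets into seq.
    """
    seq = seq.upper()      # lower case symbols are treated like upper case
    motif = motif.upper()
    # Greedy quantifier so each hit is the longest unbroken run of the motif.
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_repeats))
    hits = []
    for m in pattern.finditer(seq):
        repeats = (m.end() - m.start()) // len(motif)
        hits.append((m.start(), m.end(), motif, repeats))
    return hits


# A run of 3 ACG, a singleton ACG (ignored with min_repeats=2), a run of 2.
hits = find_motif_repeats("ttACGACGACGttACGtACGACG", "acg", min_repeats=2)
for start, end, motif, repeats in hits:
    print("%d\t%d\t%s_%d" % (start, end, motif, repeats))
```

Setting `min_repeats=1` in this sketch would also report the singleton, which corresponds to the README's note that the tool is not restricted to tandem repeats.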
diff -r 000000000000 -r 2b970db61912 all_fasta.loc.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/all_fasta.loc.sample Sun Jul 21 07:19:00 2024 +0000
@@ -0,0 +1,18 @@
+#This file lists the locations and dbkeys of all the fasta files
+#under the "genome" directory (a directory that contains a directory
+#for each build). The script extract_fasta.py will generate the file
+#all_fasta.loc. This file has the format (white space characters are
+#TAB characters):
+#
+#<unique_build_id> <dbkey> <display_name> <file_path>
+#
+#So, all_fasta.loc could look something like this:
+#
+#apiMel3 apiMel3 Honeybee (Apis mellifera): apiMel3 /path/to/genome/apiMel3/apiMel3.fa
+#hg19canon hg19 Human (Homo sapiens): hg19 Canonical /path/to/genome/hg19/hg19canon.fa
+#hg19full hg19 Human (Homo sapiens): hg19 Full /path/to/genome/hg19/hg19full.fa
+#
+#Your all_fasta.loc file should contain an entry for each individual
+#fasta file. So there will be multiple fasta files for each build,
+#such as with hg19 above.
+#
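
The comments above describe the `.loc` layout as four TAB-separated columns, with `#` lines as comments. A minimal sketch of reading such a file follows. The function name `parse_loc` and the dictionary keys are hypothetical and are not part of Galaxy, which reads these files through its own data table machinery.

```python
def parse_loc(text):
    """Parse all_fasta.loc style text into a list of dicts.

    Hypothetical helper for illustration; expects the four TAB-separated
    columns described in the sample file's comments.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError("expected 4 tab-separated fields: %r" % line)
        build_id, dbkey, display_name, file_path = fields
        entries.append(
            {
                "unique_build_id": build_id,
                "dbkey": dbkey,
                "display_name": display_name,
                "file_path": file_path,
            }
        )
    return entries


sample = (
    "#<unique_build_id>\t<dbkey>\t<display_name>\t<file_path>\n"
    "hg19canon\thg19\tHuman (Homo sapiens): hg19 Canonical\t/path/to/genome/hg19/hg19canon.fa\n"
    "hg19full\thg19\tHuman (Homo sapiens): hg19 Full\t/path/to/genome/hg19/hg19full.fa\n"
)
entries = parse_loc(sample)
print([e["unique_build_id"] for e in entries])
```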
diff -r 000000000000 -r 2b970db61912 find_str.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/find_str.py Sun Jul 21 07:19:00 2024 +0000
@@ -0,0 +1,127 @@
+import argparse
+import subprocess
+
+import pytrf  # 1.3.0
+from pyfastx import Fastx  # 0.5.2
+
+"""
+Allows all STR or those for a subset of motifs to be written to a bed file
+Designed to build some of the microsatellite tracks from https://github.com/arangrhie/T2T-Polish/tree/master/pattern for the VGP.
+"""
+
+
+def getDensity(name, bed, chrlen, winwidth):
+    """
+    pybigtools can write bigwigs and they are processed by other ucsc tools - but jb2 will not read them.
+    Swapped the conversion to use a bedgraph file processed by bedGraphToBigWig
+    """
+    nwin = chrlen // winwidth
+    d = [0.0] * (nwin + 1)  # summed feature lengths per window
+    for b in bed:
+        nt = b[5]  # feature length in nt
+        win = b[1] // winwidth  # window index from the feature start
+        d[win] += nt
+    bedg = [
+        (name, (x * winwidth), ((x + 1) * winwidth) - 1, float(d[x]))
+        for x in range(nwin + 1)
+        if (x + 1) * winwidth <= chrlen  # skip a trailing partial window
+    ]
+    return bedg
+
+
+def write_ssrs(args):
+    """
+    The integers in the call change the minimum repeats for mono-, di-, tri-, tetra-, penta-, hexa-nucleotide repeats
+    ssrs = pytrf.STRFinder(name, seq, 10, 6, 4, 3, 3, 3)
+    NOTE: Dinucleotides GA and AG are reported separately by https://github.com/marbl/seqrequester.
+    The reversed pair STRs are about as common in the documentation sample.
+    Sequence read bias might be influenced by GC density or some other specific motif.
+    """
+    bed = []
+    wig = []
+    chrlens = {}
+    specific = None
+    if args.specific:
+        specific = args.specific.upper().split(",")
+    fa = Fastx(args.fasta, uppercase=True)
+    for name, seq in fa:
+        chrlen = len(seq)
+        chrlens[name] = chrlen
+        cbed = []
+        for ssr in pytrf.STRFinder(
+            name,
+            seq,
+            args.monomin,
+            args.dimin,
+            args.trimin,
+            args.tetramin,
+            args.pentamin,
+            args.hexamin,
+        ):
+            row = (
+                ssr.chrom,
+                ssr.start,
+                ssr.end,
+                ssr.motif,
+                ssr.repeat,
+                ssr.length,
+            )
+            if args.specific and ssr.motif in specific:
+                cbed.append(row)
+            elif args.mono and len(ssr.motif) == 1:
+                cbed.append(row)
+            elif args.di and len(ssr.motif) == 2:
+                cbed.append(row)
+            elif args.tri and len(ssr.motif) == 3:
+                cbed.append(row)
+            elif args.tetra and len(ssr.motif) == 4:
+                cbed.append(row)
+            elif args.penta and len(ssr.motif) == 5:
+                cbed.append(row)
+            elif args.hexa and len(ssr.motif) == 6:
+                cbed.append(row)
+        if args.bigwig:
+            w = getDensity(name, cbed, chrlen, args.winwidth)
+            wig += w
+        bed += cbed
+    if args.bigwig:
+        wig.sort()
+        bedg = ["%s %d %d %.2f" % x for x in wig]
+        with open("temp.bedg", "w") as bw:
+            bw.write("\n".join(bedg))
+        chroms = ["%s\t%s" % (x, chrlens[x]) for x in chrlens.keys()]
+        with open("temp.chromlen", "w") as cl:
+            cl.write("\n".join(chroms))
+        cmd = ["bedGraphToBigWig", "temp.bedg", "temp.chromlen", args.bed]
+        subprocess.run(cmd, check=True)
+    else:
+        bed.sort()
+        obed = ["%s\t%d\t%d\t%s_%d\t%d" % x for x in bed]
+        with open(args.bed, "w") as outbed:
+            outbed.write("\n".join(obed))
+            outbed.write("\n")
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    a = parser.add_argument
+    a("--di", action="store_true")
+    a("--tri", action="store_true")
+    a("--tetra", action="store_true")
+    a("--penta", action="store_true")
+    a("--hexa", action="store_true")
+    a("--mono", action="store_true")
+    a("--dimin", default=2, type=int)
+    a("--trimin", default=2, type=int)
+    a("--tetramin", default=2, type=int)
+    a("--pentamin", default=2, type=int)
+    a("--hexamin", default=2, type=int)
+    a("--monomin", default=2, type=int)
+    a("-f", "--fasta", default="humsamp.fa")
+    a("-b", "--bed", default="humsamp.bed")
+    a("--bigwig", action="store_true")
+    a("--winwidth", default=128, type=int)
+    a("--specific", default=None)
+    a("--minreps", default=2, type=int)
+    args = parser.parse_args()
+    write_ssrs(args)
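
The windowed density step in `getDensity` above can be traced with a small standalone example. The sketch below mirrors that logic without pytrf or pyfastx. The function name `window_density` and the `(start, length)` input tuples are hypothetical simplifications of the six-field rows used in the script.

```python
def window_density(features, chrlen, winwidth):
    """Sum feature lengths per fixed-width window.

    features: iterable of (start, length) pairs (a simplification of the
    script's bed rows). Returns (window_start, window_end, total) tuples.
    Each feature is assigned to the window containing its start, and a
    trailing partial window is not reported, as in getDensity.
    """
    nwin = chrlen // winwidth
    totals = [0.0] * (nwin + 1)
    for start, length in features:
        totals[start // winwidth] += length
    return [
        (w * winwidth, (w + 1) * winwidth - 1, totals[w])
        for w in range(nwin + 1)
        if (w + 1) * winwidth <= chrlen
    ]


# Sequence of 300 nt with 128 nt windows: two full windows are reported and
# the feature starting at 290 falls in the unreported partial third window.
rows = window_density([(10, 12), (100, 20), (130, 15), (290, 6)], 300, 128)
for row in rows:
    print("%d %d %.2f" % row)
```

One consequence visible here is that a feature is counted entirely in the window where it starts, even if it extends past that window's end.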
diff -r 000000000000 -r 2b970db61912 microsatbed.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/microsatbed.xml Sun Jul 21 07:19:00 2024 +0000
b'@@ -0,0 +1,297 @@\n+\n+<tool id="microsatbed" name="STR to bed" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="22.05">\n+    <description>Short Tandem Repeats to bed features from fasta</description>\n+    <macros>\n+        <token name="@TOOL_VERSION@">1.3.2</token>\n+        <token name="@VERSION_SUFFIX@">0</token>\n+        <token name="@PYFASTX_VERSION@">2.1.0</token>\n+        <token name="@PYTHON_VERSION@">3.12.3</token>\n+        <token name="@UCSC_VERSION@">455</token>\n+        <macro name="subsetmacro">\n+          <param name="subset" type="select" label="Select at least 1 specific motif length to report" help="Bed features will be output for every motif of the selected length(s) with the minimum required repeats or more" multiple="true">\n+              <option value="--di" selected="true">All dimers (AC,AG,AT,...)</option>\n+              <option value="--tri">All trimers (ACG,..)</option>\n+              <option value="--tetra">All tetramers (ACGT,..)</option>\n+              <option value="--penta">All pentamers (ACGTC,..)</option>\n+              <option value="--hexa">All hexamers (ACGTCG,..)</option>\n+              <option value="--mono">All monomers (A,C...). Warning! 
Can produce overwhelming numbers of bed features</option>\n+          </param>\n+        </macro>\n+    </macros>\n+    <requirements>\n+        <requirement version="@PYTHON_VERSION@" type="package">python</requirement>\n+        <requirement version="@PYFASTX_VERSION@" type="package">pyfastx</requirement>\n+        <requirement version="@TOOL_VERSION@" type="package">pytrf</requirement>\n+        <requirement version="@UCSC_VERSION@" type="package">ucsc-bedgraphtobigwig</requirement>\n+    </requirements>\n+    <required_files>\n+        <include path="find_str.py"/>\n+    </required_files>\n+    <version_command><![CDATA[python -c "import pytrf; from importlib.metadata import version; print(version(\'pytrf\'))"]]></version_command>\n+    <command><![CDATA[\n+  #if $mode_cond.mode == "NATIVE":\n+    #if $reference_genome.genome_type_select == "history":\n+      pytrf findstr -f \'$mode_cond.outformat\' -o \'$bed\' -r \'$monomin\' \'$dimin\' \'$trimin\' \'$tetramin\' \'$pentamin\' \'$hexamin\' \'${reference_genome.fasta}\'\n+    #else:\n+      pytrf findstr -f \'$mode_cond.outformat\' -o \'$bed\' -r \'$monomin\' \'$dimin\' \'$trimin\' \'$tetramin\' \'$pentamin\' \'$hexamin\' \'${reference_genome.fasta.fields.path}\'\n+    #end if\n+  #else:\n+    python \'${__tool_directory__}/find_str.py\'\n+    #if $reference_genome.genome_type_select == "history":\n+        --fasta \'${reference_genome.fasta}\'\n+    #else:\n+        --fasta \'${reference_genome.fasta.fields.path}\'\n+    #end if\n+    --bed \'$bed\'\n+    #if $mode_cond.mode == "SPECIFIC":\n+        --specific \'$mode_cond.specific\'\n+    #elif $mode_cond.mode == "SPECIFICBW":\n+        --bigwig\n+        --winwidth \'$mode_cond.winwidth\'\n+        --specific \'$mode_cond.specific\'\n+    #else:\n+      #for $flag in $mode_cond.subset:\n+        $flag\n+      #end for\n+    #end if\n+    --monomin \'$monomin\'\n+    --dimin \'$dimin\'\n+    --trimin \'$trimin\'\n+    --tetramin \'$tetramin\'\n+    --pentamin 
\'$pentamin\'\n+    --hexamin \'$hexamin\'\n+    #if $mode_cond.mode == "SPECIFICBW":\n+        --bigwig\n+        --winwidth \'$mode_cond.winwidth\'\n+    #end if\n+  #end if\n+]]></command>\n+    <inputs>\n+        <conditional name="reference_genome">\n+            <param name="genome_type_select" type="select" label="Select a source for fasta sequences to be searched for STRs" help="Options are to choose a built-in genome, or choose any history fasta file">\n+                <option value="indexed">Use a Galaxy server built-in reference genome fasta</option>\n+                <option value="history" selected="True">Use any fasta file from the current history</option>\n+            </param>\n+            <when value="indexed">\n+                <param name="fasta" type="select" label="Choose a built-in genome" help="If the genome you need is not on the list, upload it and select it as a current history fasta'..b'equential repeats or more such, as "CTCCTCCTC*.\n+\n+ For other output formats, the pytrf native command line *findstr* can be used to produce a gff, csv or tsv output containing all exact short tandem repeats, as \n+ described at the end of https://pytrf.readthedocs.io/en/latest\n+\n+ A fasta file must be supplied for processing. A built in genome can be selected, or a fasta file of any kind can be selected from the current history. Note that all \n+ symbols are treated as valid nucleotides by pytrf, so extraneous characters such as *-* or *N* in the input fasta may appear as unexpected bed features. Lower case fasta symbols will be converted\n+ to uppercase, to prevent them being reported as distinct motifs.\n+ \n+ Output can be bed format, or for two kinds of operation, a bigwig track showing bases covered by selected features over a configurable window size with a default of 128nt.\n+\n+ **Select motifs by length - for bed or windowed density bigwig**\n+ \n+ The default tool form setting is to select all dimer motif patterns. 
\n+ \n+ Any combination of motif lengths from 1 to 6nt can be selected in the multiple-select drop-down list. All features will be returned in a single bed file. For each selected motif length, \n+ the minimum number of repeats required for reporting can be adjusted. **Tandem repeats** are defined as at least 2 of any pattern. This tool allows singleton dimer motifs to be reported,\n+ so is not restricted to short tandem repeats (STR)\n+\n+ This mode of operation can produce a bed file with every STR as a separate feature.\n+ These can be very large and a bigwig containing the sum of STR bases over a selectable window size (default 128) may be more \n+ useful and much faster to load. \n+\n+ **Select motifs by pattern - for bed or windowed density bigwig**\n+\n+ This option allows a motif pattern to be specified as a text string such as *CG* or *ATC*. Multiple motifs can be specified as a comma separated string such as *CG,ATC*.\n+ All features will be returned as a single bed file.\n+\n+ The minimum number of repeats for all motifs can be set to match specific requirements.\n+\n+ For example, technical sequencing read bias may be influenced by the density of specific dimers, whether they are repeated or not\n+ such as in https://github.com/arangrhie/T2T-Polish/tree/master/pattern\n+\n+ This mode of operation can produce a bed file with every STR as a separate feature.\n+ These can be very large and a bigwig containing the sum of STR bases over a selectable window size (default 128) may be more \n+ useful and much faster to load. \n+\n+ **Select all perfect STR using pytrf findstr in csv, tsv or gff output format**\n+\n+This selection runs the pytrf *findstr* option to create gff/csv/tsv outputs as described at the end of https://pytrf.readthedocs.io/en/latest/. \n+\n+Quoted here:\n+\n+   *A Tandem repeat (TR) in genomic sequence is a set of adjacent short DNA sequence repeated consecutively. The core sequence or repeat unit is generally called motif. 
\n+   According to the motif length, tandem repeats can be classified as microsatellites and minisatellites. Microsatellites are also known as simple sequence repeats (SSRs) \n+   or short tandem repeats (STRs) with motif length of 1-6 bp. Minisatellites are also sometimes referred to as variable number of tandem repeats (VNTRs) has longer motif length than microsatellites.\n+   Pytrf is a lightweight Python C extension for identification of tandem repeats. The pytrf enables to fastly identify both exact or perfect SSRs.\n+   It also can find generic tandem repeats with any size of motif, such as with maximum motif length of 100 bp. Additionally, it has capability of finding approximate or imperfect tandem repeats*\n+ \n+  ]]></help>\n+    <citations>\n+        <citation type="bibtex">@misc{pytrf,\n+  title = {{pytrf} Short tandem repeat finder, Accessed on July 10 2024},\n+  howpublished = {\\url{https://github.com/lmdu/pytrf}},\n+  note = {Accessed on July 10 2024}\n+}</citation>\n+    </citations>\n+</tool>\n'
diff -r 000000000000 -r 2b970db61912 test-data/all_fasta.loc
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/all_fasta.loc Sun Jul 21 07:19:00 2024 +0000
@@ -0,0 +1,22 @@
+#Your all_fasta.loc file should contain an entry for each individual
+#fasta file. So there will be multiple fasta files for each build,
+#such as with hg19 above.
+#This file lists the locations and dbkeys of all the fasta files
+#under the "genome" directory (a directory that contains a directory
+#for each build). The script extract_fasta.py will generate the file
+#all_fasta.loc. This file has the format (white space characters are
+#TAB characters):
+#
+#<unique_build_id>      <dbkey> <display_name>  <file_path>
+#
+#So, all_fasta.loc could look something like this:
+#
+#apiMel3        apiMel3 Honeybee (Apis mellifera): apiMel3      /path/to/genome/apiMel3/apiMel3.fa
+#hg19canon      hg19    Human (Homo sapiens): hg19 Canonical    /path/to/genome/hg19/hg19canon.fa
+#hg19full       hg19    Human (Homo sapiens): hg19 Full /path/to/genome/hg19/hg19full.fa
+#
+#Your all_fasta.loc file should contain an entry for each individual
+#fasta file. So there will be multiple fasta files for each build,
+#such as with hg19 above.
+#
+hgtest hgtest hgtest ${__HERE__}/humsamp.fa
diff -r 000000000000 -r 2b970db61912 test-data/bed_sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/bed_sample Sun Jul 21 07:19:00 2024 +0000
b'@@ -0,0 +1,990 @@\n+hpat1\t507\t526\tTTTA_5\t20\n+hpat1\t1963\t1974\tCCCCAC_2\t12\n+hpat1\t2146\t2163\tAAAAAC_3\t18\n+hpat1\t3034\t3048\tAAAAC_3\t15\n+hpat1\t3236\t3247\tTTTTTT_2\t12\n+hpat1\t4117\t4128\tGGCAGG_2\t12\n+hpat1\t4311\t4325\tGTTTT_3\t15\n+hpat1\t5472\t5483\tTTTTTG_2\t12\n+hpat1\t5584\t5595\tAGCCCC_2\t12\n+hpat1\t7239\t7250\tGAAAAA_2\t12\n+hpat1\t8173\t8184\tAAAAAA_2\t12\n+hpat1\t8303\t8314\tGAGAGA_2\t12\n+hpat1\t8744\t8755\tGGAGGT_2\t12\n+hpat1\t8828\t8845\tAGAAAA_3\t18\n+hpat1\t9495\t9506\tGGAGGT_2\t12\n+hpat1\t9571\t9595\tAAAAC_5\t25\n+hpat1\t11623\t11634\tAGGGAA_2\t12\n+hpat1\t11651\t11662\tAACAAC_2\t12\n+hpat1\t12486\t12497\tCCTCTG_2\t12\n+hpat1\t13457\t13468\tACCCTA_2\t12\n+hpat1\t13471\t13488\tCCCTAA_3\t18\n+hpat1\t13489\t13500\tCCCCAA_2\t12\n+hpat1\t13501\t13518\tCCCCAG_3\t18\n+hpat1\t13523\t13546\tAACCCT_4\t24\n+hpat1\t13559\t13582\tCCCTAA_4\t24\n+hpat1\t13598\t13609\tACCCTA_2\t12\n+hpat1\t13612\t13635\tACCCTA_4\t24\n+hpat1\t13641\t13658\tAACCCC_3\t18\n+hpat1\t13659\t13670\tAACCCT_2\t12\n+hpat1\t13691\t13708\tACCCTC_3\t18\n+hpat1\t13721\t13744\tAACCCC_4\t24\n+hpat1\t13747\t13758\tCCCAAC_2\t12\n+hpat1\t13767\t13778\tCTCACC_2\t12\n+hpat1\t14307\t14318\tGGCGCA_2\t12\n+hpat1\t14336\t14347\tGGCGCA_2\t12\n+hpat1\t14384\t14395\tGGCGCA_2\t12\n+hpat1\t14472\t14483\tGGGGGG_2\t12\n+hpat1\t16148\t16159\tGAGTGG_2\t12\n+hpat1\t16547\t16558\tCTGAGG_2\t12\n+hpat1\t17030\t17041\tCAGGCA_2\t12\n+hpat1\t18072\t18083\tGTCTGG_2\t12\n+hpat1\t18120\t18131\tTCGTCC_2\t12\n+hpat1\t18795\t18806\tGCAGGC_2\t12\n+hpat1\t19207\t19218\tTGCTCC_2\t12\n+hpat1\t20148\t20159\tGGTGGT_2\t12\n+hpat1\t22835\t22846\tGGGAGG_2\t12\n+hpat1\t22939\t22950\tAAGCCA_2\t12\n+hpat1\t24658\t24669\tAAAAAA_2\t12\n+hpat1\t25690\t25701\tATTCAC_2\t12\n+hpat1\t26499\t26510\tCCCCTG_2\t12\n+hpat1\t29330\t29341\tCCTCCA_2\t12\n+hpat1\t30089\t30100\tCCAGGA_2\t12\n+hpat1\t31904\t31915\tGTGTGT_2\t12\n+hpat1\t32448\t32459\tGGAGGT_2\t12\n+hpat1\t32998\t33009\tTTTTCT_2\t12\n+hpat1\t35194\t35205\tGCCCAG_2\t12\n+h
pat1\t35991\t36002\tCTCTCC_2\t12\n+hpat1\t36299\t36310\tTCTCTG_2\t12\n+hpat1\t36311\t36338\tTCTC_7\t28\n+hpat1\t36363\t36374\tTCTCTC_2\t12\n+hpat1\t37142\t37153\tAAAAAA_2\t12\n+hpat1\t37229\t37240\tACACAC_2\t12\n+hpat1\t38874\t38885\tAAAAAA_2\t12\n+hpat1\t39406\t39417\tCATGTG_2\t12\n+hpat1\t39489\t39500\tTTCTAG_2\t12\n+hpat1\t40468\t40482\tTCTGT_3\t15\n+hpat1\t40503\t40514\tGTACCA_2\t12\n+hpat1\t43740\t43764\tTCCAT_5\t25\n+hpat1\t43807\t43826\tCACTC_4\t20\n+hpat1\t43855\t43869\tTCCAC_3\t15\n+hpat1\t43965\t43984\tTCCAC_4\t20\n+hpat1\t44030\t44044\tTCCAC_3\t15\n+hpat1\t44255\t44269\tTCCAC_3\t15\n+hpat1\t44372\t44386\tCATTC_3\t15\n+hpat1\t44415\t44429\tTCCAC_3\t15\n+hpat1\t44790\t44804\tTCCAC_3\t15\n+hpat1\t44879\t44903\tTTCCA_5\t25\n+hpat1\t44920\t44934\tTCCAC_3\t15\n+hpat1\t45140\t45154\tTCCAT_3\t15\n+hpat1\t45260\t45274\tTCCAT_3\t15\n+hpat1\t45344\t45368\tTCCAT_5\t25\n+hpat1\t45369\t45383\tTCCAC_3\t15\n+hpat1\t45458\t45472\tCTCCA_3\t15\n+hpat1\t45526\t45545\tCATTC_4\t20\n+hpat1\t45554\t45573\tTCCAT_4\t20\n+hpat1\t45609\t45623\tTCCAC_3\t15\n+hpat1\t45743\t45757\tCTCCA_3\t15\n+hpat1\t45874\t45888\tTCCAT_3\t15\n+hpat1\t45889\t45903\tTCCAC_3\t15\n+hpat1\t45999\t46013\tTCCAC_3\t15\n+hpat1\t46017\t46031\tACTCC_3\t15\n+hpat1\t46139\t46153\tTCCAC_3\t15\n+hpat1\t46299\t46313\tTCCAC_3\t15\n+hpat1\t46363\t46377\tTTCCA_3\t15\n+hpat1\t46416\t46430\tCACTC_3\t15\n+hpat1\t46509\t46523\tTCCAC_3\t15\n+hpat1\t46838\t46852\tCTCCA_3\t15\n+hpat1\t46948\t46967\tTCCAT_4\t20\n+hpat1\t47042\t47056\tCTCCA_3\t15\n+hpat1\t47088\t47102\tTCCAT_3\t15\n+hpat1\t47107\t47131\tTTCCA_5\t25\n+hpat1\t47141\t47160\tATTCC_4\t20\n+hpat1\t47161\t47175\tACTCC_3\t15\n+hpat1\t47200\t47214\tCACTC_3\t15\n+hpat1\t47218\t47232\tTCCAC_3\t15\n+hpat1\t47303\t47322\tTCCAC_4\t20\n+hpat1\t47438\t47452\tTCCAC_3\t15\n+hpat1\t47488\t47502\tTCCAC_3\t15\n+hpat1\t47531\t47545\tACTCC_3\t15\n+hpat1\t47598\t47612\tTCCAC_3\t15\n+hpat1\t47777\t47791\tTCCAT_3\t15\n+hpat1\t47952\t47966\tTCCAC_3\t15\n+hpat1\t48075\t48089\tATTCC_3\t15\
n+hpat1\t48518\t48532\tACTCC_3\t15\n+hpat1\t48546\t48560\tCATTC_3\t15\n+hpat1\t48640\t48654\tCATTC_3\t15\n+hpat1\t48673\t48692\tTCCAT_4\t20\n+hpat1\t48758\t48772\tTCCAT_3\t15\n+hpat1\t48806\t48825\tATTCC_4\t20\n+hpat1\t48826\t48840\tACTCC_3\t15\n+hpat1\t48841\t48855\tATTCC_3\t15\n+hpat1\t49027\t49041\tCTCCA_3\t15\n+hpat1\t49266\t49280\tACTCC_3\t15\n+hpat1\t49348\t49362\tTCCAC_3\t15\n+hpat1\t49404\t49418\tCCACT_3\t15\n+hpat1\t49498\t49512\tTCCAC_3\t15\n+hpat1\t49523\t49537\tTCCAC_3\t15\n+hpat1\t49658\t49672\tTCCAT_3\t15\n+hpat1\t49681\t49700\tATTCC_4\t20\n+hpat1\t49856\t49875\tATTCC_4\t20\n+hpat1\t50101\t50115\tTCCAT_3\t15\n+hpat1\t50215\t50229\tTCCAT_3'..b'T_5\t15\n+hpat1\t603188\t603199\tGAAGTG_2\t12\n+hpat1\t608985\t608996\tGCTGTG_2\t12\n+hpat1\t610192\t610203\tTATCTC_2\t12\n+hpat1\t612381\t612392\tTCACCA_2\t12\n+hpat1\t612521\t612535\tCTTTT_3\t15\n+hpat1\t613446\t613457\tCCATCC_2\t12\n+hpat1\t613964\t613975\tTTCTTG_2\t12\n+hpat1\t615611\t615622\tAGAAGG_2\t12\n+hpat1\t615632\t615643\tAAGAAG_2\t12\n+hpat1\t617678\t617697\tTTTTG_4\t20\n+hpat1\t620321\t620332\tGCCAGC_2\t12\n+hpat1\t620338\t620349\tGGGGTG_2\t12\n+hpat1\t621050\t621061\tAGGGCC_2\t12\n+hpat1\t621851\t621862\tTCTCTC_2\t12\n+hpat1\t622005\t622016\tGATTTT_2\t12\n+hpat1\t622557\t622571\tACC_5\t15\n+hpat1\t622586\t622606\tCAC_7\t21\n+hpat1\t623139\t623150\tACACAA_2\t12\n+hpat1\t623378\t623389\tTGGACA_2\t12\n+hpat1\t623706\t623723\tAAATAA_3\t18\n+hpat1\t624013\t624030\tAAA_6\t18\n+hpat1\t624648\t624662\tTTT_5\t15\n+hpat1\t624724\t624735\tACCTCC_2\t12\n+hpat1\t624982\t624993\tTTTTTT_2\t12\n+hpat1\t625129\t625146\tTTTTAT_3\t18\n+hpat1\t625595\t625606\tAGCCAG_2\t12\n+hpat1\t627030\t627041\tGAGGCT_2\t12\n+hpat1\t627187\t627198\tAATAAC_2\t12\n+hpat1\t629078\t629092\tTGA_5\t15\n+hpat1\t631394\t631405\tAACAAC_2\t12\n+hpat1\t631429\t631443\tAAA_5\t15\n+hpat1\t632732\t632743\tGTGGGG_2\t12\n+hpat1\t633722\t633733\tACTCTC_2\t12\n+hpat1\t634319\t634330\tCTTCCT_2\t12\n+hpat1\t634612\t634623\tTCCTTG_2\t12\n+hpat1\t635904\t
635918\tAAA_5\t15\n+hpat1\t637886\t637900\tAAA_5\t15\n+hpat1\t639391\t639402\tCGCCGC_2\t12\n+hpat1\t639742\t639759\tGCGGGG_3\t18\n+hpat1\t640687\t640698\tCCTTTC_2\t12\n+hpat1\t642772\t642783\tAAAAAA_2\t12\n+hpat1\t646664\t646681\tAAA_6\t18\n+hpat1\t649141\t649152\tAGCTGG_2\t12\n+hpat1\t649225\t649236\tGTGTGT_2\t12\n+hpat1\t649745\t649759\tCCGGG_3\t15\n+hpat1\t649865\t649876\tGCGAGG_2\t12\n+hpat1\t650432\t650443\tCCGCCC_2\t12\n+hpat1\t650444\t650455\tCGCCGC_2\t12\n+hpat1\t651677\t651688\tCCCCTC_2\t12\n+hpat1\t651819\t651830\tTGCCTC_2\t12\n+hpat1\t652399\t652410\tGATAGT_2\t12\n+hpat1\t652602\t652613\tCTCCTT_2\t12\n+hpat1\t653054\t653065\tGGAAGG_2\t12\n+hpat1\t655086\t655103\tTACTGC_3\t18\n+hpat1\t655717\t655728\tAGCCTC_2\t12\n+hpat1\t656009\t656020\tCCTCCC_2\t12\n+hpat1\t657238\t657249\tAGGCCA_2\t12\n+hpat1\t657509\t657520\tTTGGAC_2\t12\n+hpat1\t657610\t657621\tTGGACC_2\t12\n+hpat1\t658606\t658633\tGTGT_7\t28\n+hpat1\t660059\t660070\tGTGTCA_2\t12\n+hpat1\t660332\t660343\tGCTGTA_2\t12\n+hpat1\t660404\t660415\tCTCTCT_2\t12\n+hpat1\t660566\t660577\tTCTCTC_2\t12\n+hpat1\t660592\t660619\tTCTC_7\t28\n+hpat1\t660621\t660648\tCACA_7\t28\n+hpat1\t660896\t660927\tGTGT_8\t32\n+hpat1\t660985\t660996\tTGGGGA_2\t12\n+hpat1\t661631\t661642\tCCCCCG_2\t12\n+hpat1\t661936\t661947\tGGCTCG_2\t12\n+hpat1\t661953\t661964\tGGCGGC_2\t12\n+hpat1\t662189\t662200\tCAGAGA_2\t12\n+hpat1\t662391\t662402\tACACAC_2\t12\n+hpat1\t662617\t662628\tCCTCTC_2\t12\n+hpat1\t663604\t663618\tAAA_5\t15\n+hpat1\t663771\t663791\tAAT_7\t21\n+hpat1\t664081\t664092\tTTTTTT_2\t12\n+hpat1\t667668\t667679\tAATGGG_2\t12\n+hpat1\t668025\t668036\tGTGTAA_2\t12\n+hpat1\t668718\t668729\tCCATTT_2\t12\n+hpat1\t669900\t669914\tAAA_5\t15\n+hpat1\t670208\t670219\tTTTTTA_2\t12\n+hpat1\t671191\t671202\tGAGGGG_2\t12\n+hpat1\t671869\t671886\tTACAAA_3\t18\n+hpat1\t671887\t671925\tTAA_13\t39\n+hpat1\t671964\t671975\tCTGAGG_2\t12\n+hpat1\t672093\t672104\tAACAAC_2\t12\n+hpat1\t672105\t672116\tAAAGAA_2\t12\n+hpat1\t672622\t672639\tAAA_6\t
18\n+hpat1\t675613\t675624\tAAGTGT_2\t12\n+hpat1\t675731\t675742\tGGCACA_2\t12\n+hpat1\t676339\t676350\tATTTCT_2\t12\n+hpat1\t676352\t676363\tTTTATT_2\t12\n+hpat1\t677003\t677034\tGTGT_8\t32\n+hpat1\t677457\t677468\tAGAAAG_2\t12\n+hpat1\t678388\t678399\tTGAAAC_2\t12\n+hpat1\t678842\t678856\tTTCTT_3\t15\n+hpat1\t678858\t678869\tTTTTTC_2\t12\n+hpat1\t684578\t684589\tAATAAC_2\t12\n+hpat1\t684953\t684964\tAGAGCC_2\t12\n+hpat1\t685312\t685323\tGGAGGT_2\t12\n+hpat1\t686068\t686079\tCGGGGG_2\t12\n+hpat1\t686825\t686842\tTTT_6\t18\n+hpat1\t686899\t686910\tCAGCCT_2\t12\n+hpat1\t687586\t687605\tTCAA_5\t20\n+hpat1\t687651\t687662\tAGGGAG_2\t12\n+hpat1\t688067\t688078\tTCCTGG_2\t12\n+hpat1\t689018\t689037\tAAAT_5\t20\n+hpat1\t690132\t690143\tTTTCTT_2\t12\n+hpat1\t690303\t690314\tAAGAGG_2\t12\n+hpat1\t693707\t693718\tAGGGGC_2\t12\n+hpat1\t694426\t694437\tTTTTGT_2\t12\n+hpat1\t694506\t694517\tCAGCCT_2\t12\n+hpat1\t694598\t694609\tTTTTTT_2\t12\n+hpat1\t694903\t694914\tTATTTA_2\t12\n+hpat1\t695265\t695276\tCAAGGC_2\t12\n+hpat1\t696496\t696507\tCGCCCC_2\t12\n+hpat1\t696718\t696729\tCGCCCC_2\t12\n+hpat1\t697067\t697078\tTGGGGG_2\t12\n+hpat1\t697568\t697587\tGTGT_5\t20\n+hpat1\t698582\t698593\tTCTGAT_2\t12\n+hpat1\t698951\t698962\tAATTGA_2\t12\n+hpat1\t699157\t699171\tTTT_5\t15\n'
diff -r 000000000000 -r 2b970db61912 test-data/builtinnativetsv_sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/builtinnativetsv_sample Sun Jul 21 07:19:00 2024 +0000
b'@@ -0,0 +1,978 @@\n+hpat1\t507\t526\tTTTA\t4\t5\t20\n+hpat1\t1963\t1974\tCCCCAC\t6\t2\t12\n+hpat1\t2146\t2163\tAAAAAC\t6\t3\t18\n+hpat1\t3236\t3247\tTTTTTT\t6\t2\t12\n+hpat1\t4117\t4128\tGGCAGG\t6\t2\t12\n+hpat1\t5472\t5483\tTTTTTG\t6\t2\t12\n+hpat1\t5584\t5595\tAGCCCC\t6\t2\t12\n+hpat1\t7239\t7250\tGAAAAA\t6\t2\t12\n+hpat1\t8173\t8184\tAAAAAA\t6\t2\t12\n+hpat1\t8275\t8302\tA\t1\t28\t28\n+hpat1\t8303\t8314\tGAGAGA\t6\t2\t12\n+hpat1\t8744\t8755\tGGAGGT\t6\t2\t12\n+hpat1\t8828\t8845\tAGAAAA\t6\t3\t18\n+hpat1\t9495\t9506\tGGAGGT\t6\t2\t12\n+hpat1\t9571\t9595\tAAAAC\t5\t5\t25\n+hpat1\t11623\t11634\tAGGGAA\t6\t2\t12\n+hpat1\t11651\t11662\tAACAAC\t6\t2\t12\n+hpat1\t12486\t12497\tCCTCTG\t6\t2\t12\n+hpat1\t13457\t13468\tACCCTA\t6\t2\t12\n+hpat1\t13471\t13488\tCCCTAA\t6\t3\t18\n+hpat1\t13489\t13500\tCCCCAA\t6\t2\t12\n+hpat1\t13501\t13518\tCCCCAG\t6\t3\t18\n+hpat1\t13523\t13546\tAACCCT\t6\t4\t24\n+hpat1\t13559\t13582\tCCCTAA\t6\t4\t24\n+hpat1\t13598\t13609\tACCCTA\t6\t2\t12\n+hpat1\t13612\t13635\tACCCTA\t6\t4\t24\n+hpat1\t13641\t13658\tAACCCC\t6\t3\t18\n+hpat1\t13659\t13670\tAACCCT\t6\t2\t12\n+hpat1\t13691\t13708\tACCCTC\t6\t3\t18\n+hpat1\t13721\t13744\tAACCCC\t6\t4\t24\n+hpat1\t13747\t13758\tCCCAAC\t6\t2\t12\n+hpat1\t13767\t13778\tCTCACC\t6\t2\t12\n+hpat1\t14149\t14164\tGCCC\t4\t4\t16\n+hpat1\t14307\t14318\tGGCGCA\t6\t2\t12\n+hpat1\t14336\t14347\tGGCGCA\t6\t2\t12\n+hpat1\t14384\t14395\tGGCGCA\t6\t2\t12\n+hpat1\t14472\t14483\tGGGGGG\t6\t2\t12\n+hpat1\t16148\t16159\tGAGTGG\t6\t2\t12\n+hpat1\t16547\t16558\tCTGAGG\t6\t2\t12\n+hpat1\t17030\t17041\tCAGGCA\t6\t2\t12\n+hpat1\t18072\t18083\tGTCTGG\t6\t2\t12\n+hpat1\t18120\t18131\tTCGTCC\t6\t2\t12\n+hpat1\t18795\t18806\tGCAGGC\t6\t2\t12\n+hpat1\t19207\t19218\tTGCTCC\t6\t2\t12\n+hpat1\t20148\t20159\tGGTGGT\t6\t2\t12\n+hpat1\t22835\t22846\tGGGAGG\t6\t2\t12\n+hpat1\t22939\t22950\tAAGCCA\t6\t2\t12\n+hpat1\t24658\t24669\tAAAAAA\t6\t2\t12\n+hpat1\t25690\t25701\tATTCAC\t6\t2\t12\n+hpat1\t26499\t26510\tCCCCTG\t6\t2\t12\n+hpat1\t29330\t29341
\tCCTCCA\t6\t2\t12\n+hpat1\t30089\t30100\tCCAGGA\t6\t2\t12\n+hpat1\t31904\t31915\tGTGTGT\t6\t2\t12\n+hpat1\t32448\t32459\tGGAGGT\t6\t2\t12\n+hpat1\t32998\t33009\tTTTTCT\t6\t2\t12\n+hpat1\t35194\t35205\tGCCCAG\t6\t2\t12\n+hpat1\t35991\t36002\tCTCTCC\t6\t2\t12\n+hpat1\t36299\t36310\tTCTCTG\t6\t2\t12\n+hpat1\t36311\t36338\tTC\t2\t14\t28\n+hpat1\t36363\t36374\tTCTCTC\t6\t2\t12\n+hpat1\t37142\t37153\tAAAAAA\t6\t2\t12\n+hpat1\t37229\t37240\tACACAC\t6\t2\t12\n+hpat1\t38874\t38885\tAAAAAA\t6\t2\t12\n+hpat1\t39406\t39417\tCATGTG\t6\t2\t12\n+hpat1\t39489\t39500\tTTCTAG\t6\t2\t12\n+hpat1\t39620\t39642\tT\t1\t23\t23\n+hpat1\t40503\t40514\tGTACCA\t6\t2\t12\n+hpat1\t43740\t43764\tTCCAT\t5\t5\t25\n+hpat1\t43807\t43826\tCACTC\t5\t4\t20\n+hpat1\t43965\t43984\tTCCAC\t5\t4\t20\n+hpat1\t44879\t44903\tTTCCA\t5\t5\t25\n+hpat1\t45344\t45368\tTCCAT\t5\t5\t25\n+hpat1\t45526\t45545\tCATTC\t5\t4\t20\n+hpat1\t45554\t45573\tTCCAT\t5\t4\t20\n+hpat1\t46948\t46967\tTCCAT\t5\t4\t20\n+hpat1\t47107\t47131\tTTCCA\t5\t5\t25\n+hpat1\t47141\t47160\tATTCC\t5\t4\t20\n+hpat1\t47303\t47322\tTCCAC\t5\t4\t20\n+hpat1\t48673\t48692\tTCCAT\t5\t4\t20\n+hpat1\t48806\t48825\tATTCC\t5\t4\t20\n+hpat1\t49681\t49700\tATTCC\t5\t4\t20\n+hpat1\t49856\t49875\tATTCC\t5\t4\t20\n+hpat1\t50315\t50334\tTCCAC\t5\t4\t20\n+hpat1\t50775\t50794\tTCCAT\t5\t4\t20\n+hpat1\t51153\t51172\tATTCC\t5\t4\t20\n+hpat1\t52508\t52519\tGCCTTT\t6\t2\t12\n+hpat1\t53676\t53687\tTTTCAG\t6\t2\t12\n+hpat1\t55005\t55016\tAAAAAC\t6\t2\t12\n+hpat1\t58746\t58757\tGCTTAT\t6\t2\t12\n+hpat1\t59055\t59066\tTAGAAA\t6\t2\t12\n+hpat1\t61658\t61669\tGGAGGT\t6\t2\t12\n+hpat1\t61733\t61747\tAAA\t3\t5\t15\n+hpat1\t62761\t62775\tAAA\t3\t5\t15\n+hpat1\t63859\t63874\tAAAG\t4\t4\t16\n+hpat1\t65522\t65533\tTTTTTT\t6\t2\t12\n+hpat1\t67692\t67703\tCTCAGA\t6\t2\t12\n+hpat1\t68526\t68537\tTGCCTC\t6\t2\t12\n+hpat1\t68608\t68622\tTTT\t3\t5\t15\n+hpat1\t68851\t68862\tCAGCCT\t6\t2\t12\n+hpat1\t69176\t69187\tCCTTAT\t6\t2\t12\n+hpat1\t72054\t72065\tCTCTCC\t6\t2\t12\n+hpat1\t73588\t7
3599\tCAAATA\t6\t2\t12\n+hpat1\t73891\t73924\tAC\t2\t17\t34\n+hpat1\t74873\t74884\tTCTCTT\t6\t2\t12\n+hpat1\t76181\t76192\tTTTGTT\t6\t2\t12\n+hpat1\t77091\t77106\tATAC\t4\t4\t16\n+hpat1\t77427\t77441\tTTT\t3\t5\t15\n+hpat1\t78001\t78015\tTTT\t3\t5\t15\n+hpat1\t80746\t80757\tTTTTCA\t6\t2\t12\n+hpat1\t80794\t80805\tCTCTCT\t6\t2\t12\n+hpat1\t81769\t81780\tCAAAAA\t6\t2\t12\n+hpat1\t82355\t82369\tTTT\t3\t5\t15\n+hpat1\t83934\t83945\tTAAATG\t6\t2\t12\n+hpat1\t84620\t84631\tTCAAAA\t6\t2\t12\n+hpat1\t86960\t86975\tTATA\t4\t4\t16\n+hpat1\t87245\t87256\tTTTCTG\t6\t2\t12\n+hpat1\t87483\t87494\tTTTTCC\t6\t2\t12\n+hpat1\t87622\t87641\tTC\t2\t10\t20\n+hpat1\t87642\t87653\tTGTGTG\t6\t2\t12\n+hpat1\t87657\t87684\tGT\t2\t14\t28\n+hpat1\t90387\t90401\tAAA\t3\t5\t15\n+hpat1\t91598\t91609\tCTAGAA\t6\t2\t12\n+hpat1\t92664\t92675\tAAGAAA\t6\t2\t12\n+hpat1\t92820\t92831\tTCAAT'..b'140\t616161\tA\t1\t22\t22\n+hpat1\t617678\t617697\tTTTTG\t5\t4\t20\n+hpat1\t620321\t620332\tGCCAGC\t6\t2\t12\n+hpat1\t620338\t620349\tGGGGTG\t6\t2\t12\n+hpat1\t621050\t621061\tAGGGCC\t6\t2\t12\n+hpat1\t621851\t621862\tTCTCTC\t6\t2\t12\n+hpat1\t622005\t622016\tGATTTT\t6\t2\t12\n+hpat1\t622557\t622571\tACC\t3\t5\t15\n+hpat1\t622586\t622606\tCAC\t3\t7\t21\n+hpat1\t623139\t623150\tACACAA\t6\t2\t12\n+hpat1\t623378\t623389\tTGGACA\t6\t2\t12\n+hpat1\t623706\t623723\tAAATAA\t6\t3\t18\n+hpat1\t624013\t624030\tAAA\t3\t6\t18\n+hpat1\t624648\t624662\tTTT\t3\t5\t15\n+hpat1\t624724\t624735\tACCTCC\t6\t2\t12\n+hpat1\t624982\t624993\tTTTTTT\t6\t2\t12\n+hpat1\t625129\t625146\tTTTTAT\t6\t3\t18\n+hpat1\t625595\t625606\tAGCCAG\t6\t2\t12\n+hpat1\t627030\t627041\tGAGGCT\t6\t2\t12\n+hpat1\t627187\t627198\tAATAAC\t6\t2\t12\n+hpat1\t629078\t629092\tTGA\t3\t5\t15\n+hpat1\t631394\t631405\tAACAAC\t6\t2\t12\n+hpat1\t631429\t631443\tAAA\t3\t5\t15\n+hpat1\t632704\t632731\tA\t1\t28\t28\n+hpat1\t632732\t632743\tGTGGGG\t6\t2\t12\n+hpat1\t633722\t633733\tACTCTC\t6\t2\t12\n+hpat1\t634319\t634330\tCTTCCT\t6\t2\t12\n+hpat1\t634612\t634623\tTCCTTG\t6
\t2\t12\n+hpat1\t635904\t635918\tAAA\t3\t5\t15\n+hpat1\t637886\t637900\tAAA\t3\t5\t15\n+hpat1\t639391\t639402\tCGCCGC\t6\t2\t12\n+hpat1\t639742\t639759\tGCGGGG\t6\t3\t18\n+hpat1\t640687\t640698\tCCTTTC\t6\t2\t12\n+hpat1\t642772\t642783\tAAAAAA\t6\t2\t12\n+hpat1\t646664\t646681\tAAA\t3\t6\t18\n+hpat1\t649141\t649152\tAGCTGG\t6\t2\t12\n+hpat1\t649225\t649236\tGTGTGT\t6\t2\t12\n+hpat1\t649865\t649876\tGCGAGG\t6\t2\t12\n+hpat1\t650432\t650443\tCCGCCC\t6\t2\t12\n+hpat1\t650444\t650455\tCGCCGC\t6\t2\t12\n+hpat1\t651677\t651688\tCCCCTC\t6\t2\t12\n+hpat1\t651819\t651830\tTGCCTC\t6\t2\t12\n+hpat1\t652399\t652410\tGATAGT\t6\t2\t12\n+hpat1\t652602\t652613\tCTCCTT\t6\t2\t12\n+hpat1\t653054\t653065\tGGAAGG\t6\t2\t12\n+hpat1\t655086\t655103\tTACTGC\t6\t3\t18\n+hpat1\t655717\t655728\tAGCCTC\t6\t2\t12\n+hpat1\t656009\t656020\tCCTCCC\t6\t2\t12\n+hpat1\t657238\t657249\tAGGCCA\t6\t2\t12\n+hpat1\t657509\t657520\tTTGGAC\t6\t2\t12\n+hpat1\t657610\t657621\tTGGACC\t6\t2\t12\n+hpat1\t658606\t658633\tGT\t2\t14\t28\n+hpat1\t658934\t658949\tCCCT\t4\t4\t16\n+hpat1\t660059\t660070\tGTGTCA\t6\t2\t12\n+hpat1\t660332\t660343\tGCTGTA\t6\t2\t12\n+hpat1\t660404\t660415\tCTCTCT\t6\t2\t12\n+hpat1\t660566\t660577\tTCTCTC\t6\t2\t12\n+hpat1\t660592\t660621\tTC\t2\t15\t30\n+hpat1\t660622\t660647\tAC\t2\t13\t26\n+hpat1\t660896\t660927\tGT\t2\t16\t32\n+hpat1\t660985\t660996\tTGGGGA\t6\t2\t12\n+hpat1\t661631\t661642\tCCCCCG\t6\t2\t12\n+hpat1\t661936\t661947\tGGCTCG\t6\t2\t12\n+hpat1\t661953\t661964\tGGCGGC\t6\t2\t12\n+hpat1\t662189\t662200\tCAGAGA\t6\t2\t12\n+hpat1\t662391\t662402\tACACAC\t6\t2\t12\n+hpat1\t662617\t662628\tCCTCTC\t6\t2\t12\n+hpat1\t663604\t663618\tAAA\t3\t5\t15\n+hpat1\t663771\t663791\tAAT\t3\t7\t21\n+hpat1\t664081\t664092\tTTTTTT\t6\t2\t12\n+hpat1\t667668\t667679\tAATGGG\t6\t2\t12\n+hpat1\t668025\t668036\tGTGTAA\t6\t2\t12\n+hpat1\t668718\t668729\tCCATTT\t6\t2\t12\n+hpat1\t669900\t669914\tAAA\t3\t5\t15\n+hpat1\t670208\t670219\tTTTTTA\t6\t2\t12\n+hpat1\t671191\t671202\tGAGGGG\t6\t2\t12\n+hpat1\
t671869\t671886\tTACAAA\t6\t3\t18\n+hpat1\t671887\t671925\tTAA\t3\t13\t39\n+hpat1\t671964\t671975\tCTGAGG\t6\t2\t12\n+hpat1\t672093\t672104\tAACAAC\t6\t2\t12\n+hpat1\t672105\t672116\tAAAGAA\t6\t2\t12\n+hpat1\t672622\t672639\tAAA\t3\t6\t18\n+hpat1\t675613\t675624\tAAGTGT\t6\t2\t12\n+hpat1\t675731\t675742\tGGCACA\t6\t2\t12\n+hpat1\t676339\t676350\tATTTCT\t6\t2\t12\n+hpat1\t676352\t676363\tTTTATT\t6\t2\t12\n+hpat1\t677003\t677036\tGT\t2\t17\t34\n+hpat1\t677457\t677468\tAGAAAG\t6\t2\t12\n+hpat1\t678388\t678399\tTGAAAC\t6\t2\t12\n+hpat1\t678858\t678869\tTTTTTC\t6\t2\t12\n+hpat1\t682346\t682361\tTTAT\t4\t4\t16\n+hpat1\t684578\t684589\tAATAAC\t6\t2\t12\n+hpat1\t684953\t684964\tAGAGCC\t6\t2\t12\n+hpat1\t685312\t685323\tGGAGGT\t6\t2\t12\n+hpat1\t685386\t685413\tA\t1\t28\t28\n+hpat1\t686068\t686079\tCGGGGG\t6\t2\t12\n+hpat1\t686825\t686842\tTTT\t3\t6\t18\n+hpat1\t686899\t686910\tCAGCCT\t6\t2\t12\n+hpat1\t687586\t687605\tTCAA\t4\t5\t20\n+hpat1\t687651\t687662\tAGGGAG\t6\t2\t12\n+hpat1\t688067\t688078\tTCCTGG\t6\t2\t12\n+hpat1\t689018\t689037\tAAAT\t4\t5\t20\n+hpat1\t690132\t690143\tTTTCTT\t6\t2\t12\n+hpat1\t690303\t690314\tAAGAGG\t6\t2\t12\n+hpat1\t693707\t693718\tAGGGGC\t6\t2\t12\n+hpat1\t694426\t694437\tTTTTGT\t6\t2\t12\n+hpat1\t694506\t694517\tCAGCCT\t6\t2\t12\n+hpat1\t694598\t694609\tTTTTTT\t6\t2\t12\n+hpat1\t694903\t694914\tTATTTA\t6\t2\t12\n+hpat1\t695265\t695276\tCAAGGC\t6\t2\t12\n+hpat1\t696496\t696507\tCGCCCC\t6\t2\t12\n+hpat1\t696718\t696729\tCGCCCC\t6\t2\t12\n+hpat1\t697067\t697078\tTGGGGG\t6\t2\t12\n+hpat1\t697568\t697589\tGT\t2\t11\t22\n+hpat1\t698582\t698593\tTCTGAT\t6\t2\t12\n+hpat1\t698951\t698962\tAATTGA\t6\t2\t12\n+hpat1\t699157\t699171\tTTT\t3\t5\t15\n'
diff -r 000000000000 -r 2b970db61912 test-data/dibed_sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/dibed_sample Sun Jul 21 07:19:00 2024 +0000
b'@@ -0,0 +1,17674 @@\n+hpat1\t75\t76\tGC_1\t2\n+hpat1\t203\t204\tGC_1\t2\n+hpat1\t307\t308\tGC_1\t2\n+hpat1\t553\t554\tGC_1\t2\n+hpat1\t559\t560\tGC_1\t2\n+hpat1\t567\t568\tGC_1\t2\n+hpat1\t575\t576\tGC_1\t2\n+hpat1\t601\t602\tGC_1\t2\n+hpat1\t627\t628\tGC_1\t2\n+hpat1\t633\t634\tGC_1\t2\n+hpat1\t645\t646\tGC_1\t2\n+hpat1\t725\t726\tGC_1\t2\n+hpat1\t755\t756\tGC_1\t2\n+hpat1\t759\t760\tGC_1\t2\n+hpat1\t765\t766\tGC_1\t2\n+hpat1\t777\t778\tGC_1\t2\n+hpat1\t831\t832\tGC_1\t2\n+hpat1\t877\t878\tGC_1\t2\n+hpat1\t885\t886\tGC_1\t2\n+hpat1\t961\t962\tGC_1\t2\n+hpat1\t1005\t1006\tGC_1\t2\n+hpat1\t1037\t1038\tGC_1\t2\n+hpat1\t1049\t1050\tGC_1\t2\n+hpat1\t1055\t1056\tGC_1\t2\n+hpat1\t1065\t1066\tGC_1\t2\n+hpat1\t1089\t1090\tGC_1\t2\n+hpat1\t1107\t1108\tGC_1\t2\n+hpat1\t1113\t1114\tGC_1\t2\n+hpat1\t1121\t1122\tGC_1\t2\n+hpat1\t1127\t1128\tGC_1\t2\n+hpat1\t1199\t1200\tGC_1\t2\n+hpat1\t1253\t1254\tGC_1\t2\n+hpat1\t1289\t1290\tGC_1\t2\n+hpat1\t1297\t1298\tGC_1\t2\n+hpat1\t1331\t1332\tGC_1\t2\n+hpat1\t1361\t1362\tGC_1\t2\n+hpat1\t1377\t1378\tGC_1\t2\n+hpat1\t1553\t1554\tGC_1\t2\n+hpat1\t1591\t1592\tGC_1\t2\n+hpat1\t1811\t1812\tGC_1\t2\n+hpat1\t1835\t1836\tGC_1\t2\n+hpat1\t1861\t1862\tGC_1\t2\n+hpat1\t1893\t1894\tGC_1\t2\n+hpat1\t1901\t1902\tGC_1\t2\n+hpat1\t1935\t1936\tGC_1\t2\n+hpat1\t1943\t1944\tGC_1\t2\n+hpat1\t2001\t2002\tGC_1\t2\n+hpat1\t2033\t2034\tGC_1\t2\n+hpat1\t2045\t2046\tGC_1\t2\n+hpat1\t2051\t2052\tGC_1\t2\n+hpat1\t2077\t2078\tGC_1\t2\n+hpat1\t2093\t2094\tGC_1\t2\n+hpat1\t2229\t2230\tGC_1\t2\n+hpat1\t2281\t2282\tGC_1\t2\n+hpat1\t2289\t2290\tGC_1\t2\n+hpat1\t2295\t2296\tGC_1\t2\n+hpat1\t2299\t2300\tGC_1\t2\n+hpat1\t2305\t2306\tGC_1\t2\n+hpat1\t2369\t2370\tGC_1\t2\n+hpat1\t2441\t2442\tGC_1\t2\n+hpat1\t2483\t2484\tGC_1\t2\n+hpat1\t2513\t2514\tGC_1\t2\n+hpat1\t2519\t2520\tGC_1\t2\n+hpat1\t2529\t2530\tGC_1\t2\n+hpat1\t2533\t2534\tGC_1\t2\n+hpat1\t2545\t2546\tGC_1\t2\n+hpat1\t2565\t2566\tGC_1\t2\n+hpat1\t2579\t2580\tGC_1\t2\n+hpat1\t2591\t2592\tGC_1\t2\n+hpat1\t2631\t2632
\tGC_1\t2\n+hpat1\t2699\t2700\tGC_1\t2\n+hpat1\t2729\t2730\tGC_1\t2\n+hpat1\t2751\t2752\tGC_1\t2\n+hpat1\t2783\t2784\tGC_1\t2\n+hpat1\t2795\t2796\tGC_1\t2\n+hpat1\t2837\t2838\tGC_1\t2\n+hpat1\t2891\t2892\tGC_1\t2\n+hpat1\t2899\t2900\tGC_1\t2\n+hpat1\t2905\t2906\tGC_1\t2\n+hpat1\t2963\t2964\tGC_1\t2\n+hpat1\t3023\t3024\tGC_1\t2\n+hpat1\t3123\t3124\tGC_1\t2\n+hpat1\t3141\t3142\tGC_1\t2\n+hpat1\t3151\t3152\tGC_1\t2\n+hpat1\t3197\t3198\tGC_1\t2\n+hpat1\t3207\t3208\tGC_1\t2\n+hpat1\t3211\t3212\tGC_1\t2\n+hpat1\t3221\t3222\tGC_1\t2\n+hpat1\t3227\t3228\tGC_1\t2\n+hpat1\t3251\t3252\tGC_1\t2\n+hpat1\t3299\t3300\tGC_1\t2\n+hpat1\t3315\t3316\tGC_1\t2\n+hpat1\t3341\t3342\tGC_1\t2\n+hpat1\t3359\t3360\tGC_1\t2\n+hpat1\t3379\t3380\tGC_1\t2\n+hpat1\t3383\t3384\tGC_1\t2\n+hpat1\t3415\t3416\tGC_1\t2\n+hpat1\t3423\t3424\tGC_1\t2\n+hpat1\t3431\t3432\tGC_1\t2\n+hpat1\t3437\t3438\tGC_1\t2\n+hpat1\t3455\t3456\tGC_1\t2\n+hpat1\t3479\t3480\tGC_1\t2\n+hpat1\t3491\t3492\tGC_1\t2\n+hpat1\t3523\t3524\tGC_1\t2\n+hpat1\t3529\t3530\tGC_1\t2\n+hpat1\t3545\t3546\tGC_1\t2\n+hpat1\t3549\t3550\tGC_1\t2\n+hpat1\t3583\t3584\tGC_1\t2\n+hpat1\t3623\t3624\tGC_1\t2\n+hpat1\t3677\t3678\tGC_1\t2\n+hpat1\t3701\t3702\tGC_1\t2\n+hpat1\t3757\t3758\tGC_1\t2\n+hpat1\t3807\t3808\tGC_1\t2\n+hpat1\t3835\t3836\tGC_1\t2\n+hpat1\t3857\t3858\tGC_1\t2\n+hpat1\t3869\t3870\tGC_1\t2\n+hpat1\t3877\t3880\tGC_2\t4\n+hpat1\t3943\t3944\tGC_1\t2\n+hpat1\t3951\t3954\tGC_2\t4\n+hpat1\t3967\t3968\tGC_1\t2\n+hpat1\t4003\t4004\tGC_1\t2\n+hpat1\t4015\t4016\tGC_1\t2\n+hpat1\t4019\t4020\tGC_1\t2\n+hpat1\t4055\t4056\tGC_1\t2\n+hpat1\t4071\t4072\tGC_1\t2\n+hpat1\t4085\t4086\tGC_1\t2\n+hpat1\t4251\t4252\tGC_1\t2\n+hpat1\t4273\t4274\tGC_1\t2\n+hpat1\t4291\t4292\tGC_1\t2\n+hpat1\t4355\t4356\tGC_1\t2\n+hpat1\t4361\t4362\tGC_1\t2\n+hpat1\t4369\t4370\tGC_1\t2\n+hpat1\t4375\t4376\tGC_1\t2\n+hpat1\t4465\t4466\tGC_1\t2\n+hpat1\t4473\t4474\tGC_1\t2\n+hpat1\t4565\t4566\tGC_1\t2\n+hpat1\t4597\t4598\tGC_1\t2\n+hpat1\t4605\t4606\tGC_1\t2\n+hpat1\t4637\t463
8\tGC_1\t2\n+hpat1\t4647\t4648\tGC_1\t2\n+hpat1\t4671\t4672\tGC_1\t2\n+hpat1\t4735\t4736\tGC_1\t2\n+hpat1\t4761\t4762\tGC_1\t2\n+hpat1\t4781\t4782\tGC_1\t2\n+hpat1\t4791\t4792\tGC_1\t2\n+hpat1\t4807\t4808\tGC_1\t2\n+hpat1\t4811\t4812\tGC_1\t2\n+hpat1\t4819\t4820\tGC_1\t2\n+hpat1\t4835\t4836\tGC_1\t2\n+hpat1\t4849\t4850\tGC_1\t2\n+hpat1\t4895\t4896\tGC_1\t2\n+hpat1\t4907\t4910\tGC_2\t4\n+hpat1\t4921\t4922\tGC_1\t2\n+hpat1\t4963\t4964\tGC_1\t2\n+hpat1\t4969\t4970\tGC_1\t2\n+hpat1\t4983\t4984\tGC_1\t2\n+hpat1\t5047\t5048\tGC_1\t2\n+hpat1\t5079\t5080\tGC_1\t2\n+hpat1\t5125\t5126\tGC_1\t2\n+hpat1\t5159\t5160\tGC_1\t2\n+hpat1\t5169\t5170\tGC_1\t2\n+hpat1\t5219\t5220\tGC_1\t2\n+hpat1\t5225\t5226\tGC_1\t2\n+hpat1\t5273\t5274\tGC_1\t2\n+hpat1\t5295\t5296\tGC_1\t2\n+hpat1\t5299\t5300\tGC_1\t2\n+hpat1\t5399\t5400\tGC_1\t2\n+hpat1\t5465\t5'..b'1\t695666\t695667\tGC_1\t2\n+hpat1\t695696\t695699\tGC_2\t4\n+hpat1\t695704\t695705\tGC_1\t2\n+hpat1\t695742\t695743\tGC_1\t2\n+hpat1\t695766\t695767\tGC_1\t2\n+hpat1\t695778\t695779\tGC_1\t2\n+hpat1\t695832\t695835\tGC_2\t4\n+hpat1\t695856\t695857\tGC_1\t2\n+hpat1\t695892\t695895\tGC_2\t4\n+hpat1\t695912\t695913\tGC_1\t2\n+hpat1\t695932\t695933\tGC_1\t2\n+hpat1\t695968\t695969\tGC_1\t2\n+hpat1\t696004\t696005\tGC_1\t2\n+hpat1\t696046\t696047\tGC_1\t2\n+hpat1\t696118\t696119\tGC_1\t2\n+hpat1\t696154\t696155\tGC_1\t2\n+hpat1\t696190\t696191\tGC_1\t2\n+hpat1\t696232\t696233\tGC_1\t2\n+hpat1\t696304\t696305\tGC_1\t2\n+hpat1\t696336\t696337\tGC_1\t2\n+hpat1\t696340\t696341\tGC_1\t2\n+hpat1\t696376\t696377\tGC_1\t2\n+hpat1\t696388\t696389\tGC_1\t2\n+hpat1\t696398\t696399\tGC_1\t2\n+hpat1\t696402\t696403\tGC_1\t2\n+hpat1\t696422\t696423\tGC_1\t2\n+hpat1\t696434\t696435\tGC_1\t2\n+hpat1\t696508\t696511\tGC_2\t4\n+hpat1\t696540\t696543\tGC_2\t4\n+hpat1\t696562\t696563\tGC_1\t2\n+hpat1\t696598\t696601\tGC_2\t4\n+hpat1\t696612\t696613\tGC_1\t2\n+hpat1\t696636\t696639\tGC_2\t4\n+hpat1\t696648\t696651\tGC_2\t4\n+hpat1\t696658\t696659\tGC_1\t2\n+hpat1
\t696670\t696673\tGC_2\t4\n+hpat1\t696676\t696677\tGC_1\t2\n+hpat1\t696680\t696681\tGC_1\t2\n+hpat1\t696688\t696689\tGC_1\t2\n+hpat1\t696696\t696697\tGC_1\t2\n+hpat1\t696704\t696705\tGC_1\t2\n+hpat1\t696752\t696753\tGC_1\t2\n+hpat1\t696768\t696769\tGC_1\t2\n+hpat1\t696778\t696779\tGC_1\t2\n+hpat1\t696810\t696811\tGC_1\t2\n+hpat1\t696814\t696815\tGC_1\t2\n+hpat1\t696830\t696831\tGC_1\t2\n+hpat1\t696882\t696883\tGC_1\t2\n+hpat1\t696900\t696901\tGC_1\t2\n+hpat1\t696934\t696935\tGC_1\t2\n+hpat1\t696998\t696999\tGC_1\t2\n+hpat1\t697028\t697029\tGC_1\t2\n+hpat1\t697040\t697041\tGC_1\t2\n+hpat1\t697056\t697057\tGC_1\t2\n+hpat1\t697108\t697109\tGC_1\t2\n+hpat1\t697120\t697121\tGC_1\t2\n+hpat1\t697140\t697141\tGC_1\t2\n+hpat1\t697158\t697159\tGC_1\t2\n+hpat1\t697162\t697163\tGC_1\t2\n+hpat1\t697174\t697175\tGC_1\t2\n+hpat1\t697188\t697189\tGC_1\t2\n+hpat1\t697220\t697221\tGC_1\t2\n+hpat1\t697228\t697229\tGC_1\t2\n+hpat1\t697256\t697257\tGC_1\t2\n+hpat1\t697346\t697347\tGC_1\t2\n+hpat1\t697362\t697363\tGC_1\t2\n+hpat1\t697384\t697385\tGC_1\t2\n+hpat1\t697402\t697403\tGC_1\t2\n+hpat1\t697412\t697413\tGC_1\t2\n+hpat1\t697422\t697423\tGC_1\t2\n+hpat1\t697450\t697451\tGC_1\t2\n+hpat1\t697488\t697489\tGC_1\t2\n+hpat1\t697526\t697527\tGC_1\t2\n+hpat1\t697556\t697557\tGC_1\t2\n+hpat1\t697618\t697619\tGC_1\t2\n+hpat1\t697652\t697653\tGC_1\t2\n+hpat1\t697696\t697697\tGC_1\t2\n+hpat1\t697718\t697719\tGC_1\t2\n+hpat1\t697730\t697731\tGC_1\t2\n+hpat1\t697760\t697761\tGC_1\t2\n+hpat1\t697764\t697765\tGC_1\t2\n+hpat1\t697768\t697769\tGC_1\t2\n+hpat1\t697794\t697795\tGC_1\t2\n+hpat1\t697806\t697807\tGC_1\t2\n+hpat1\t697828\t697829\tGC_1\t2\n+hpat1\t697840\t697841\tGC_1\t2\n+hpat1\t697954\t697955\tGC_1\t2\n+hpat1\t697974\t697975\tGC_1\t2\n+hpat1\t698024\t698025\tGC_1\t2\n+hpat1\t698030\t698031\tGC_1\t2\n+hpat1\t698040\t698041\tGC_1\t2\n+hpat1\t698078\t698079\tGC_1\t2\n+hpat1\t698106\t698107\tGC_1\t2\n+hpat1\t698132\t698133\tGC_1\t2\n+hpat1\t698170\t698171\tGC_1\t2\n+hpat1\t698186\t698187\tGC
_1\t2\n+hpat1\t698200\t698201\tGC_1\t2\n+hpat1\t698240\t698241\tGC_1\t2\n+hpat1\t698262\t698263\tGC_1\t2\n+hpat1\t698280\t698281\tGC_1\t2\n+hpat1\t698318\t698319\tGC_1\t2\n+hpat1\t698370\t698371\tGC_1\t2\n+hpat1\t698414\t698415\tGC_1\t2\n+hpat1\t698510\t698511\tGC_1\t2\n+hpat1\t698518\t698519\tGC_1\t2\n+hpat1\t698552\t698553\tGC_1\t2\n+hpat1\t698698\t698699\tGC_1\t2\n+hpat1\t698810\t698811\tGC_1\t2\n+hpat1\t698872\t698875\tGC_2\t4\n+hpat1\t699008\t699009\tGC_1\t2\n+hpat1\t699064\t699065\tGC_1\t2\n+hpat1\t699104\t699105\tGC_1\t2\n+hpat1\t699114\t699115\tGC_1\t2\n+hpat1\t699242\t699243\tGC_1\t2\n+hpat1\t699268\t699269\tGC_1\t2\n+hpat1\t699274\t699275\tGC_1\t2\n+hpat1\t699286\t699287\tGC_1\t2\n+hpat1\t699428\t699429\tGC_1\t2\n+hpat1\t699432\t699433\tGC_1\t2\n+hpat1\t699438\t699439\tGC_1\t2\n+hpat1\t699444\t699445\tGC_1\t2\n+hpat1\t699540\t699541\tGC_1\t2\n+hpat1\t699708\t699709\tGC_1\t2\n+hpat1\t699972\t699973\tGC_1\t2\n+hpat1\t700014\t700015\tGC_1\t2\n+hpat1\t700026\t700027\tGC_1\t2\n+hpat1\t700036\t700037\tGC_1\t2\n+hpat1\t700068\t700069\tGC_1\t2\n+hpat1\t700072\t700073\tGC_1\t2\n+hpat1\t700116\t700117\tGC_1\t2\n+hpat1\t700148\t700149\tGC_1\t2\n+hpat1\t700160\t700161\tGC_1\t2\n+hpat1\t700166\t700167\tGC_1\t2\n+hpat1\t700192\t700193\tGC_1\t2\n+hpat1\t700208\t700209\tGC_1\t2\n+hpat1\t700298\t700299\tGC_1\t2\n+hpat1\t700306\t700307\tGC_1\t2\n+hpat1\t700534\t700535\tGC_1\t2\n+hpat1\t700562\t700563\tGC_1\t2\n+hpat1\t700678\t700679\tGC_1\t2\n+hpat1\t700812\t700813\tGC_1\t2\n+hpat1\t700920\t700921\tGC_1\t2\n+hpat1\t700974\t700975\tGC_1\t2\n'
diff -r 000000000000 -r 2b970db61912 test-data/dibed_wig_sample
Binary file test-data/dibed_wig_sample has changed
diff -r 000000000000 -r 2b970db61912 test-data/humsamp.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/humsamp.fa Sun Jul 21 07:19:00 2024 +0000
b'@@ -0,0 +1,11685 @@\n+>hpat1\n+AAAAGTGGGAGAAACACTAAAAGAACCTTAAGCATGTAACTAAAAATATTATGGAAATGT\n+TATTGAATTCATTAGCAAATTTAGTGCTAGGTTTTCATTGAGGAGTAGGTTATATTACTC\n+ATGATGAAGAAAAATGTTCATTTTAAGTATATTAACATAAATACCATCAATATTGTTTAT\n+CATGTTTAAATGTTCACTTAAAGCAATTCAGTTAAAATTCTGCATATCATACAATTTTAT\n+AGTTTGCTAGTAGGTTACAAGTAAATAGTCACCCAAATAAAAACATCATGTTTTCCACTG\n+GTTGTTGCTCTTTTTTAGGTGAGTATTTGATGTATACCAACAGAGAGAGGATAATAACAA\n+ATCGCTAATTTCTTTCATCACTATATAAAGGTGGCTTCAGGATAGAATAGTATCAGGGCA\n+ATGATGAATTTGAAATCTAACATCAATTCAGTGATGCATCAAGATAAAGTAGAGACAACA\n+GGGGCACCTTGGTGAGTACTGAACATTTTATTTATTTATTTATTTATTTTGAGATGGAGT\n+TTTGCTCTTTTTGCCCAGGCTACAGTGCAATGGTGCCAACCTCGCCTCACTGCAACCTCT\n+GCCTCCTGGGTTCAAGTGATTCTCCTGCCTTGGCCTCCCGAATAGCTGGGATTACAGACA\n+TGCGCCACCACACCCGTCTAATTTTGTATTTTTAGTAGAGACGGGGTTTCTCCATGTTGG\n+TCAGGCTGGTCTCGAACTCCCGACCTAGATATCTGCCTGCCTTGGCCTCCCAAAGTGCTG\n+GGATTACAGGTGTGAGCCACCGCGCCCAGATGAATTCCAAATTTAACAAAGCAGACTAAG\n+AGAAACAATTCATTTAAAAAAATAATATTTGGCCAGGCATGGTGGCTCACACCTATAATC\n+CCAGCACTTTGGGAGGCTGAGGTGAGTGGATCAGGAGGTCAGCAGTTCAAGACCAGCCTA\n+GCCAAGATCATGAAACCCCGTCTCTACTAAAAATACAAAAATCAGCCAGGCGTGGTGGCT\n+GGTGCCTGTAATCCTAGCTGCTCGGGAGGCTGAGGCAGAGAACTGCTTGAACCCGGGAGG\n+CGGAGGTTGCAGTGAGCCGAGATCGTGCCACTGCACTCCAGCCTGGGCAACAGAGTGAGG\n+CTCCGTCTCAAAAAAAATAAATAAATAATTCAATGAAATTCCTAAGATCCAGGGCTTTGC\n+AATAAATATGTAAATAAATTTCCAATCTCCATACTGAAAGTTTAAAAGAAATGCTAACTA\n+ATAACTAAAGAAATACAACTTTTCCTCAGCTTTGCAGCAATCTAGAAACAAAGTGTGTAG\n+ACACTACAAAGCACCTTACAAGAAGAAACATGTAAGGATGGCATGACTCGCCGGCAGCCC\n+TGGGATTGTCCACGGTACCCCCATGATGAACAGTAACTCCACTGTGTAAACGCCCATGAA\n+CATAAGATTACAAGACTTTTCCAGTTTAGACATACCATATTTTCTTTCAGACAATTCTTC\n+AGTTTGTTTACGTAGATCAGCGATACGATGATTCCATTTCTCTGAAAACCAAGCAAAAGT\n+TGCTTCTCAATAACACGTCCCTATGTCAGAGCAGCACTAACGTATAATGACTGATTTCAT\n+ATATTTTACATTCTAACAGTCCATATCATTTTACTGCTTTCAAGAAAAAATTTCCCCTTC\n+TTGGTGGTTCTTAGAATTGGTTTAATGGGAGACTATTAGAGAAGCTGAAAAGCAGGAGGG\n+CAGAAAAGTTCAATCAAATTAAACACAATAACAGGGAGGTCACAATGAGGCGGTCTCCAG\n+GGGTCTTTTAGCAAACTTCCTAAAACATGTCTCAGCTGTGTGAAATAAGACTTTACAGCA\n+GCCGGGTGCAGTGG
TGCAGGCCTGTAATCCCAGCACTTTGGCAGCAGAGGCAGGCGGATC\n+ACTTTGAGCTCAGGGCAACATAGCCAAAACCCCCCTCCCTAGCCCCACCCCCACCCCGTC\n+CCTACCAAAAATACAAAACAGCAGGGCATGGTGGCGGGCGCCTGTAGTCCCAGCTACTCA\n+GGAGGCTGAGGCAGGAGAATCACCTGAACCCAGGAGGCAGACATTGCAGTGAGCCAAGAT\n+CACGCCACTGCCAGCCTGGATGACAGAGCAAGACTCCACCTCAAAAAAAACAAAAACAAA\n+AACACAAGGTTAAGAGGGACCCCCGACCTTACAGATACAAGTTTAAGAGGGACCCCTAAG\n+CAAAAAATGCCAACCCTTTTTCTCCCAATCATTGAAACACCAGGAGGGTGTAACAGTTTT\n+GCAGCCTAGCTGTAGCAGGCTGATGCCCCCAAGATGCCCATATCCTAATCCCGGGAACTG\n+GTGAACATGACCTTATATGGCAAAAGGAGCTTTGCAGATATAATGAAGTTAAGGGTCTTT\n+GGCTTTTGGGGTTGATGTACTCATTCGGATCCTTGTAAGAGCAGAGCAGGTGATGGAGAG\n+GGTGAGAGGTGTAGTGACAGAAGCAGGAAACTCCAGTCATTCGAGACGGGCAGCACAAGC\n+TGCGGAGTGCAGGCCACCTCTACGGCCAGGAAACGGATTCTCCCGCAGAGCCTCGGAAGC\n+TACCGACCCTGCTCCCACCTTGACTCAGTAGGACTTACTGTAGAATTCTGGCCTTCAGAC\n+CTGTAAGGGAATATATTTTGGTTGTTTTAAGTCACTAAGTGTGTGGTAATTTGTTGCAGC\n+AGCCACAGGAAACTAGTATTGTAGTGAAGCCTCAAAACCCCCCTGAAAGGGCTGGGCTCA\n+GTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCGAGATGGGTGGATCACTTGAGGT\n+CAGGAGTTCGAGACCAGCCCAGCCAACATGGTGAAATGCCATCTATACAAAAAATACAAA\n+AACTAGCCGGGCATGGTGGCACATGCCTGTAATCTCAGCTACTCAGGAGGCTGAGACAGG\n+AGAATTGTTTGAACCCAGGGGGGCAGAGGTTGCAGTGAACTGAGATTCCACCACTGCACT\n+CCAGCCTGGGTGACAGAGCGACGCTCCATCTCGAAAACAAAACAAAACAAAAAAACCCCA\n+CCTGAAGGTTTCCAGTTCTGCCAGCACTCTCCCACCTAACCCCCAGAAACAGACATTCCA\n+TTGCTGTGGGCCATGGACAGGCAGAAGGAAGCACCTCCTCATGGCAGAGGCCTACCCAGG\n+AGAAACCCAAGGGAAGGCACTACTGGGCTGGCCCCTCTCTGCCAAGGCCATATTCTTTTT\n+TTTTTTTGAGGCCAGTTTCACTCTGTCTCCCAGACTGGAGTGCAGGGGCACAATCTCGGC\n+TCACTTCGACCTCTGCCTCCCCAGTTCAAGTGATTCTCCTGCCTCAGTCTCCTGAGTAGC\n+TGGGATGACAGGAGTGTAGCATGCCTAGCTAATTTTTGTATTTCTAGTACAGATGCGGTT\n+TTGCCATGTTGCCCAGGCTGGACTCGAACTCCTTGCCTCAAGTAGTCCACCTGTCTCAGC\n+CCCGCAAAGTGCTGGTATTACAGGAGTGAGCCACTGCACCCAGCATTTGCCAAGACCTTT\n+GATGGCAGGCTTTTTCCAGGTGATCAGTCCTTGTCTGGTCTGGCTCTGCCCCACTCTCCT\n+TCTCACCTAGTTGGAATCCCTAGCTACTTTTCAGTAGAGGAGAGTGTGTACCCCAATCCC\n+AGCTTGGTTCAGATCTGCATTTAACTCATGGAACCTGGCTGCTCCCCAGGTTCTGAAGAA\n+AAAAAACGGTCTCTCTGTGGGTATGATAAAGGATGGGCCTGTCCCCAGGACCCTGTGAGA\
n+GGGAAGCCCAATGTCCCACCAGGTTGGCAGGGCTGGGGAAGGGAAAGTGTTATGGCAGCC\n+CC'..b'CCAGCCTCAGAAAGGTAGCCTGTGATTCAG\n+GTAGTGTTGAAAACCAGAGGCCCAAAGGCAGGATACATCTGGGCCTATTTTAGAGGCCAA\n+TAGCGTGGTGGTTCAGAAAGGGGTTCCAACAGTTCTCCCTACTCGCCGTTGAGGATTTGC\n+GTCAGGTAACCACCCTGCGCCCGGGGCGGTGGAGGTGATGGGCTGAGACCGTGTTGGCGC\n+AGTGCTGGGTACACAGTGAGTGCTCACAAGTGCTGGTTCTTGCTGTTTTCTGTGATTTTC\n+CTTGGAGGGGCACAGGGAAAACCAAAAATGGCTGGTGCTGGAAAGAGGCCCTTGAGTTTC\n+ATCTCACAGCTGGCGGTAGGGTGAAGCTTGGCAAAAAACCTGTGGTCTGTGCTGAGCTTG\n+GGGCTGGGTGTGTGTGTGTGTGTGTGTGTCTATGGTCTTTGAGAAGTGTCTTCATCAGCA\n+TGTGACTTGTGTGAGTTTTTGGGTAGAGAAGGCACAGCCTTCACTAGTTTCTACAAGAAG\n+GGTGTGATCCAGTGTGCGAAGAATCCCTGTGTGCAGGGCTGGGAAATCAGCTTCATCTCT\n+TCCATCCAAGAACAGCATGGCTGGCACGCTTATTGTGTGCGGAGTCCACGGTAGCGTCCC\n+TAGGTGCTAGGGACAACCGTGAGCAGAGCAGAGTTCTTGGCTGCCTTCACAGACTCGCAG\n+TCTCCTTCGGAAGATGAATGAGAAATAAAATAAACATGTAAGACATGAGGTAGGTCACTA\n+GGTGACATGAGCTTTAGATAAAAACAAAGCAGGGCACCTGATTAGGAGGGTGGGCCCTAC\n+TTGCCAGACCCCTACCCTCATCCCCAATACAGTGTGTATCATTGCCTGGGCAGGAAAGGG\n+CCACACTCCTGGGTCTAGGGAACTAGGTCCCTAAGTAGCCTCATGTCCCCACCCACTAGG\n+ACCCAGCCTCCAGCCAGTTCTGTTGCATGGGGCAGGGTCCCTGGGAAGATGGGCAGGCCT\n+CAGACTGCTGCAACCTCTGCCCCTAGCCCTGAGGTGGTGGCAGTAGTGTTTCCCTCTGAT\n+AATGACATACTCCAGTCTTGCATGACCACGATGCCTCATGAGCCCACAGTCCAGAAGGGG\n+CCTGGTGAGCAGGAGCTGAATTAAAATGGAAAATACGGCCTCCCTTCCCCCCTTCCTGTT\n+CCTCCCCACTTACTCCCTTCATTCCTGCTGCTCCTGAAGCCATGGGTGAATAGTTTCTCA\n+GTCTCTTTGCCTTGCTGAGCTGGGTTAGTTGGGTTAGCAGGATGGACATCTCCTTATGGA\n+TACTAGAGTTAGGAGACACCCAGAACCAGCAGGTGGGATTTTAGAAAGTGCTGTTTGGCT\n+CTGACCAACCCCTACTCCCACCCCATGCATAGCAGGGTTAGTTTCCTCATCAACTCTGGT\n+CTCTGATTCTGATGTTCTGCCCCTAAGCATCCTACATTCTAACATTGTATTCTTCTGACT\n+TTTTAGAATTTTCCCATCCTATGCATCTTACACGAATATGGTGAAGTTCTGATTTCCGCC\n+CTTATGTTCTAGATTGAGTCTTCAGTATTAATTTTGTTGGGGTCTACCCATTCAAATAAC\n+AATAGAGAAAGACTATGTGCAGAGTCACATAATACTAGTCATGATAGTAGCAATTAACAT\n+CCTTTTGACCTTGGTTTTATTTAACAGTTTTTTTTTTTGAGAATTTTCTAAGCGCCAGAC\n+ACCATTCTAGACATTATACAAATAAATAGATAAAAATGAATAAATAAATAAATGTTATTT\n+TAGAAGGTGCAATTGAAATTGAGAGTCAAGGCAGGACTCACTAAGAAAGTGACAATATTT\n+TCTTT
CTGCTTTCAGTTATGGAGATTTGAAATCTCAGTTGATAAGGTGGATTTATTTCTC\n+CTTGCAGGTCATTCACTTTTTCCTTCATGTATTTTGAATCTCTGCTATTGGGGGCATAAA\n+TATTTAGAATTGTTATATGCTCATGATTAACTGACCTTTTTTTTTTTTTTTTAGATGGAG\n+AGTCTTGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCGACTCACTGCAACCTC\n+CGCCTCCTGGGCTCAAGAGATTCTCTGGCCTCAGCCTCCTGAGTAGCTGGGATTACGGGT\n+GTGCACCACCACGCCCAGTTAATATTTGTATTTTTAGTAGAGATGGGGTTTCACCATGTT\n+GGTCAGGCTGGTCTCAAACTCCTGACCTTGTGATCCACCTGCCTTGGCTTCCCAAAGTGC\n+TGGGATTGCAGGCGTGAGCCACCGCACCCAGCCTTGACCTTTTTAATCATTATAAAATGA\n+CCTTATCTTTCCTGGTAATATTCTTTGCTCTGAAACAAACATTGTCTGATATGAAACTAG\n+CTACTCCTCCAGTGTTATTTTGATTACTGTTACTTTCCATCTTTGTACTTTTAACTTGTG\n+TGTTTACACTTAAGGTGTGCCGTTTTGAAGGTTGGGTCTTATTTTTTTTTTAAAAAAAAA\n+CAATATTTGGGTCTTATTTTTTAAAAAAATCCAATCTAACAATCTCTGCTTTTTAATTGA\n+GGGTATTTAGACCATTTACATTTGATGTGATCATTGACGTGGTTAGATTTAAGTCTATCA\n+TCTTGCTATTTGTTTTATTTTTGTCCCATCTGTTCTTTGCTTCCTTTTTCTTCTTTTTCT\n+GTCTTCTTTTGTATAAGCTGAGTTTTTTTTACTATTTCATTTTAACTCCTCCTTTTTTAT\n+TATATCGTTTTACCTTCTCTGTTGGTTTATTAACTGTAATGTTTTGTTTGTTATCTTAGT\n+AGTTACATTAGGCCTTATAAGCCTGGGTACGGTGGCTCACACCTGTAATCCCAGCACTTT\n+GGGAGGCTGAGGTGGGCGGATCACCTGAGGTCAGGAGTTTGAGACCAGCTTGCCAATATA\n+GTGAACCCCTGTCTCTACTAAAAATACAAAAATTAGCCAGGCATGGTGGTGCATGCTTAC\n+AGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATTGCTTGAACCTGGGAGGCAGAGGTT\n+GCAGTGAGCTGAGATAGCACTACTACACTCCAGCCTGGGTGACAGAGCAAGACTCAAAAA\n+GAAAATCATGTAATAGTATACTTTCATTTCCTTCCTGGCCTTCTTGCTGTTGTTGTCATA\n+CATACATATGTGTGACAGACTCCACAAAATGTTATTATTTTTGTTTAAATGCTAAGTTAT\n+CTTTTTTTAAATTAAATAATCAGAAAAGATTTTATATATTTAACTCATGTAGTTAACACT\n+TCTGGTGACCTCCATTCCCTTATGTAGATCCAGATTTCCATCTGGTATCATTTTCCTTCT\n+GCCTGAAGGATTTCCTTTATCACTTCTTGCAGTGCAGATCTGTTGGTGACAAATGCATTC\n+AGCTTTTGTATGTCTGAAATCATCTTTATTTCATCTTCATTTTTAAAAGATATTTTAACT\n+GGGTATAGAATTCTAGATCGGCAGGTTTTTTCTTTCATTAGCTTAAAAGATGTTGTTGCT\n+TCACTATTTTCTTCCTTAAATTGTGTCCAACAAGAAATCTGCCATTATCCTTATCTTTGT\n+TTCTCTATACTTTACAAGGCTTTTCTTTCTCTGACTGTTTTTAATATTTATCTGTTTGTC\n+ACTGGTTTTGAGCAATTCGTTCATGGTGGGTAATTTCCTTCCTGTTTCTTGTGCTTGAGG\n+TTCGTTAAGCTTCTTGGATCTGTGAGTTTATAGTTTATATCAGATATGAAAT
ATTCTCAG\n+CCATTAATTTTCTATTTCCTCTCCTTTGGGGACTTCAAAGACACATATATTAGGCTTCTT\n+GAAGTTGTTCCATAGCTCCTAATACTCTTGGTTTTTTGGATTCTTTCTTTTTCTCTGAGT\n'
diff -r 000000000000 -r 2b970db61912 test-data/mouse.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/mouse.fa Sun Jul 21 07:19:00 2024 +0000
b"@@ -0,0 +1,5421 @@\n+>mm10_knownGene_uc008xda.1 range=chr5:34761740-34814709 5'pad=0 3'pad=0 strand=+ repeatMasking=none\n+GCACTCGCCGCGAGGGTTGCCGGGACGGGCCCAAGATGGCTGAGCGCCTT\n+GGTTCCGCTTCTGCCTGCCGCGCAGAGCCCCATTCATTGCCTTGCTGCTA\n+AGTGGCGCCGCGTAGTGCCAGTAGGCTCCAAGTCTTCAGGGTCTGTCCCA\n+TCGGGCAGGAAGCCGTCATGGCAACCCTGGAAAAGCTGATGAAGGCTTTC\n+GAGTCGCTCAAGTCGTTTCAGCAGCAACAGCAGCAGCAGCCACCGCCGCA\n+GGCGCCGCCGCCACCGCCGCCGCCGCCTCCGCCTCAACCCCCTCAGCCGC\n+CGCCTCAGGGGCAGCCGCCGCCGCCACCACCGCCGCTGCCAGGTCCGGCA\n+GAGGAACCGCTGCACCGACCGTGAGTCCGGGCGCCGCAGCTCCCGCCCGG\n+GCCCCGCGCCCCTGGCCTGCGTGCTGGGCATGGCCAACACTGTTCCCTGT\n+CCAGAGGGTCGCGGTACCTCCCTGAGGCCAGGCTTTCCCGGCCCGGGCCC\n+TCGTCTTGCGGGGTCTCTGGCCTCCCTCAGAGGAGACAGAGCCGGGTCAG\n+GCCAGCCAGGGACTCGCTGAGGGGCGTCACGACTCCAGTGCCTTCGCCGT\n+TCCCAGTTTGCGAAGTTAGGGAACGAACTTGTTTCTCTCTTCTGGAGAAA\n+CTGGGGCGGTGGCGCACATGACTGTTGTGAAGAGAACTTGGAGAGGCAGA\n+GATCTCTAGGGTTACCTCCTCATCAGGCCTAAGAGCTGGGAGTGCAGGAC\n+AGCGTGAGAGATGTGCGGGTAGTGGATGACATAATGCTTTTAGGAGGTCT\n+CGGCGGGAGTGCTGAGGGCGGGGGAGTGTGAACGCATCCAATGGGATATT\n+CTTTTTCCAAGTGACACTTGAAGCAGCCTGTGACTCGAGGCACTTCGTAC\n+TCTCCTGGCGTTTCATTTAGTTTGTGGTGTAGTGTAGTTAAACCAGGTTT\n+TAAGCATAGCCAGAGAGGTGTGCTTCTGTGTGTCTGCAGGCAGTTGGATG\n+AGTTGTATTTGTCAAGTACATGGTGAGTTACTTAGGTGTGATTATTAATA\n+AAAAACTATATGTGTGCATATATATGAAAGAGTCGACTTATACTTAACTG\n+CCTATCGATTTTTTGTTCTATATAAAACGGATACATTGGTGGTGCTCAGT\n+TTTCACCGGGGAATGAATTTTACTAGTGTTGCAGACAGGCTTGTTTTAGA\n+ACATAGGCCACTCTGACTCTGACTTTGTGCCAGTAAAAGTTCCTGTTTAG\n+TTCTTTGCTGACATCTTATAGATCTTTGGAAGCTAGCTGCTTGTGACTGG\n+AGAGAATATTGAAACAGAAGAGAGACCATGAGTCACAGTGCTCTAAGAGA\n+AAAGAGACGCTCAAAACATTTCCTGGAAATCCATGCTGAGTGTTGAGCCC\n+TGTGCTCTCTTGCAGCTCAGTCCTTTCTCTCAACTCTGGGCATTTTATTT\n+CTAATCTGGATTTGTATAATTAATAAGGAGAACTTTTGGGAACAACCTAC\n+TAAAGAATGTCATCATTAAAACTCACTTAGAAAATAAGTGTTCTGGTGAT\n+ATCATTGAGCTATGTTCCCAGTCCTGAGAGTTTGTTTTTTTTTTTTTTTT\n+TAAATAAAGATTTGGGGAGAAAAGGTGGCTTACTTGATAGAACAAAATAT\n+AGGAATAAAATTTCCTTCTATAAGGTGAAAAGTGTGAATAGAAAACTTCT\n+TATCCTCTAGATAAGTAGTTTCTTTTTGCTTTTGAGAGTCTCACTATGTA\n+ACTCTTGACCTGAACTCAGA
GAGATCCATCCTCCTGCCTCTGCCTCCTCT\n+CTCTGGGATTAAAGGCATGTGGCACCATGCTGGGCTGTCCAAGTATGCCA\n+CAGACCCTCTAGGTCCCTGGTCTTCGAGGAACGGGATTTCTTAGGCAGAT\n+GGGTAAGGAGTCGGATGAAAATGACAATCAGCCACACACAAGAGAGGTGT\n+TGAATCTGAATGTAATGTTCTGGTTGAGCTTCAGACTTATATAACAACGA\n+ATTATCAGAGGATACAAATCACAAAAAGACAAGATACACTGAAATTCACC\n+AGTTACAGCAGAAAGGAATTTGCAGGGACTAATTAAATGTTTACATTAGG\n+GATAACAAGCCCTGCCTAGGATCAGCCTAATGCCAGGCAAGAATTTCACA\n+CTTTAAGGTTAAAAGCATCAGGGGGTTGTTAACTCTTGACAGGCCTTAAG\n+AGTAATGTGCTATCACTGAGCTCTAAATTCTTAGGTCTAGTAAAACTTAT\n+CCTGTCTGGAGAGTTCCCCCTTATCAGGGTAGTATATCAACTTATACTTG\n+ACATGGAATGAAGCCTGTAGTAAAACATTTCTATCTCAGTGAGACTTTTA\n+GTCTCTATCTGTAAACAGCTGAGTAAAATGGCAAGTGCTTAATTGTTTAC\n+TGAATGGGTTAAGCTCCTTGCTGCTATCTGGAATCTAAGAACACTGGGGA\n+AAGGCTTTAGCTATGTTAGAATACAATATTAAAAGGCATTTACTATAAGG\n+TGATGCTTAATAGAGTGCACGTGAATCTATACACTAGATTAATGTGGTGG\n+AAATTTGAATATAATGGGTTAGGGAAAGAGATGCCATAACTCTGGGAGGA\n+AAATTTCCCTGGACTCTTATCCTCGTGAAACAGCTTCCAGGCTTTTCGCC\n+TGACAAACCGATCCAAACTGGAGAGTTGGCTTTCGCCAGAATATCCAGGA\n+GGAGAGTCCTAGAAATTCATTTCTCATGAGCAGCTTTTTGGCATTTTTGC\n+CTCACAAGCTGACTCCACCAGAGTACCCTGACACAAGTATTGTCTAGTTA\n+TTTTGATTATTACCATGACTCTGCCTCTGGGTGAGAGGAATTGTGGAAGT\n+TTACATATTCCCCATATCTTCTATAAACCTCTGTGTGTGTGTGTGTGTGT\n+GTGTGTGTGTGTGTGTGTGTGTGTATGAGGGAGAGAGAGGGAGAGAGAGA\n+GAGGGAGGGAGGAAGAGAGAGAGAGAGATTGTTCTGTGCCTGCTTCGAAC\n+ACAAATTAGTTTGCAAAAGTAATTCATTAACATGATACAGTCCCAAAGAT\n+AAAAATGGTTAAATAATGAAAACATCTCCCTCCCCATTTTCCTAACTTTG\n+TACCCAGGAGCAAGCTCTGTTACACTTCATTTGTCCTTCCAGATAAAATT\n+TGGGCATATGTTAGGACAGAATTTTAAATTATTTACAAACAAAAGTATTT\n+TGGAACAAAAGCTTTTAAAAGCTTTTATTTTAATAAAATAACTTGTTACT\n+ACACTGTATATAACTAACTAACATTTTCCAAAATTAGCTCCATTAGCATC\n+TATCTCATATTTCTATGTACTTTGCTGTTGAAAAACCAAGTGTTCATTAA\n+TAATAAGTAACAAACTCACTGCTTGGAAGCTTTGATTTTTGGCATTTTGT\n+CCACTTGACTCAGTTAAAAGTCCTTTTTTTCGAAATGAGAACAGCCAAAA\n+CAGTTTTAGAATGAGTCTGTTCTGCTTTTGTGACTCTCATTGTGTTCTGT\n+AGAACCAGTGTCACAGCCATATGTGGGCCTCTGTTGAAGTAGCTGAGAAC\n+TTGTTCTCTGCTCTGCTAGCTGCTGTCGATCTGATAGGCCTTGAACAGTT\n+GACATTCACCCTTAATAGTCCTCATTAGTCTTCCTGAGCATAGTCATTCA\n+TTTATC
AATATTTGCTGATCATCTCCTATGTGCCTAGCATTGTTCTAGTT\n+GCAGGTTTTAGCAGGGAACAAAGTCATGTC"..b'ACACCAAGAGGTTATTATCTCCTCAGCAGTTGTAGAACTG\n+CTACGCTGACATCAGTATTAACCATTACACCCATCATGTAGTGAGGCACC\n+TTGTCCCTGTAGATAAAGAGGCATTCTGTCATGTAGTGAGGTACCCCGTC\n+CTCTCTAGATATAGAGGAATTACCTCATGTAGTCAGATGCCCTGTACTGT\n+CTAGATACAGAGCAATTCTCCTCCACTTACCCCTCGAATACCAGAAAGCA\n+TACTGAGAGCTGGTGCAGGCCTTGAAAGCATTCAATTCCCTTCCTTGTCT\n+TCTTTGCCAAGCACTCTTAGGCCACTACCTTAGTGGGGTTCTTTGTTGCC\n+CAGTGAAGACAAGGACCTCATTGCCCCTTGATACATGCCAAATGGTTATG\n+GGGAAGCAGGAACTGAGCAGGTTAATAGAAGGTGTGTGTGTTGTGGAGAG\n+AGAGGGTTCTCACATAGGAAGATATCTAAAGCACAGGACCCAGTTTGTTA\n+TATTTTCCAAGTCGTTAGGTGGACTATTAGCAGCTTGCAAGTTCCATCCA\n+TGACCATAGAAATGTTTGATTTGGGGGAACTAATGATGAAATACAGTGTT\n+TAATATTAAAGCTTATGTTCTACTTGAAAAAATTGTGACTCTCTCTAAAT\n+CCTTAAATGGCTTAAAATAAGTTTTTGACAAAACATAATAAAAACTGTCA\n+TATGAGGCCAGACATGGTAGTGCATATCTTTAATTCCAATACTTGGGAGT\n+CAGACACTTGAGGATCTCTGTTTGTGACATGTCTGGTTGACTTAAGTTCC\n+AGGCCAGCCAGGGCTACATAGTAAGACTATCTCCAAATCAAAAAAAAAAA\n+AAGAAAATTAAAAGTTTTTGGCATGTGAAATGTTGTGTGTGTGTTTTTTT\n+AAGCAGATTTTTGTCTAATATAAGATGCTCTGTGTGCCTTCTCAGGCTGC\n+AGCATTGCTTGGCATCCCACTGGATTCTTAGATGGCATATTAAACTTGGT\n+GCGCTGTCTACATCAATTAAGATTTGTCATCCTAGAATTATTTCAATGAA\n+ATATAAGATCATAAAAATTAAAAATATTGCTCTTTCTCTCTTTCCCTCCC\n+CCCTCTCTCCACGTGGCCATGGCCAGTCTCTCTCTCTTTCTACCTTCTCT\n+CCTTTCTCCCTGACTTTCTACAATAAAGCTCTAAAACCATTTTAAAAAAT\n+TAAAAATATTACTTTAAAATTCAAATATGACAGTGACCAGAAATATTTAT\n+TAAGCATGTTAAGTGGAGTTGTTGATATATTTATTAATATATATAACATA\n+GGATATACTTTTTAAAATAGAGAATTCAACTTAGTTTTATCTGTCTTTTA\n+ACTTTATTTGTAGTCTAAGATCTTTTCTAGAGAGTATTTCCCACTTTTAT\n+TATTATAAGTTACTTGAGACAAGCTACATCATAAGAGAAAAAGATTTATT\n+TTGACTGATAGTTCTGCACATACAACATCCAAGGGCTCATCTGGTGATGA\n+CTTTACTGTCAGAGTCCCAGTGTGGTGCAGAAAACCTCCCATGGCAAACA\n+ATAAGGAGCTTGAGTGTCTCTGTTTCTAGAATATTCTCAGAAGCATTCCT\n+TACAGTTCTTTGGTCTGGATTATCTCAGAAACAAATGCTTATTGCATTAA\n+CTGTGTGTGTTCCAGCCTGAAGGAAAGCTTACTGTCTTTGCTGTTGTTTG\n+TCTTGCATGTAAACTTCTGACCCAGGAGTTCAACCTAAGCCTTTTGGCTC\n+CCTGTTTAAGCCTTGGCATGAGCGAGATTGCTAATGGCCAAAAGAGTCCC\n+CTCTTTGAAGCAGCCCGTGG
GGTGATTCTGAACCGGGTGACCAGTGTTGT\n+TCAGCAGCTTCCTGCTGTCCATCAAGTCTTCCAGCCCTTCCTGCCTATAG\n+AGCCCACGGCCTACTGGAACAAGTTGAATGATCTGCTTGGTAATTAAATA\n+CAGTTCCCTTGGATGCTTGTCTGTCTATCTTCTCTGTCACTCTGTCTCTC\n+TTTATGGGTGATAGGAATGGCAGTAGCAGAATGGACAAGCCAGAGGGACA\n+CTGAGTCACACATTGAACCTAGAGCTGCCAACTCTGGTAGATCAGCTGAC\n+CAAGCCTCTAGGACCCTCCTGTCTCAGCCCTAAGTGCTGAGGTTACAGGT\n+GTACACCCACAGCCAGGTTTTACATAAGATCTTAAATTCCAAACTCAAGT\n+CCTCATGCTTGCACAGGAAGCACTTATCCACCGACTCATCCTCTCAGCTC\n+AAGTTATCTTAGTGTTTTAGTTATTTTATATTATGTTATAGTTGTCTGCA\n+TGTATGTCTTCACCAGATGCATACAGTGCCCATGGAGACAGAAGATGGCA\n+TCAAATCCCATGGGACTGGAATTACAGGTGGCTGTGAACACACTATGTGG\n+CAGCTAGAGATTTAACTTAGGTCCTCTGGAAGTGCAGCTCATGCTCTCAG\n+CTCCCGAGCTGTCTATCTAGCTACAAGTTGTCACCGTTTTTAAAAGTATT\n+ACAGATTCAGCACCGTGCTTTTCCTCAAGCACGCATATAGTCAGGACTGT\n+TGATCTAAAAGGCTGACAAAAATAGCTGAGAAACTGCACCAAATCCTTAG\n+CTCTAAACTTCTTTCTTTGTTGCTTGACCTGGACATAGAAAGTCAGGTTC\n+TAAGCCCTTCAGGATCAGTGGGTTAGACTCAGGGCAAACCATGTCCTGAC\n+TTTATGTAGCACGTATGAGTGAGCATGTACAGATGTGCTTGCTCTCTTGG\n+TCTTGGCAACCTCAAATTCACATAGTTGTGTGAAGGCTTCTGAAGGGGCG\n+GGCCTGTGCTCACAGTCAAAGTCACTCATGTCAGTCTCATGTTTCAGGTG\n+ATACCACATCATACCAGTCTCTGACCATACTTGCCCGTGCCCTGGCACAG\n+TACCTGGTGGTGCTCTCCAAAGTGCCTGCTCATTTGCACCTTCCTCCTGA\n+GAAGGAGGGGGACACGGTGAAGTTTGTGGTAATGACAGTTGAGGTAAGAG\n+CAGCTCTGAAATTATGTGTCCCTGTGAGGACAGGATATGTGAGTAGCACT\n+AAGATGAAAGTCCTTGAAAACCGACAGTGTGGAGTACAATAGTGCACACA\n+TTAGCCCAGCTGCCTTGGAGGCAGAGGCAGAATTGTGGGTTCCTGGTCTG\n+TAAGGATGTGCCTGAGTATACAGCTAGACCCTATTTGAATAAACAGGAAG\n+GCAGGGAATACCTATTGGCAAAGTCTGATTCACCTGATGGTACAGAGTGC\n+CTTTCACCCTCACCACTGGGAAGCAAGGAGGTCTGTAAGACATCCTGTTA\n+TCCCTACACTATAAACCTAATGTGGGTCCTAAATAAAATCTAGACAGTGT\n+TACATTTTAAATTGGGCAGTGAAGCTGGACATTTCACCCAGAAACACTTG\n+GCCCCTCAAAATGTATCTATACGTGCACTATAGTTTTATTACCTTGCCAT\n+GGGCATGCTGGGAAAGAGCCTCACTGTGCCAGAGCTGTGCTGCCAATCCT\n+GAACAAGGGTTGACACCTTACCCTAAGAGAAGAAAGTCAGTATCCTGAGG\n+GTGTATGGTACAAAGGCACCAGGTGAACCAGGCTAAGTTAGGTGGTCTTT\n+GAGCTTGTCTTAGCCCAGTGAAGACAGGAAAGCAAATGTGTGTGTAAAGT\n+ATTGGGTGGCAGCTCCTAGTCATACTCTGCGCTGCACAGGCCATGCCATG\n+ACACTT
GTTTCCTATAAAAACTCTGTCCCCATTTCACACATGGGGAAAGA\n+AGCTCAGAGAGGTTCGGGGACTTGCTAGAAGTCACTAGTCATAAATCATA\n+CTCCAAAACTCAGTGTTGTGACTGAGATACAAAACAAAACACATTCTGTT\n+TCTT\n'
diff -r 000000000000 -r 2b970db61912 test-data/nativegff_sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/nativegff_sample Sun Jul 21 07:19:00 2024 +0000
b'@@ -0,0 +1,418 @@\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t228\t239\t.\t+\t.\tMotif=CAGCAG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t264\t275\t.\t+\t.\tMotif=CCGCCG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t314\t325\t.\t+\t.\tMotif=GCCGCC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t1210\t1221\t.\t+\t.\tMotif=ACTCTG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t1585\t1599\t.\t+\t.\tMotif=TTT;Type=3;Repeat=5;Length=15\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t1784\t1795\t.\t+\t.\tMotif=CTGCCT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t2882\t2923\t.\t+\t.\tMotif=TG;Type=2;Repeat=21;Length=42\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t2941\t2952\t.\t+\t.\tMotif=GAGAGA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t2964\t2975\t.\t+\t.\tMotif=AGAGAG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t4224\t4235\t.\t+\t.\tMotif=AAGTCT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t4334\t4345\t.\t+\t.\tMotif=TTGTTG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t4346\t4365\t.\t+\t.\tMotif=TTTGT;Type=5;Repeat=4;Length=20\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t4447\t4458\t.\t+\t.\tMotif=CTGCTT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t6208\t6223\t.\t+\t.\tMotif=TTTG;Type=4;Repeat=4;Length=16\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t6226\t6241\t.\t+\t.\tMotif=TTTG;Type=4;Repeat=4;Length=16\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t6787\t6798\t.\t+\t.\tMotif=TGCCTC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t7487\t7498\t.\t+\t.\tMotif=TATATA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t7518\t7557\t.\t+\t.\tMotif=AC;Type=2;Repeat=20;Length=40\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t8248\t8259\t.\t+\t.\tMotif=ATTGTA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t8375\t8386\t.\t+\t.\tMotif=TTTTTC;Type=6
;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t8468\t8479\t.\t+\t.\tMotif=CTGCCT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t9129\t9140\t.\t+\t.\tMotif=CCCACT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t11343\t11354\t.\t+\t.\tMotif=CTGCAT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t11449\t11460\t.\t+\t.\tMotif=ACACAC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t11712\t11723\t.\t+\t.\tMotif=ACATCA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t11829\t11840\t.\t+\t.\tMotif=CTCTTC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t11891\t11902\t.\t+\t.\tMotif=TTCTCT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t12162\t12173\t.\t+\t.\tMotif=TCTCTC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t12268\t12279\t.\t+\t.\tMotif=CTGCCT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t13429\t13440\t.\t+\t.\tMotif=GAGGAA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t13515\t13526\t.\t+\t.\tMotif=ATTTTT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t14835\t14846\t.\t+\t.\tMotif=ATTTTC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t15618\t15632\t.\t+\t.\tMotif=TTG;Type=3;Repeat=5;Length=15\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t16489\t16500\t.\t+\t.\tMotif=AGAGAG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t16505\t16516\t.\t+\t.\tMotif=AGAGAG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t16521\t16536\t.\t+\t.\tMotif=AGAG;Type=4;Repeat=4;Length=16\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t16547\t16564\t.\t+\t.\tMotif=AGACAG;Type=6;Repeat=3;Length=18\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t16716\t16731\t.\t+\t.\tMotif=ATAT;Type=4;Repeat=4;Length=16\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t16732\t16743\t.\t+\t.\tMotif=ACACAC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf
\tETR\t17779\t17790\t.\t+\t.\tMotif=CCTTCA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t18177\t18188\t.\t+\t.\tMotif=TTTTTT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t21240\t21251\t.\t+\t.\tMotif=TTTTTG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xda.1\tpytrf\tETR\t21349\t21360\t.\t+\t.\tMotif=GAGGCA;Type=6;Repeat=2;Length=12\n+mm10_know'..b'TTGG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t122503\t122514\t.\t+\t.\tMotif=GAGGCA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t123241\t123255\t.\t+\t.\tMotif=AAA;Type=3;Repeat=5;Length=15\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t126605\t126616\t.\t+\t.\tMotif=AAAAAA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t126627\t126644\t.\t+\t.\tMotif=TTG;Type=3;Repeat=6;Length=18\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t126645\t126674\t.\t+\t.\tMotif=TTTTG;Type=5;Repeat=6;Length=30\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t126763\t126774\t.\t+\t.\tMotif=CTGCCT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t128142\t128153\t.\t+\t.\tMotif=AGAGAG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t129691\t129702\t.\t+\t.\tMotif=AGGAGA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t130429\t130449\t.\t+\t.\tMotif=TGC;Type=3;Repeat=7;Length=21\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t130455\t130466\t.\t+\t.\tMotif=ATTATT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t130487\t130498\t.\t+\t.\tMotif=TCCTCC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t130561\t130572\t.\t+\t.\tMotif=TTTTGT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t130941\t130952\t.\t+\t.\tMotif=GAGATA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t132006\t132026\t.\t+\t.\tMotif=T;Type=1;Repeat=21;Length=21\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t133047\t133062\t.\t+\t.\tMotif=TTGT;Type=4;Repeat=4;Length=16\n+mm10_knownGene
_uc008xdc.2\tpytrf\tETR\t133906\t133917\t.\t+\t.\tMotif=AGATAC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t134280\t134291\t.\t+\t.\tMotif=TACAGG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t135294\t135305\t.\t+\t.\tMotif=TTGAGC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t136933\t136954\t.\t+\t.\tMotif=T;Type=1;Repeat=22;Length=22\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t137254\t137265\t.\t+\t.\tMotif=ACACAC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t137576\t137587\t.\t+\t.\tMotif=CAGATC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t141865\t141876\t.\t+\t.\tMotif=GAGGCA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t141972\t142039\t.\t+\t.\tMotif=AAAG;Type=4;Repeat=17;Length=68\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t142094\t142153\t.\t+\t.\tMotif=AAGG;Type=4;Repeat=15;Length=60\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t145751\t145762\t.\t+\t.\tMotif=CTGTCC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t145787\t145798\t.\t+\t.\tMotif=GCCATG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t145999\t146010\t.\t+\t.\tMotif=GAGGCA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t146206\t146217\t.\t+\t.\tMotif=AGGGAA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t147819\t147830\t.\t+\t.\tMotif=GCACTT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t148513\t148524\t.\t+\t.\tMotif=AAGCCC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t149700\t149711\t.\t+\t.\tMotif=TGTTTC;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t150175\t150198\t.\t+\t.\tMotif=TCCC;Type=4;Repeat=6;Length=24\n+mm10_knownGene_uc008xdc.2\tpytrf\tETR\t150203\t150218\t.\t+\t.\tMotif=TCCC;Type=4;Repeat=4;Length=16\n+mm10_knownGene_uc008xdd.1\tpytrf\tETR\t943\t954\t.\t+\t.\tMotif=AGTGTA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdd.1\tpytrf\tE
TR\t1872\t1883\t.\t+\t.\tMotif=GGATGG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdd.1\tpytrf\tETR\t2357\t2368\t.\t+\t.\tMotif=GCCTTT;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdd.1\tpytrf\tETR\t2462\t2473\t.\t+\t.\tMotif=CTTTGG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdd.1\tpytrf\tETR\t4088\t4099\t.\t+\t.\tMotif=ACCTCA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdd.1\tpytrf\tETR\t4940\t4951\t.\t+\t.\tMotif=AAAAAA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdd.1\tpytrf\tETR\t5214\t5225\t.\t+\t.\tMotif=TGGCCA;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdd.1\tpytrf\tETR\t6845\t6856\t.\t+\t.\tMotif=TCTTGG;Type=6;Repeat=2;Length=12\n+mm10_knownGene_uc008xdd.1\tpytrf\tETR\t7218\t7229\t.\t+\t.\tMotif=GAGGCA;Type=6;Repeat=2;Length=12\n'
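The nativegff_sample records above use nine tab-separated GFF columns, with the repeat details packed into the last column as Motif, Type, Repeat and Length. A minimal sketch of reading one such record follows. The helper names are hypothetical and are not taken from find_str.py, and the BED conversion is the usual GFF-to-BED coordinate convention, assumed here rather than confirmed as what the tool does.

```python
def parse_pytrf_gff_line(line):
    """Split one tab-separated GFF record into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqid, source, feature, start, end, score, strand, phase, attrs = fields
    # The ninth column holds key=value pairs separated by semicolons.
    attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if pair)
    return {
        "seqid": seqid,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "motif": attributes["Motif"],
        "repeat": int(attributes["Repeat"]),
        "length": int(attributes["Length"]),
    }


def gff_to_bed_fields(record):
    """GFF is 1-based and inclusive; BED is 0-based and half-open (assumed convention)."""
    return (record["seqid"], record["start"] - 1, record["end"], record["motif"])


# One record copied from the sample above.
sample = ("mm10_knownGene_uc008xda.1\tpytrf\tETR\t2882\t2923\t.\t+\t.\t"
          "Motif=TG;Type=2;Repeat=21;Length=42")
rec = parse_pytrf_gff_line(sample)
print(rec["motif"], rec["repeat"], rec["end"] - rec["start"] + 1)
print(gff_to_bed_fields(rec))
```

In the records shown, Length equals end - start + 1, which makes a convenient sanity check when reading the file.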
diff -r 000000000000 -r 2b970db61912 tool-data/all_fasta.loc.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool-data/all_fasta.loc.sample Sun Jul 21 07:19:00 2024 +0000
@@ -0,0 +1,18 @@
+#This file lists the locations and dbkeys of all the fasta files
+#under the "genome" directory (a directory that contains a directory
+#for each build). The script extract_fasta.py will generate the file
+#all_fasta.loc. This file has the format (white space characters are
+#TAB characters):
+#
+#<unique_build_id>	<dbkey>	<display_name>	<file_path>
+#
+#So, all_fasta.loc could look something like this:
+#
+#apiMel3	apiMel3	Honeybee (Apis mellifera): apiMel3	/path/to/genome/apiMel3/apiMel3.fa
+#hg19canon	hg19	Human (Homo sapiens): hg19 Canonical	/path/to/genome/hg19/hg19canon.fa
+#hg19full	hg19	Human (Homo sapiens): hg19 Full	/path/to/genome/hg19/hg19full.fa
+#
+#Your all_fasta.loc file should contain an entry for each individual
+#fasta file. So there will be multiple fasta files for each build,
+#such as with hg19 above.
+#
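The all_fasta.loc layout described in the comments above (four TAB-separated columns, with lines starting with # ignored) can be read with a few lines of Python. This is a minimal sketch, not Galaxy's own loader. The column names are borrowed from the all_fasta table definition in tool_data_table_conf.xml.sample, and parse_loc is a hypothetical helper.

```python
# Column names as declared for the all_fasta data table.
LOC_COLUMNS = ("value", "dbkey", "name", "path")


def parse_loc(text, comment_char="#"):
    """Return one dict per data line of a .loc file (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        if len(fields) != len(LOC_COLUMNS):
            raise ValueError(f"expected {len(LOC_COLUMNS)} TAB-separated fields: {line!r}")
        entries.append(dict(zip(LOC_COLUMNS, fields)))
    return entries


# The hg19 example entries from the comments, uncommented for illustration.
example = (
    "#<unique_build_id>\t<dbkey>\t<display_name>\t<file_path>\n"
    "hg19canon\thg19\tHuman (Homo sapiens): hg19 Canonical\t/path/to/genome/hg19/hg19canon.fa\n"
    "hg19full\thg19\tHuman (Homo sapiens): hg19 Full\t/path/to/genome/hg19/hg19full.fa\n"
)
entries = parse_loc(example)
print([entry["value"] for entry in entries])
```

Splitting on TAB only matters here because display names such as "Human (Homo sapiens): hg19 Full" contain spaces.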
diff -r 000000000000 -r 2b970db61912 tool_data_table_conf.xml.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.sample Sun Jul 21 07:19:00 2024 +0000
@@ -0,0 +1,7 @@
+<tables>
+    <!-- Locations of all fasta files under genome directory -->
+    <table name="all_fasta" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/all_fasta.loc" />
+    </table>
+</tables>
diff -r 000000000000 -r 2b970db61912 tool_data_table_conf.xml.test
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.test Sun Jul 21 07:19:00 2024 +0000
@@ -0,0 +1,7 @@
+<tables>
+    <!-- Locations of all fasta files under genome directory -->
+    <table name="all_fasta" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="${__HERE__}/test-data/all_fasta.loc" />
+    </table>
+</tables>
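Both tool_data_table_conf.xml files above declare the same all_fasta table and differ only in the file path. A minimal sketch of pulling the table name, comment character, column list and file path out of such a file with the Python standard library follows. read_tables is a hypothetical helper, not part of Galaxy, and it leaves placeholders such as ${__HERE__} untouched rather than guessing how they are expanded.

```python
import xml.etree.ElementTree as ET

# Contents of tool_data_table_conf.xml.sample as added in this changeset.
CONF = """<tables>
    <!-- Locations of all fasta files under genome directory -->
    <table name="all_fasta" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/all_fasta.loc" />
    </table>
</tables>"""


def read_tables(xml_text):
    """Map each table name to its comment char, columns and file paths (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root.findall("table"):
        raw_columns = table.findtext("columns", default="")
        tables[table.get("name")] = {
            "comment_char": table.get("comment_char"),
            "columns": [col.strip() for col in raw_columns.split(",") if col.strip()],
            "files": [f.get("path") for f in table.findall("file")],
        }
    return tables


tables = read_tables(CONF)
print(tables["all_fasta"]["columns"])
print(tables["all_fasta"]["files"])
```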