# HG changeset patch # User rnateam # Date 1419257311 18000 # Node ID cd00b4fe6552705975947455a7390b43d3c472e6 Imported from capsule None diff -r 000000000000 -r cd00b4fe6552 rnabob.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/rnabob.xml Mon Dec 22 09:08:31 2014 -0500 @@ -0,0 +1,218 @@ + + Fast Pattern searching for RNA secondary structures + + rnabob + + echo "2.2.1" + + $stdout +]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**What RNABOB does** + +RNABOB allows searching a sequence database for RNA structural motifs. +The probe motif is specified in a *descriptor* file, +which describes its primary sequence, secondary structure, and tertiary constraints. +The source in its original packaging can be found at http://selab.janelia.org/software/#rnabob. + +----- + +**Sequence database format** + +RNABOB is currently restricted to reading sequence files in FASTA format. +The command line version of RNABOB can also read sequence files in GCG, EMBL, GenBank and other formats. + +----- + +**Descriptor file syntax** + +The descriptor file syntax is fairly powerful, and allows a great deal of freedom for specifying +RNA motifs. The syntax is therefore a bit complicated. + +The descriptor file has two parts: a **topology** description and an **explicit** description. + +The first non-blank, non-comment line of the file is the topology description. It defines the +order of occurrence of a series of single-stranded, double-stranded and related elements. Each +element must be given a unique name (a number, typically) and must be prefixed with '**s**', +'**h**', or '**r**', indicating single-strand, helical, or a relational element. Helical and +relational elements are paired to other elements, which are suffixed by a prime, **\'**. + +For example:: + + \ + h1 s1 h1' + +describes a hairpin loop structure with a simple helix and single-stranded loop. 
If the helix +always contained a non-canonical base pair at one position, the topology could be described as:: + + \ + h1 r1 h2 s1 h2' r1' h1' + +where r1 and r1' indicate a correlation in which the sequence of r1 constrains the sequence of r1'. +(Helices are a special case of this.) + +The remaining non-comment, non-blank lines are explicit descriptions of each element in turn. Each +line contains 3 or 4 fields, separated by tabs or blank space. The first field is the name of the +element, from the topology description. The second field is the number of mismatches allowed in +this element. The third field is the primary sequence constraint to apply to this element. + +Helices and relational element pairs are specified on a single line rather than two. Mismatches +and primary sequence constraints are given as pairs, separated by a colon '**:**'. The left side +is the constraint applied to the upstream element, and the right side is applied to the downstream +element. + +The primary sequence constraint is given as a sequence of nucleotides. Any IUPAC single-letter +code is recognized, including N if the position can have any base identity. Allowed length +variations are specified with asterisks ``'*'``, where each ``*`` will allow either 0 or 1 N at +that position. + +For example:: + + \ + GGAGG******NNNAUG + +specifies a GGAGG Shine/Dalgarno site and an AUG initiation codon, separated by a spacer of 3 to 9 +nucleotides of any sequence. + +An alternative syntax can be used for very long gaps:: + + \ + GGAGG[10]NNNAUG is the same as GGAGG**********NNNAUG + +Be careful defining variable length helices and relational elements; if the number and type (gap +or identity) of positions do not match on left and right sides, the program will refuse to accept +the descriptor. + +Relational elements have an additional field which specifies a "transformation matrix" of four +nucleotides, specifying the rule for making the ``r'`` pattern from the ``r`` sequence in order +``A-C-G-T``. 
For example, the transformation matrix for a simple helix is ``TGCA``; if you allow +``G-U`` pairs, it is ``TGYR``. RNABOB allows ``G-U`` pairing by default and uses the ``TGYR`` +matrix for helical elements. + +For example, the explicit description of our hairpin might be: + +:: + + \ + h1 0:0 NNN:NNN + r1 0:0 R:N GNAN + h2 0:0 **NC:GN** + s1 0 UUCG + +This describes a stem of 6 to 8 base pairs, in which the 4th pair from the bottom of the stem must +be a non-canonical GA pair. Note that, in general, the left side of the primary constraint for +helices and relational elements is redundant, and should be given as all N. In some cases it is +convenient to constrain the right side to require a particular base pair (GU, for instance) at one +position. + +A note on mismatches: The split format for helices and relational elements works like this. The +number on the left constrains the primary sequence match of the left side of the primary +constraint. The number on the right constrains the match of the right side of the primary +constraint, *after* that side has been constructed according to the sequence on the left. In other +words, the number on the left constrains the mismatches in primary sequence only, while the number +on the right will constrain the number of mispaired positions in the helix. + +Finally: any line that begins with a pound sign '#' is a comment line, and will not be interpreted +by the pattern compiler. + +**Options** + +The behavior of RNABOB can be modified by use of the following options: + +*Complement*: Selecting this option will cause RNABOB to search for the pattern also on the +complementary strands. + +*Skip*: This is a workaround to avoid a problem in the DNABANK. There are some sequences in the +database which have long stretches of ambiguous sequence (N's). Descriptors with no primary +sequence constraints will match these garbage sequences at many, many positions, and generate huge +outputs. 
This option toggles a search strategy that skips forward a pattern-length rather than a +single base when a match is found, thus printing out only a single match when overlapping matches +are found. + +**Examples** + +The following example descriptors are included in the source distribution +(http://selab.janelia.org/software/rnabob/rnabob.tar.gz): + + - trna.des - a general descriptor of a tRNA structure + - r17.des - descriptor of the consensus binding site for the r17 phage coat protein + - pseudoknot.des - description of a simple pseudoknotted structure + +An example cosmid ``F22B7.fa`` from the *C. elegans* genome sequencing project is also provided +as a search target for these descriptors. + +:: + + \ + # trna.des + # + # Generalized descriptor of a tRNA cloverleaf. Doesn't + # find them all though. + # + + h1 s1 h2 s2 h2' s3 h3 s4 h3' s5 h4 s6 h4' h1' s8 + + h1 0:2 NNNNNNN:NNNNNNN + h2 0:1 *NNN:NNN* + h3 0:1 NNNNN:NNNNN + h4 0:1 NNNNN:NNNNN + s1 0 TN + s2 0 NNNN********** + s3 0 N + s4 0 NNNNNN* + s5 0 NN******************** + s6 0 TTC**** + s8 0 NCCA + +Running RNABOB with ``trna.des`` against ``F22B7.fa`` searches the top strand of the cosmid for +the above motif. With the *Complement* option enabled, ``trna.des`` hits twice, once on each strand. (F22B7 has several other tRNA genes +in it which the pattern fails to detect - this is *not* a pattern to use for tRNA gene finding!) 
+ + + 10.1093/bioinformatics/6.4.325 + @UNPUBLISHED{rnabob, +author = {Eddy S.R}, +title = {RNABOB: a program to search for RNA secondary structure motifs in sequence databases}, +note = {}} + + diff -r 000000000000 -r cd00b4fe6552 test-data/F22B7.fa --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/F22B7.fa Mon Dec 22 09:08:31 2014 -0500 @@ -0,0 +1,806 @@ +>F22B7 921215/f22b7.seq +GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT +GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT +TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT +TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC +GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA +ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG +AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA +CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA +TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC +AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA +GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC +AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC +CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA +AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC +GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT +AACACATATTGACCATTTGGTTTGTTCAAATCAGAACAAATCTTAGCGAG +CATAAAGTTAGATGCGATTCCAGCAGAACATGTTAATCCCGTGAGTTGTT +CAACTCGAAATCGAATTTCTCGAACAGCCTCCTCTCGTCCAGTTCCGAAC +TCCACATGGTCGTAGTAGATTTTCCGCGATTTTTCGCATTTTGGACAGAT +CGATTCTTCGATTTTCAAGTCTTCCAAAGTATTTTCATTCTCGTCGAAAC +GGGGTAACCAACATGGACAATCTCCGCCGAATCTGTGACGCTTGAAGGTT +TCTAGTAAGCAAATAGTTTTTTGTTAATAATCAAATCTAAATCACTAACT +TTTTTCTGTATTACTTGCCACATAGTCTGTCAAATCTATAAATGCCTCAT +CCAATGACATCATTCCAACATCCGAATCGTATTCCATGAAAATTTGTGAA +AATTGGCGACTGACTTTAGTGTATTTAGGGTAATTTCCTGGAACAATCGT +TAGACTCGGACAAAGTTTATTTGAGATGAAGCCAGGCATTCCAGCACGGA +CTCCAAAACGGCGAGCCAAGTAGTTGGATGTGCTCTGAAAGAATAGATTT +AAAGCTTTTCCGAAATCGAAAATTTACTTTTCAAATTGAGTTAGGTGCTT +ACCAGCATTGCCGATGAGCCTACGGCCATAGGAACTGTTCTCAGTGCAGG +ATTATCTCTCATTTCAACTGCGGCAAAATAAGCATCCATATCTATACAAA 
+CACAGTCTCTTGATAAATCTCTAGATGATTCCAGTTTCATCTCAAGATTC +TCCATCTGAAAATTGGAAATTTGCTCTAAGTAAATTTATTAGCTTTTAAA +ATACATACCAGTATCTCAGATTTTTGTCTTTCTTCTCTGGTTGCTGTTTG +CAAACGATTTTTGATTTCTAAAACTTTCTCTTCAATTCTTGATTGTTGTT +TTTTGGAAAACGACGAGTAGGACGCTGATGTGTTTTCTTCGATGACTTTC +GTGATTTTCTCCTTATCCAGTCCATTCATTCCAGCTTTATTATCGTTGAA +AGTCAGCATTTTTCACGAAAAAGAGCCGGATTTTGAAATTCGCGGAGAAA +CTCCACTGATTGTGAGTGTGCAAATGCGCGTAATGGTGTACTTACGTACA +TCGGCAACACATTTGACGACATATCAGAAAGCGGCGCCAAATTAGAAGTT +GAGTGCGCGAGAAAAAACTACGCGTTAACCGCCAATTTTCACTTCCCCAC +AGATCTGTCTCGAGATTCTCGAGTCATTTTTCAAGTTTATTTGTTTGTCA +GCGGTTGTTTTATTGAAGATTTGTAAAATTTATAACAAAATGTGCAATAG +TCTATTAAACCTCGTGAGATATTTGAAGAAACTTTCCCCGTTTTAAATAT +TTCGTGTATTCGTGGAGATCGCGGGAATGTTTTGCCTGTTTCCGTAAAAT +TCCTCTATTTCTTTTATTTTTGCTTGCAATTTTCGATTCATTTCAGAAGT +TTCCACATTCGCAAAACGAATGGACGTCTCTTATTACGATGGTCCCAAGG +ATGAAGTCGCCGAAGCAATGCTGAAAAGCGCGGTGACGGCCATGAGATTG +GGACAATACGAGGATGGAAAAGGACGCTTAGAGGAGATAATGGAGTTCGG +AACCTCAAATTTTCAACTACTTGGTACAATCTACATGTATTACGGAAGAG +TGTGCAGGCATTTGAACCATGATGCCAAGGCCTTGGAGTTTTTCGAACAT +GAGTTGAACATGTTCAAGTAAGTGAATCACAAAAATGAGCTGGACATTCT +ATAACCTTAATTTTTCAGATTGATCTTCAACTACCCAGAAGCATGTGATT +CCACACGTCGCATCGTCGAGCAGGCACTCAAAATGGGAAAGTTCCCCAAG +GCTCGACGGTTTGCTGAGGATCTCATTGATTACACCAGCAATAAGAAGAA +CGGAGAGAAGTATATCGGTCAAGCTCGAATTTTGTTCGCTTCCGTGTGCC +TCGAAGGATGTGAAAGAGACGTCGAGAGTAATCAAGATGAGAAGAAGAAG +CTTTTGTCAATATGTGCTGAACAGATTGCAGCCGTGAAATTGTTCAACGA +GAATAATACGGAAGGAGCTGTGTCTGAGACCAAAATCATGTTACTTGAGG +CGAAATGCTTGTCACTAGACGAAAAATACGAGGAATCGCGTCGCAAGTAT +CAAGAATGCATCGATTTTGCCATCAAAACAGACCAGTTTGAAGCAGTTCA +CATCGCCTATTACGACAAGGCTCTATATGCTGAGACAGATCTTCTTTTCT +TTATTATCAGAGATCTCAGGTAATTTTTAGTTTTAACGATTAATAAAAAT +ATCAATTCTTTATTCACAGAAGTGCTCTCTTCTACGCCACGAAATTCGGA +AAAGAGCGAGATGTAGTCAAATATAAGTCGAAGCTATCCGAAGAGATGCT +GAGAAATGGCGAATTCCACGAAGCATATCTCTACGGATTGGAAGCGCTTG +TATCGATTCGGAAGCTTGGATTGAACGAATACATTGGAGATGTGTTGCTT +ACAATCGCAAAGTGCCTCATTGCACTTGGAAAAAGACGCCAAGCTGCTTA +TTTTATCATCTTGGGGAGTGTTCTGACCATCAACCAAAACAGTTTCAAAC 
+TGTTCTACGAGCAGATCGACGTGGCGATGAATCAAGAGAGAAGCGAAACG +GCAACTGATCAAGATGTATGCCTCGCAATTGATTCGTCTCCTGATCCGAC +ATCCTCGAATGACATGATTAATAAGTTCGTCGTCGAACTGGAGCACGCAA +CAAATGTGGAAACCTGGGAAATGATTGTCAACGGAATCATTGACGACCAG +AAGAAACCAGTGGCGATCGAAAAGAAAGAGAACGAGGAACCCGTAGACAT +GATGGATCTCATTTTCAGTATGAGCTCACGTATGGATGATCAAAGAACTG +AACTGCCTGCTGCCAGATTCATTCCGCGTAAGAAATGTTATAAATAAGCG +TATAAGTATGCAAATTTATATTTTTCCAGCTCGTCCAGTGTCATCGGCAT +CGAAAAAGACTACAAAGAGTCACAGAATCCTCCCTGGACTCCGTGCCAAT +TGGACAAAAGTGCAGTCGATGAAGTTCGATGGTCACACAATGAATAGGAT +CCTGAAGAGGTCGAAGAAAAGCAAATCGTCATTGGATTCTACAAATTCGA +TGCAGGGCGATGATACTCGAAGCGATGATGTGACAATGACGTCCAAATAG +GACCATTATTTTTTCTGTCAAATAATACAATCAAACTTTCTTTATTTATT +TTTTTTTTACTTTCTTTCAGTAAATATTATTATCATTTTAGTGGTTCTTT +TATTTTATTTGCTGGTCAGAAAAGCTGATTTTTTCAATTAGCGAAAATCC +ATCAAGTCATATTTCTATAGACTCTTTACTACATACGTTGATGACTTTCG +TGATTTTCTCCTTATCCGGTTCATCCATTCCAGCTTTATTATTGTTGAAA +GTAAATAAGCATTGTTTCCCGAAAAAGAGCCGGATTTTGAAATTCGCGGA +GAAAAAGTTGAAAATTGAAAAATCCAAACGATGCTCCGATGTTCCGTCCG +AAATGTGTGTTTTACCGTGACGGTGTTTGCAGACAGCTTGAAAAAACATT +TATTTTTTTATTTTAATTTTATTAATTTATTATTTATTTTAGAATGTTCT +ATATTTTAAAATGTGAATTTGTTTCAGGGTACTCGGAATGTTTGTCTTAA +ACCGTTCAAGCGGGCTCATTCATCGATCTGTACCTTTATTAGCTCAAGTA +TCCACGCCTACGACTTCCACAACAAAATTAGGTTAATAATTCTCCATTTG +GTGATAAACCAATTCTTTTCTGCTTTTTTAAAACATTATGTTACAGCTCA +ACTTCACACAACGCATGCACTAAGCAAAGAAGATTATTATAAGACTTTGG +GTGTCGACAAAAAATCTGATGCAAAAGCAATCAAAAAGGCTTATTTCCAG +GTAATATAAGTTTTTATCGAATACTTTGTAAGTATAAATACGTTATTTCA +GCTTGCCAAGAAATACCATCCAGATGTAAACAAAACAAAAGAAGCGCAGA +CGAAATTTCAAGAGATTTCTGAAGCATATGAGGTATTTTCAACAAACAAT +AGAGCAGGCTCGAATCAAAAATATTAAGGTACTTTCCGATGACACAAAAC +GTCAAGAATATGATGCATACGGAAGCGGAGGTGGCCCAGCTGGTGGAAGA +GGTGGTGCTGGAGGTTTCCACCACCATGGAAATGTTGATGTTAACGAAAT +TTTCAGAAGAGCATTTGGTGGAGGGGGTGGAATGGTGAGTTCTCCTTATG +AATTCTTCTGAATATTATATTAATTATATAATTTTTATTGTAATTAATAA +AAACTACAGTTTTATTTATTTTTTCGCTGATTCCAGGGTGGCTTTAATTT +TGATAATTTTGCCCAAAGTGCTTTCGGACATTCTGCTGCTCAGGAAATGG +TTATGGATATTTCGTTCGAAGAAGCTGTCCGAGGAGCCACCAAAAATGTT 
+TCTGTAAACGTAGTTGAAGATTGTCTGAAATGTCACGGAACTCAAGTTGA +ACCAGGTCACAAAAAGACGTCGTGTCCGTATTGTAACGGAACTGGAGCAG +TTTCTCAACGTCTTCAGGGTGGTTTCTTCTATCAAACAACTTGTAATCGA +TGCAGAGGAAGTGGACATTATAATAAGGTAGAGTTATTGATTTTTCTTTT +ATTGTAGCTTTTAATTTTTTCTTCAGAATCCTTGTCAAGAATGTGAAGGT +GAAGGTCAAACCGTTCAACGACGTCAAGTATCATTCAATGTGCCAGCTGG +AACTAATAATGGAGATAGTTTGAAGTTCCAAGTGGGGAAAAATCAATTAT +TTGTTCGTTTCAACGTTGCACCATCTTTGAAATTCCGACGTGAGAAAGAT +GATATTCACTGTGACGTAGATATTTCTCTGGCTCAAGCTGTTCTTGGTGG +TACTGTAAAGGTTCCTGGAATTAATGGAGATACATATGTTCATATTCCGG +CAGGAACTGGCAGTCACACTAAAATGAGGTAATAAATCTCCAAAAATTGG +AACTTAAATATTCATTAAACAATTTTCAGATTAACAGGAAAAGGAGTAAA +ACGATTGCATTCTTACGGAAATGGAGATCAATATATGCATATTAAAGTAA +CGGTTCCGAAATATTTGACAGCCGAACAGAAACAAATTATGTTGGCTTGG +GCTGCGACGGAACAGCTGAAAGATGGAACCATCAAAGGATTGGAGAAAAA +TCAGAAAACCGAGGAGAAGGAGACGAAGAAAAATGAGGAAAAGAAGTCTG +AAGGTGAAAAAATATGATTTCAATTTGAAACAAAGGTTTTATCAAAAAGT +TGTCTTTAAAAAATCGAGAAATTACCAAAAAAAATTCATTAATTTTTTTT +TTTTAAACCGACATTATTGGTTGTAAAGATTCAAAAAGTTTGTAAAATTT +TAAAGTTGTTTGAAAAATACCTGAAAGTCTTGTTTTTTTTTTTGCATTCT +GATGCACTTGTAATCTCAATTTTTCCCCAAAACTGATTTTTGATTCTTTT +CAATCCAAAATTACTAATTTATAGGTGCATCAGAATCACAAAAACGGAGA +AGTGAGCCAGTAGCTGAGAATGCAGAAACTATTGACGAAAATCAAGAAAA +CGAGGGATTTTTCGAAAAAATTAAACGAAAAATTTTCGGGTAAATATATA +AATTTCTAAACAAAAATTAAATATTTTTGTTTCATTTGTATTAATTGAAA +TTTTCCAGATAAGTCCGAGGAGAAGTCCGAGTCAAAAGAAGAGCCAAAAA +ACGAAGAATCTAGTGAAACCCCCGAGAAAAAAGCTGCAGAATCCTAATAT +TTGTCGCAATTTATCGATTACTGTCCCAATCTTTCCAGTTTTGTTCCCTT +GAAATTTGTTTTGTCTCATTTGTAGTTCTATTAGAGTGATTATAGTGTCT +TTTTTGTTACACTTGTTTTTTTTTTCTATTTTCAATTAAACCGAAACAAA +TTTATCAAATTTTATTCAGAAAGAGATAACACAACTAGCAATGATATGAA +GAGTTTTATGGATGAAACTGGATTTTTTTAGAAAATGGTCTGTAACTTGG +CTGAACATGAAAATATCAAAAACCTCAAAATTGATAAAATGTGGTAAAAT +GTTTTTCAGTTTTACTACACAACAAAGATTTTTTTATGAAATTAGATGGA +TGTTGAGTTATGACTGGTAGAACAGAAAAGTTTTCAAAAATTTTTGTGGA +TTTTGGTAAAATGGCTTTGACTCAATGTAACGCAACGAATAATTGGTATT +TAAAAATGACAAAGATTTTAAAAGAAAGAAAAAACATTTCTGATCATTAT +GTCAGTTGAACATTTGTTTCCTAGGCATTCATGTTTCAAGATACGCCAGT 
+TTGAAGATACGCAGTTTAGTGAGTTTAGGAAGTTATATTCTCATTACAAG +CACAGGTACATGAGAAAACAGCAGTCAAAGAAGGGCGGTATTTTTTAATT +TGGAATTATATTTTGCTCCTTATAATATTGGGAACACATCTGTGCTGCTA +AATTATTGTTTTGTTAATTTAAACCGAAACAAATTAATCAAAATTTATTG +TAAAATAAAAATACAACTAGCAAAATATTTTATAGGACAAAATATGAAAA +ACGGAATGTCGTGTGAAGATGGAATTGAAAAGATATGAAATCATTGTGAT +CTGTTCAGGCAAACTTTATAAAAAGCAAGGTTTTCGGGATAGTTTGAAAG +AGCAGATCACCTGGATGTGAAAATACGGTGACATTTTTTTGGAAAAGGGA +AAGAAAATGAAAAAAAATCGATAATTTATAGAAATTTTAGCCCGTTTTTA +ATTAGGGAAAGGGCGGTGAGTTAAGGATGTGAAATAATTGAAATTTCAAA +ATGGAAAAGTGGTTTTTTTTGCACAGAATTTTTCATTTTTCATCTTGAAG +GTTATATTAAACACGCGATGACTGTATTTAGAACCCATTAAAATAATGGA +GAAAAATGTCTCAAAACTGAAATACTGATTTGAGAAAACCATTTTCCCTA +TTTGAAAACTAAAAAAGATGACAAAATATAAAAATGGAAAATCTATTTGA +ATGCACGATTGAGAGAATATCTTGAATGACAAAATCTACAACTTTGCATA +TCAATTGCAGACGATGACGTGGAAATGGTAGCAGCATCGGCAGTTTGATC +AATATATCGAATACATCGGATATCAGTTGGTATGAATGGTATGATGTGTT +TGGAAACATAGAGATTCTCCACTATAAAAGCTTCTCGGGACATGTTACGA +GCTCGCTTTTGCATTAAATTTGCGTAGAGTTCTGATACAAGGACTACCTG +GAAATATTTCAATTTTCCGTTGATCAACATAAAAAAGCAAAAAAAAATGC +TTTTCAAAAAAGGCAAAAGTTTTTTAAAAGTTTCAACTATATGCATATTT +TTGACCACAAAACCAAAACAAAACAAACTCAATAATATAAGTAAAGAAAA +CATTATGGAAATCATTTTCTCGCAAGGTCTTGTAAAACTAAATCGCACTT +GAAACTGACTATTTATATATATGAGAAAAATGAAGGAGTCCTCCAAAAAT +GATGCAACTCAAAAACGAAACATCAAAACGTATTTTTCAACATTTCGCAT +TTCGCATTAGAGACATTTCGCATAGAACGGATTTATGTGGAAATTTATGG +ACGTTTTCGTCCTTTTTAGATACATGAGCACATGTTCTCCTGTTTTATTT +AGAGAGATGACTTGCTTTCAAAATTAAAAAAAAAAACTGCTTATACATAC +CTTTCCTCCTACAACCGCCAATGCAGATCTAGCGTCTTGAATTTTTCTTC +CGAAATAATGAATCTTTCGAATATACTGTACTCCTACCAAATCAATGCAC +ATTGTAGTTATTGCAAGACCTAAAAATATTTTGAATTTTTGTTTTTTTTT +CATTGAAAATTTACCTAAAATGATATAGAGCAATATGATATACATGTATC +CGTCCCTTCTGGGCATCAAGTCGCCAAACCCGACCTAAAAATATTAATTT +CAAAGAGTTCAACAAACTGGAACAAATTACAGTAGTCATTGTAATGAAGG +ACCAGTAGAATGAAGTGAAGAAAGACCACGGCTCTAATTTTGACATTAGG +ACACCGCCAAACGCTGTATATACTATCAGAATAGCTAATACCAGGAATGC +AGGAATTCTGGAAAGACAAAAATTGAGATGAGTGAATTAAGTTTTGAAAG +GTAATTTTCAAAATCCATTAAAAATGTATTAAAAATATTTTACAGTTTTC 
+CAACGACCAAAAAATCTGTCGAACATATACAAAAAACCAAAGCCAGCTTA +CAATGGAACCCCTAGTTTTTGTGATTGATCGATTGTTTCATAAATGTGTT +TTATAACATTACGAACAAAAAACGCTTCTTGACTGGCAAAAACGTTTACC +TTTTCTCCTCGATATTCATATCATGCCCCATTCCATGACTGTGACAGTGC +TCACAAACGTGCTCTCTCCGTTCTTTTCGATGTCGTGACAATATGAGATA +TTTTAATTTCAAATAGTTTCCATACAACCAAACAAGATGTTCAGATAGGA +ATTTACCTGAAAATTATTAAAATTATAATACATTTTCTACATTTTCCCTT +TAAAAACAAATTTTAGATGCGATTAGCTGATCGCAGTAGCTATCAGATGT +AACTGATGTTTTTTTAAAAACTTCACGTACTACAGATTTTCTAGTAAAAG +CATATAAGTGCCTGCCCACTTACCCAAGTCAGCGATGGTAACCAGTGTTA +GAGGTATTCCAAGCAAGGAGAACAATATACACCATATCCGTCCAATGTTT +GTCACTGGAACTGGATTACCGTATCCGATGGTAGTGACGACGGTTACGGC +AAAGAAAATGGACGATGAAAATGTCCATGTCTCCGTTGCTGCGTTCTTTT +TTACCTGAAAATAAATTAAATTATTTTTGAGGTATTTTGAGGTAGTATTA +TTGTATTAAAAGTTTGAATAATTTTGGTTGAATTTGGAATAGGTTGAAAT +CAAACTTTTCAAGTAAAAAACAAAGTTTCAATAGAAACAGTTTGAGAGAA +ATGATAAATAGCTGACCGGAAATGAGTACGAAAATCTACTTTTTAAACTT +TGAGTATGCTCTGAAATATTTTCCTGTTATTTCCTTTTTCCATATTCTGG +CTTTCAATTATCTTTAAAAAATTTTAAAGACACATCTTGGTTAAGTGTTT +GGTGAAACTATAAATCTAAAACTATTTTTAAAGGAAAATCCTTGTCCCTT +TGGATTAAAAAGAACAAGAATAATTTTGAAATGTTTAATTACAGACACAG +ACAATTGCTCAGTTAGCGCATAAAAAAATAAGCAGACAATGAACCAACAG +AACAAAAAACTTGAAAGGGTCTCTCTATCTGACAAGTTAGAGTTTTGTCA +TCTCCCGCGGGAACAGCTGAGCTTTCTCCACTTGTTTTTGTCTAATCTCT +CCACATTTTTCATGATCTATTGTTGTTTCTATCTTTTGATAGTCACAAGG +GACCTCTTGTGTTTTCCGTTTCTATTTCACACCGCCAAACAGACTCAAAG +AATGGAAGGTTTTGTGGAAAAATAGGAAAAAGATGAATTGGTGAGGATTA +TCAAGGATGGGGGAAAAGAGGTCGATAAGGAAATAGTTGGGTTAGAGAAG +AACAAATAATAGTCCGGGTAGTTTGGATGGTTCAGATGTTACTTCTATAT +TGATAAATTATAAGTTGTGTCGATGCAAAATGTTAATTTAAATAGCCGAT +CGTAGTTTGAAGCAATCAGTTTACTGGTCAATTTCCGGCACATTTCGGCC +AATTTTTGCCAAAATATTATATAAGTTTAGTTAGAAAGTTTTCAAAGCTG +GAAATAAATTCGGCAAGACTTGGCCTAGATTTCCAAGTTTAGCTAAGCTT +TGATTAAACTTCTGTCAATTTTTCATGAAAACTGTTCGTATTTTTTTATA +GGGTCGAAATCGGGATGCCTGGAGTTGTGCCCGGATTGAGAAAAAAATTT +GCAAATAGCCATCTCAATTACTCAATTACAAATTGCTTGCAATACATTTT +CCGTACCTTCTGGTCGTCACATGCAATCTTTACAACAAAAAGTCCTGTCA +AACTCACCTCGTTACTCGTCAAGAAATACTTCTCAAATGCGACAAACAGC 
+TGATCGGACATATTGTGCATGTGCCGTTCGGCTAGTGACTCCCATTCGTA +TCGTTCTGCAGACAATTGTGTTCAGGAAATTTTGTGGATTTCCATTTTTT +GCCAGTTTGGTCGAATTATGTGTTGTAGGTATTGTCAAACAAGTTTCGTA +AATTTGGCAATGTGCCAAAACTTTCAAAAAAAAAAAACAACGCTTTTAAA +GTGTTTTAAAATACTTGGTCAATCTGAATTAATATAATTGCATGTAAATC +CCCACTACTGATATTAAAAAGTTATGCCAATCTAACTTACTTGTCTCATT +TCCAGCAGCCAAACGAATTAAGTCGTCAACAAACTCATTTTGACGAGTGT +AGATCAGTTTAAGCTGAAAACTATTATTATTTTGAAAATTTCATATAATA +TTTCGAAATATTCATAGAAAGAAAACAATTGTGTCCACAAAAATGCTGTT +GTAAATATATAATTAGGTGCACTCATGTGCTGCCAGGTTGTTCTTTGTTG +AAAAATTCGATTTCAGGATGCCATAAAGTAAATGGGAAAATATAATGGAA +TCTAACGCGACATTGTTTAAAATACTTTGGCCGCTTGTGACGCCACACTG +TTAAGTTTTAAAAATATATAAATTGATCCAAAAAAATTGAAACTAACTAT +AAGCTTGACGAATAAAATCTAAGAGCCTAATATAAAAAAGAAGCTTTTTC +TCATGCTAGGTACTAAAAAAGGCGCCCGAAAAGCACCTGGCCACCGCAGC +CTTTTTTCCAGAAAATCGGAAGTCGTTCGCAAAGAAAATGGAAATGCTTG +AGCCAACTGAGCCCATCTGCCACGTCAATGAAATAGGCAGATGGCTGAAT +CGAACGGAAAAGAAGAAGGATGTGACGAGTGAATGAATGGCTTTTTGCGT +TGAAAAAGTACCACTTTTGGTAGAATAAGATGGGCGAGTGGGAGTGAAGG +AGTAGAAATAATAGAAATTGAAAATTGGAAACTTTTTCATAATTTTAACT +AGATTAATTTTAGATGTCAGAGATAAATCAAAATGTGCTTCAAAATAAAA +TAGTCTGAAATAGTCTTAATCCTTTTTATGCAAAATAAAAATTCGAACTT +CCCCGAAATTCAAGTGGCATAATTGTTGCAATTTACACAGTAGTTTTGTG +ATTTTTGCGCCAAAAGACAAATTATTATCAAGTGTGAAAAAAGTGTGCGC +CTTTGAAGAGTACTGTAGTTCTAAACTCTTGTTGCTGCAGGGTTTTTCAA +AGTTTTTGTCATTTTTTTAATGTTTATCTTTATATTTTTAATTCATACAT +GTATTTTAAAAATATATTTTTAACAAAAACTGTGAAAAATCTAAATAGAT +TTCGCAGCAATGAGAGTTTGCAGTTACAGTTATTATATATCTTTAAAGGC +ACACACCTTTTTGAAATTAACAAACAATATCGAGTCGAGACCGCGTACCG +TATTAATATCGCAAAACTTAGAGTTTTATTTGTTTTTATCTTGATGAAAG +GCTTTTCAGGAGAACGTTTTTTTTTCAAGAAGAATTATCACCTCTCTCAC +GAATTATCAACTCGTTCTCATTTTATTTCATTTTTAAGTGTACATACACA +TCGCTGTTAGTTACCATCGCACGGATGGCCGAGTGGTCTAAGGCGCCAGA +CTCAAGCGAAATGCTTGCCTCATGCTCGAGGTCGACTGGGTGTTCTGGTA +CTCGTATGGGTGCGTGGGTTCGAATCCCACTTCGTGCAGAATTTTTTATG +TTTTCAACCTTGACCAAAAGTCATATTCATCCAATTCGGTACCCTATTTC +TACGTTATCTTGTTCCTCAAACGATGGCCTTATGTTTTGATCTTCAACAC +TGCATAAGGAAATGTATACCAAAGACGAAGGTCGTTTTGATTTTTCAGTA 
+ACTGGCAGCAATGATCCTTAAATCAGACTCCCACGCCCCTTCCGTTGAAA +ATAGGGCTTCGATCGCCAAACTAATCTTATAAACACTAGAAAAGTTTTTG +TTAAAAATCCGTATAAAAAGGGAATAATGTGATGAAAAAAGGCATCAAAT +TTCAAATTTGAACATTTCAAGCTTAACTCGAACTTCAGCTTTCTCGCGAA +AAAATGAGTCAAATCTTCCAAATTCTTGTGATCTTCGCCATTTTGTCAGC +ATTGCAAGTGAATGGGTAGGTGAAGTACAGTACTTTATGGCATCAAATGA +AATATACTCAGATTCCTCTTTCCAACTTATTCGTCAGGATATGACTACGA +TTGTTATGGATACGGTAACAATGGTTACGGTAATGGTGGATACGGCTATG +GAAATGGTGGCGGATACTATGGAGGATACAATGGATACTAAATAAAATAA +ATGAATACTTTTTGACGGATTTTTCATTGAAAATTTATTTAATAAATGTT +AAGCATTCATAAGTCTTGATGTAGCCTACCCTGAAGTATGATTTCGGACG +GAATTATTAGGTGTAGGTCGTTTCCTACGTTTGCCTTATATGCAGGCAGG +CACGCCTTCGCGGAGATCAGGGGTGTATTTTTGCTATCGGGCAATCAAAT +GAAATATAGATTTGAAAAATTTTGTATTTTTTTTTTCTCTAAATGTTTTC +AATTTTCAAGCTCAAAACATTTCAAGCCTAGAAGGAATTGTTTCAAATTT +CTTCTCTCAAGTTGTGGAACTATTTCCAATATAATATATTTATTCCAAAT +TACTCCCATGTGAACTATAGAACACGTATTGTCTCATCATTTCACTTTTA +TAGTTTCTGTATTACAACACTTGAATGAATGCTTGGTGTAAACAATTTAT +CAATGTGATAAACAATTCGATTTGACGAATTTTTCCGAATTCTCAGGATG +ACCGAGTTTCGATGGTCAGTGGGAGTAAATGAATTGGGCAAATTGGTAAT +GGAAATCTAGTTTTTATGGTGAGTCATGGCAAAGAAATCAAACATTTAAA +AATTACAAGTGATAAACCATGAGTAAACTTAATCAGGCAAGCTGGCGTTA +TCAAGTTGTTCAGTTGTTATAATAAAAATTACAAAAAAGATTTTCAAACA +ATGTAGATTTTTTCCACTATCAATATTCGGTAACTCAAAATAAGTCTCCG +TAGCATAGGACCTGGCAGCCTACACCTACATCTACATCTACATCTACACC +TACCCACACTACAAGTACACCTAAATTAATGGTTCAAATGAGTCAAACTA +ACCTGTTGTTCCTTCATCATTTGCTCGTGCGGCTGCTCAACACTATAAAA +TATCAATGCTCCAATTACTGTATATGTACACGTTAGAAGTACAAGTGCAA +CATGAGGTAGGACGAGTTTAGCAAACTGAAATAGGAAAATCACTTCAATT +AAAATATTAGAATAAATACTAAGTTTTCAATTTTTAATTAATTAATTTCA +ACGCGTTTGAATAATTTTGTTAAACTGTCGAGGAAAAAAATCTCATAGTC +AGCGATCGAATTCTCACGAGTTCTGCCAATTTTTCCTTTTTTTTTAGATT +CTTGTTTCCAATAGATTACCTTCCTGATTCCAGTCTCCTCTTCTTCTTCA +TCTTCATCTTTACTTTCATCATCACTTTCATCGTCGATATCTTCATGGCC +ACTCTGAAAAGTTGTATTAACTTTTTAAAAATTCAACCAAAAATAAAAAT +AAACAACTGTGTGAATCAAATGAAAAACTTTTAGTTAAAATTCTATAACA +TTGAAAATTGTATGAAACTAACAATAAAATCACACAGTTGACTATTTTCT +TCAACGATCCATTTTTGCTAATGATTATAAAATGGTTAGCATAAAAATCC 
+TTGTAAAATCAATAAAAACACACAAAATGGGTATATGGGAAAGGGGAATA +TAAATGAAAAATTCCATGCTTCTAATTCCAAAACCTACTTAATAAACTCA +CAATTATGGCTGAAATCAAGCTGATTTCTAGCTCAACTTCCTCATTTTCA +ATGAAAATTTTTGCAGATTTTTTCGATTTTTTTATTTTCCGATTTGCGTA +CGGACATTGATGAGGGTTATCTGGTGCTTCCTGTTTTACAAGAGTTTTAC +AAAATAGGGAAAAGAGCAAAAGGCACAGTAAATCTTGTAAATTTAGAATT +AATGAAAACAAACCTTATCACAACTGTGAATTAAATGACTATCTTCATAA +TATTCAGCTCTTCCAAGTCCAGCTCTTGTTACTTCTTCACCGAATAAAGC +TTCCTGAAAATAAAGGGAAAAGGGCATGCAAACGTTTTTGGAAAATCATG +CAGTTTTTATTTTTATTTTCTAAGATGCATTTTATTGTCCACGGTGTTAA +CACCTGGTAACTCTATCTATATATGTTTTTTTTTGAAATTTCTTATACCT +TTAGCTCTGGTATACCTAATTTTTTTAAATATTAATTTTAAAAATTACAT +GCATTACTAACAGAGATAATTATTCTGATCATTAAAATTTTGTGATTGCT +CTGGTCCTTGTAGATAAAAATAGACAAAATAATTACCCTCCTCAAAACTC +TCAACTCTTCATCTGTTGCCACGTCATCTCCGTAAATCAAGTTCTCAAAA +ATTTTAATATTATGATCAATTGAGCCATTATTATTTTCATAATGATCTGG +TTTTGTATAGCCACGTTTTTCATTTATAAAAAACTTTGCGATTCCAAAAA +ATTCCATTATGAAGAATTTGGAGACCGTGCTGAAAACATCCATCAAAATT +GAGTTAAAGAAAATCGAAATAATTAAATGAAATTGAGTTGAACGTAGGAT +ATTTATTAATTTGTTTTAAAATCAGTTTAAAACCATTTTTGACGAATCTC +AAAGCATTCAAAAATAAATAACAGAACATGGGAAAAATTGTATGAGTTTA +TATAATACGTTTTCTCATGGTACTATTGATTGATCAGTGACAGAGTTTTC +GATTCAAATTATTATGTCGCCTTGATCAATGAACTGAATTTTCTGAAATC +GTATGTCATACTATTTACTAAAACTTCACAAGTCATAAAAATCAAAATAT +CAAAAAAAAAGGATCTAGAGAACTTTCAATTTTTCTCCGTTTCAAAAGTG +TTTGCCAAAAACCGAAATGGGAGTACATTTCTGAAGATTCCTTTTTTCTT +TTTTGTTCAAAAAAAAGTTGTTAAGCTGGAAAAAAACTTTGGTTTGAAGA +GATGGAAGTTGTCTTGGACACACTCACATTTTTGATGGACAGCCTATCAG +CGAACCACTTTTGGATTCTTGGCGAGAAATTTTTGAATTTTCGAGTCAAT +TTTCATGTCTAACTTGAGTATATTTATGACAGATATCACATTTTTCGAAA +ACAACAAAACAGAAATAAAAATTTATGAAAAACAGATTTTTTTCCGGATT +TCAAAATTGTAAACTAGAATTTAGATTTTTTCTATTAGTAATGCATACAA +CTTTTTGGAAGGAAAAATACTAGAATTTTAGTAAAATTTGTTTTTTTTTA +ATTTTCTAATTTCTACAATTTTCCAAAACCATGCAAGAATCCATTCAATG +TCACCCAAAAAAGCAAAAGTCAAACAAGAAAAATCAATTAAAAAAAAGTT +GAACCCTTTCAATTTATACACTCACCCTTCTCTGCCGTCTATGAAATCCT +TCTGGAATTTCAGTGACTCGATCATCCATCTCATGCCGTTCAAGTTCTTC +ATAGCTGTCATATTCAACTCCTACGTGGTGGTATGTTGAAGCTGCAACAG 
+AAAAATTGAACATCAAGTTATTTCTTACAGCTTGAACCTTCAGCTCGCGC +AGATTAAAAATTGGTAACTCCGTTGCAAAAAAGATGGAGTTGCGTTTTTG +GGAAGTTTTCAAGATTTCTTACTAAGAAGTTTGGTACTTCGGAGTTATGT +TGAGTATAATAGGGCAACTACCCAAGTTTATTCTACCTTTAACAATCCTA +ACACTACTTACCGTAATCATCGTTATTACGATGAAGCAATGCTTCATGTG +GATTATCACCTTCCTCTTCCATAAGTAGACTGCCACCATCACCACTAGAG +TCAACTCGTTGATATCCACGAGATGATGAAGTCATCGCTGATGACGTGGC +ATCTGAAAACTCAATTAATTGATGTTAAATTAATTGTATTATAATATTAA +GTTCAACTTGAATTTTCTGAATGTATGGAAATTGAGCTTTTTTTTTAGCA +AATCGCCATTTTGAAAACTCTCATTTTCAAATAGGGTCTAAAAGAGACAA +AAAGGCAATACCTGTTCATAAGACACTGTTTGAAATTTTCAGTGAAAATT +TTGGCAATATGCCAAAATTGCAAAAAACTACAAGTTATGATATAAGATGC +TACGAGAAAAAATAAAAATGTAAATTTCAATCTACATCTTCTAAAAATTG +TAAGAACATCATGCAAAACCTGCAACAAAACCTTGAAAAATTGAAAGAAA +AAAGAAAGGTGAGGTAATCTGAGTCAAGGTGGAATGATGGAGGTTGGTGG +GAAGAGGAAGGAAATTCAAAACCGCATAGACACCAAATATTGCAATGTTT +GACAGAGGGCGACGAGGAGAGGAAAAAAACGAGATGAGCTCACGGGCGGA +AAAAAGTTTTTTTGGAAGCGCTAAACTGATTAGTCAGTGGCGGTGGAGGA +AGACCATTGGTATTGGATGGATAGCCCCCAATTTCGATGATTGACTTTTA +CACATATACACACCAAGCTTTTGTTTAGAAAAAAGAAACAAAGAATTGTG +TGTGTTGTAGGATGCCATTTTGAGGCGTGTGATTTGATAAATGAAGACTT +CAACACCACTACTCCCTTGAGTCCAGATGAATTGAAGAACACGAAAAAAC +GATTTATAAACAGATGGTTTCGTGTGGTTTCTGTGTTTTAAAAAGGCTGA +CCAAAAAACTTCTGTTTTACAGGACGATGTGAGGGCTCTTCCAAGCTCGT +TCTTTCCTCGCGCCTTACCGCAACTACCCATTAGTTTATCTGTGAGTTTA +TTGAAATACAGTAACCTAGCTTCACCGTATTTCCTCTAATAGTATTTCAC +CACACCACTTTACTTTAAAAGGTTATATCTTGGTGAACTTCAAAGATATG +AAAAATTATAAAGTATGAATCGACTAAAAAACGAATATTCTTCCAACTTG +AAATTTTTGAAAAGAACAAACGACAGCTGAGATATAAGTTGTTAAAGTTG +AACAATGGGGTCCAACACTAGTAGAGGATCATATTACTCTTTGTAAAACT +TTTTCGAATAGTGGAAAACTTAGTCGTAAATACACAAAATTAGACTCTAG +TGACTCATAAATTTCGATCAAAATCTTTGAAATGTAGAATTTTTTGCCAG +CAAGTTCTGTGAGCCACAACAATCAAAGAAGAAACCATGGAAAATGTTGA +ATCGGTCAAAGGAAATCGCGGAAAAAGTAGTGGAAAAGGGGCAAAAAGTG +AAGAAGAACTAGGCAAGATTAATGAGCCGCCTTTTTGGCTCTGAAGAGCC +ATGTTTCCCTTGTGAGCATCATTTCTTCCGCGGCGCAAAGAATACGGAAG +AATTGGAACAAACTTTTGATATTTGTTAAATATTTGCCCTTTTGGAAAAA +ATCGTTGTGGTTGTGAAGGAAAAAACTACAGACAAAACATTTGTAAAAGT 
+CGAAAATTGCTTGAAAATTTGCGAAATCGGTGAAACAATCAGAATTTCTG +GCTTTGCCACCAATACCTAACGAGACCCTGCGTTTAGGGGCGGAGCATTT +TATTAAGCCATGGAGCGCGATTACACCTTGGAGTGCTTTTCGAAAAAAAC +CAAATTCAAATAATTTTTTAAAAAATCGAGAAATTATAAGTTTTAATTTT +AACTTTGCTCTATTGATATGGGTTATCTAGAAGCTAGACTCTCTATAAAA +AATCGAATAAAAACTAACGCGAATAAATCTTGAAAAGGTCGGAGTGCTCC +CGAGAAAAGTTTTTGTGCTCTTTTCGTCGTAAGACTTCAAAAAAAGAGGC +ATTTGGTTTGAAAAATGGCTCTGATCTCTTCGGTTTTTGCCCTCTTTTTG +TTTAGAATATTTCATCTGGAACAATATTTGGAATCACGGAAAGAAATTTG +CAGATGGTATTGTTTATAACTTTTATATATTTATGGAAAATCACTCTTTT +GAATAGATTGTTCCACAAAACCTCACAAAATACTGAAAAAAGGAGAAATG +GAAGACAGATATACAACGATATAACTTTTTTTGAATATAGGTTAATAAAA +AATATTTGTGGGGGAATATTTGAAGATATTGGCATATTGAAAAAAAAAAA +GAAGAAAGCGGCTACCTTTGAGTTTTATGGACAATGGAAATGGGAAAAGA +GAAGGAATTTCATAGGTAGCTAGGAAAAGGTTCTAGTATATGGAACTACG +TTTTGGGTATTATAGAGTTCAATAATATGAAAATCAATTTAATGATGGCA +CAGTTTACACCCTATTCCCGGCAGTCGAACACAAACTTGTGCAAAAAATA +GTATGTCATTTTTGTTCTAATTTTTCCGAGCAGTCATGTCCGAGTGGTTA +AGGAGATTGACTAGAAATCAATTGGGCTCTGCCCGCGTAGGTTCGAATCC +TGCTGACTGCGTGTACTTTTTATTTTCTTTCAATTCATAGAGTAATGCAA +ATTCTATTAAAATCGGATACAGTATGTTATGAAAAAAAAGTGTTCTTCTC +GAAAGTATTTGGCCTCGCTTCTTCAGGCTGCTGGTTCTAGAGCAGCAGTC +AACTCAATTGTTTTCCACTTTTCCTTCTCATTTTCATTTTTTCATCGCTC +CTTTTTCGTTCCTCACACTCCTCTAATTTTCCCTTTTTTACTCATTTTTG +GAAAAAAAAAAGAAAAGGAGGTCTTATTTTGTAATCCTTCGGAAATTCGA +AAGCTTGAAGAAAAACTAATAATGATGGAACTGTGTGGATTATTGAACGA +GAAAAAGTTTTAAAGGTGCATACAGTAATTTGTGTGTGGTCTCGCCGCGA +TCTATGATATACCAGCAAATACGGGATTATTACTTCAATATTGATGGAGA +ATGTACATAGAGGGAACGGGGTTGGGTATCGCTGTCAGAAAAATACTGAA +AAACCAAAAAAAAAATCCAAAAAAATTGTTTTTCATGATTATAACCTAAA +CTTCATGAGTAAAAGTATCAAACTATCTCATTCTGAACAAGTTGAAAACC +TCGGGACCACCACACCAAGAAGTGAGAAGCGCCCTCTTTTTGGCCATTGT +TCCCGGGGGACAGTTTTGCGATTTGACATAAAAATCGCGAGAATAGTGTG +TGAGATGAAGAAACGCAGAGTAGTAGAAGTAGGGGTATACGATGTTAGGC +AAAGAAAAGGAGGGAGCTGGAGAAATGTGTCTGAAGAGCACTGACTGACA +AAAATGTGCAAAATGTCGGAAGGAGGATGGGTTTTCATGTTTTCGGTGTA +AAACTGACATAAGAAAAACATAAGAATATTGGTAGAAACTGGTAGGGGTT +GTTAGGTTTTTCAATCTTTGAGAAATTGACGAAAAGTTGAAGTCTTCCGA 
+ACTTTTTTGTGAAATTTAGGAGAAAAGATCAATTGAACGTCGGATTGAGA +ATTTAAAACTAGATAAGTTTCAACTCAAGCGACGTTTGAAAAGATATGAA +GTTTTAAATGAAAAATAAACGAACATTTTAAACAGGCAAACAATGCCTGC +CTACGTGCTTACAAGAAAATTGAAACTAACAAGTAGGGCTACCAAATCAG +ACGTACGATTTCACTGGATTCCGGACGATCCCGCCCCTTGTTTGTTTCCA +ATTTACAAGAATATAAATGGTTAGGTTTTCTACGTGCCTACACGTCAACT +TCATGACTGCCTGCCTACTTGCCTACTATATGGATACCGTGGCGAGGTCT +TCTGGTTTTGCAAGCGTGCAGGCAAGCAATAGTTTTTTTTCTGTATAAAC +ATTCTGGAAAATTTAACTGACCGTGGTTTATCAAATACTTCAATACTTCA +AAAATACAGTTTAAAACATAATTTTTCTTTTTTTTCAACTTCTAAACTAA +CCTGTTGTAGATTCATCAATAGCCGGAATATCTGATTCTATTGAAGGTTC +GAACGGCTTAGACGGCGTAGAAGCTGGAACTTTTCAGTCTATTTTCAGTT +TAAAAAACTTTAACTAACCTTTTTTGAAAAGTCGTAAATATTTTGCTCCA +GTTCCACGTGGCATGATAGCTGGGGAGCTTTTTCAATTTTGTTTCCGATT +TCAAAACAATTTTCAAAAAACACTGGTAAAGTGCACTTCCTGCTCAATAT +CAGCCAGACAGAAACCAGAAGAACACAACTATTTCTGTCTATTTTTCTGA +GAGAAAAAGAGCATTTTGAGCAATTTTGAGCCAGAGAGTCCCAAAACACA +TTTTAATTTGAAGGATCCTTGCTCCTGAGCCCCCAGTTTTACGGGTCAAG +TTAGGCATAATTGAATATGAGGAGGCGCAGAAAAAAAGAGAAGAAAATAA +ATGAAAATTAATTTTATGGAGCTAATTTTTGAAAAAGAAGCAGGAACTAA +TGTTTTAATTGTTCGTAGTTTGACTTGATGTGTGGTGAATAGTGTTTTAG +TGTTTCACTGATCCTTCAAAGTAGATGTTGCACCATGGTCGAGGGCGCTT +TTCAACCGCTTTGCTCTGTGGACCACAGTTATTTGAACTTTTCTAATTAT +TTTTTTTTGTTTTTTAGTTTAAATCAAAATAGTGTCACTATTTTACTTTT +TTTTATATGAACAAAACACTGGGAGTCGGGGTTCCAAATTTTATGAATTC +ATATCACATCACAGCGATTTTTGTATTTTCAAAAAAAAACAAAATTTCAG +ATCATTCTCCGGAGGTGGACAGTAAATGTTCTCAAATCAATAAAATTTAA +TTCGATTTGTGATCCCAACCATCAATATGCATAAATATTTCCCAGATCAT +TCGAAGAAATCATTCGAAAATGCGTCTTCTTATTCTCGCCCTCGCCTCTT +TTTCCGTTCTTGTGAGCTCTTTCATCGTAAGTTCTTCTTTTGACTAAAAC +TGTGTGAAAACCGTGGAAATACGTGTTTATCAGAGTTAGTGACGAAATTT +TACCTTTTAATGGTATGATGCCTCGTAAATTTAATCAAACCTCTGTAGAA +AACTCGTGTCATGTCAAGCGATTCGAAATCGATTTCTGAATGTGAAACGG +AAACCCGTGTCACTTTGATGTCCAAACGCAGGAATAATTCCTTTGAAATT +CAGATTCGTACTCATGATCTGCAAAATGCACCAAAAGCAAGAAGATTCCT +GAAAAGTTTTGATCAACCTGAAGTAGTTGATCTTACTAAGGATCCCACTA +CGTTCCAGGAGTTTACCAAATTCTATGTCCAACTTCTAATCGACGTAATC +CAAAAAAAAGACACCAGAAGCCTTCATGGCACACAATAATTACCTGGATC 
+TTGAAATGTTGTTCAATGAATGTAATGCTCAACATTACGAAAATTTAGGT +TGGTTTCTGTAATAATATTTCAAAGTCTTTTTTAATAAAATTTTCAGGAA +TGTTCTTCAAATGGCTTCAACAATTTGCTGCTCATCACGAAGATCCAAAG +ATCACAATTAAAGATTTCAAGGGAGATAAAACATCAGGAGAAATGATTCT +TCATGTTAATACCTTCACTTTATTCAAGTACAGACAAGAATTTGACATGA +AAGTTAGCGCTTTGAATGTGGGTTATTGAATCAATATTTTGAAGACAAAA +CTAATTTTTCAGGACAATGGACTTGGGTGGAAGATTTTGTTTGTTGACCG +CAGTCTTGGGTGTAATTGATTCATTTTTTGAAACCGAATAAACTAATTTG +TTAAATTATTAAATTAGGCAATGGGACAGTCTGGAAGTTGCTAGAATCTA +CCGATATTTATTAGATATTTATTACTTACTTGCTACTATGTTTCACAAAA +CAATATCAGAGAAAAAAAATCAAAACTGAGAAAAAACGAAACAACTTACC +GAACAATAGTTAGCATTGCTTTGTACAAAAAAAACTGTGAGGCTATTGGA +GCACTCCTCTGATTGATCCGGTCGTGAGTATTGATTTTGTTTGTCAATTT +GTCTAGGACGCAATTGATTCATCGTTTGAAAAAAAAGCTCGTTCATTCAA +AATCATTGATTATTTATTCCATGCAGAAAACTGTTTTATAATTCTCGATA +AAAATGTTATATTTTAAACTGTAAATATTTATTTTTCCACATGATATGGT +CTAACCGTGGGCGTAATATTTTCGGTTTAGAACGGCTGAATCGAATCTTT +TGGACTGCAAAGACGGATTGGATTTTGAAACAAACTACGGTGTTTTTGTG +ACAGTATTAGTCAACGAATGGTAATTAGGCATATGTTTTTGGAACAACTC +ATAATTTCTTGTTGGAAAACAAAAGTGGCAAAGTATAATTATTTAAAACT +CTGACAAATCTTTTGTCTTTTTTCTCTGGTACTCCTGTTTTTAACAGTGT +CCTTATTATGCTCCTATTACTCTTGTGTTGTGCACGTTCTTTTGAAATCT +ATATTGCAGAGCGGGAATGAATTTAAATAAGAATAATTCAACTTGGTCAT +TCAAACAATAAAACGCCTACCATTAATATATCCAAAATAAGTTCTTTATT +TTTGCAAGGTATTCAAATATATCAATAACTCGAGCCTGAAGAAATCCACC +GCTAGAACATAAAATTAACGATAAATATTAAGGCAAAATGTTTGGGCCGA +ACCGGGAATCGAACCCGGGACCTCTCGCACCCGAAGCGAGAATCATACCT +CTAGACCATCCGGCCACTGCCCGGCTGGAGAAAAATGGGGGACTTTGAGC +GTTACAACATTCTCGAGGCCCTCTCGAGTCGAGGGGGACCGAGACGGAGA +GTGCTAGGAAAACCAAGCATATTTGGTTTTTGAGTATATAGTCGTACGAT +TGAGAGGAAGCAAAGATAGATTTATAGCTTTTAATTTATTTTTTCTAGGA +TTTTACGTAACATTAATTTCAATACTGATTTCGAGATTTGTCTTTTGATA +GATTTACAACATGGTTATCATCCGGTATACTATTGTATAAAAGATGGTCC +TAAATTCTATGAGATTATTTTATATTCTCTCTCAACGAGATGCTTCTTCC +AAAAATTTCAATTCTTCTATATATTCTCGTCGTTCTTCAAGAAACAGCGG +CTGTCCCGGTGCTCTTTTTCGCTCCGGAAGAGCTGTTCCGTTTGAGCGTG +TCGTTGGACAACAGGTGAATTTTTTATTTCATTTTGAGTGAGCCATTGTA +ACATTGGATCATAAGAGTCACGTGGAAAGTAGAACAATCTCAACCCACAA 
+GTTAAAGCCTAATATTATTCCAATGCCTCTAGTAAGATTTTTATAAAAAT +AGAAACCTCGCCATATTAGCCAGACACGAGATTTTTACGATTTCTTCGTC +AAATATACGGTACCCAATCTCGACACGTCAATTTTTCAATAAATGCAAAA +AGATGTGCGCCTTCAATGAGTACTGTAACTTCAAACTTTTGTTGCTGCCG +AATCGACTTAGTTTTGTGAAAATATATGTATTCATGTTTTAAACAACTCA +GAATTAACCCAAAAATTTTAACAAAAACTTTTTTTTAAGCTATGAAAAAT +CAATTTGAATTCGGCAGCAACAAATGTTTTAAATTACAGTATTTTTTAAA +AGCTTACGCCATTTTGCATTCATTGAACATTTGTCGTGTCGAGGCCGGGT +ATATTTTTGACAGAAAAAAAACAAAATTTCAGGTTCGTTGTTTTTTGTTG +TCAAAATTATAATGTGTTTTAATATCATATCAAAAAATTAGTCATTTGAT +TTTCGATAGATCAAGCTTCAGTAGAGGTTTCTAATCATTCGGAAATATAC +ACAAAATCTTGAATTTACACTTCCGATCAGTTGGAGAATATCATTGATCT +ATCAATAATGTTAGATTTTGTAGTTGATATTTTCACAAAACTCTCAAAAC +TTTTTAACATAAAAAGTTGGCATGGTGCGTCTTATAATTTTTGAATAACT +TATTAAATTTCTAAAATTAATATAGAGATTGGCTCCCACATGTCGAGAAG +TTGAATTATTTGTTCCTTGTCTATTCACTTCTAACCGTGATTGTGTTATC +GACAAAATATGTGTTCATGAGAAACCACTTCCCCTTTCATCGATTTCTGT +GTCAAGCCCGAGTCCAAATACAAAACGGGAAGGTCATCCCGAGTTCATGA +GCCAAATTGCGCCGCCTAGAAGAAATGTGAGAGTTGAGCACAGCATTCAA +CAATTTCTAATCAGCTTTCAGGACTTTTTGCGATTCGGAAGAGCCGGAAT +GGCTTCTGGAGTTGGTGGAGGATCGGAAGGAGGACCTGATGATGTGAAGA +ATTCGTATATTCGGGTCAATGGGGAGCCAGAGATTGTTTATCAATAAAAT +AATATGGATTCGATGCGTCCGATTATTTTTTTTTACTTGATATTACATTC +TCGAAACTATTGTAATGTGTGGGGAATCGTTATAAATAAAATATCTTCTT +GTTAAACAATTATTCCATGAAATATGAAGTTATAAAGATATAATCCAGAT +ACGAAACTTTGAGATTTTCGTCCGAAAAGTACGGTAGTGGGTCTCGACAC +GACAAATGTTAGTTGTGCGCCTTTGAAGATTACTATCATTTAATTTTTTG +ATTAAATCGTTTTATTCTAAATGCTTAGTATTTACACAAATATATTTCAA +TCGGAATTTTCAAAGTATTTTCCTAGAGGAAACTACTTTGAAAATTCAAA +AATTCAGCAACAACGAGAGTTTGTAATTACGGTAATCTTCAAAGGCGCAC +ACCTACTCGTATTTAACAAAAATTTGTCGTGTCAAGACCGGGTTCCGCTC +TTTTTTGCAGTAGTACAGAGAATATTCCGAAAGAAGTTTGAAATTAAATT +AATTTATTATACCGACAAATCATCAAATGTGATGTATTTCAATGAGTACA +ATTTTTCGAAAGAAAAACAGATTGAGGATAAAACTTGAGTGATGAGATAA +CCGTAATATGGAGAATTATATCAGTGTCAAGAAGGCACATTGTTCAGTTT +CATATTTACAGATGTTTGGGATTAAATGAAGATTCGGTATGCATCGACGA +TCAGAACAATGAACGAGTGAGTTGAAAGACCTGGAATGTTTAAAATTAAT +TTTTAAATTAAAATTCAATTTTCTGCCAACCTTTTGTTGAATGATATCAG 
+TTCTTGGAGAAGACCATCTTTCAATTGCTTCAAATTCACGACGTCTTTTC +ACCAGTTGTAAGAAGGCGGTCAACGACCCGTCAACGTTTGCGTAGTGTGT +GCGGCAGGAGAAGAGACTGCAAAACAGAGATATTTCATGAAGACTCAAAT +GAACAGCGGAAAATGTGCCTCGATAGCTCAGTTGGGAGAGCGTACGACTG +AAGATCGTAAGGTCACCAGTTCGATCCTGGTTCGGGGCAATTCTTTTTCG +ATTTTTTGAAACTTACCAAATTCCATCATACGAGAGTCCTTGGTGCAGAA +TTTCGTGAACCAAATCAAATTCTGCTTCTTGCCGGTTTAGAAGATCTGGT +GCCAAAATGATGTCGAATTTTTTTCCGCCAAGGAACTTCATTGCCTCTTC +AATTGTACCACATGAAACCTTGGTCTTGATCATTGGAATATTATTTCTCT +TAAGTGTTGGACGACAGTAAAGCTCCAAACTAGTCTTATCCATTGTGTGC +ATTGCAATTTCTTCTGCTCCATTTTCGAAAGCATATACTGATGGGAGTCC +AGTTACGAATCCGATTTCCAAAACAGATTTTCCATCGAAAAAGTCAGTTT +CCATCTCAACATTGACAATATTATCAATAGTGTGGCAAATTGTGTTGACA +CCTTCCCAGTCATCATGATGATCTGAAAATAAGACATTTTCAAAATTAAA +TCAAAAATTAAAAAAGCAATGCCCCGAACCAGGATCGAACTGGTGACCTT +ACGATCTTCAGTCGTACGCTCTCCCAACTGAGCTATCGAGGCATGCCGAA +ACTGACGTTTCTGATCAAACAGGAAAAATTTGAAATTGAACCGTACCTTG +GTGACTATTTTTTCTCATTATAGGCAATTCCTCTGCCACATGATTTTTCG +GGTTTGAATTTCTGGTTGCCACGTACTTGATGCAACGGCCGGAATCCAAA +CTGAGTTCACAGTTGACTGGAGCAACGGAAGCAATTGTTGGGATGGTAAG +GACCATTTTTGTGTGATGAAAGGATATGAGAAGAACGCTGAAACCGAGCC +GGAGCTATTATTTATTTTGAAAAAAATGACCCGAAACTAAAGAGACAGGT +GGTTCTTTGTAGTTTTAGATAGTAAATAAAGTAATGGGTAGGGGGCAAAA +ACAAAGAAGATAAGATACGCTTTAGGATGGAGTGAATCTTGATCGCATGA +CGTGAATGGCAGAAAGTTGAAATTTTGAAAAAAAAGGAAGGAAAAAGTTT +GTACCGAAAACTAGAGAATTTCCAAAGGGTTTGTTTGCATTCGACCATTC +AGTTGCTTTTAAAGATTGTCCCGCAAAAGAGAGAGGGGCGGAGTGAATGG +AAAAGTCACGAGGGCGGGTTGTGTGCAGAACAACGGCAGGAAAATCATAA +ACACGAACATAGAGAGAGCTATTGTACTTGGAAGAATCAAAGAACGAACT +GGAGTACTGGAATTGAGTGTGGAATTTGTATGTCTAGGTCAGGAAAGTGA +GATCTAAAGCTAAAATAAAAGTGAAAGTCCATCTGAGTTGCAGGAGAATA +ATTGAGATATTAGTCGAGCTTGTTCTTCAACTCTAAAAACTAATTGAATA +AGTTTAAAGTAAAATGGAGCTCTTGAGAAAAAGCGAAAATGCCATGCTGC +TTTAATACCACCCTAACCCCCTTCTATTGACTCCGCCTACTTTTTGTCTT +CGCTAGATGAAGTAGATGAAGACGCCGGAAGCAATTGGAGATGTTAGACG +TTTAGAGATGATAGAATCTATTCATCCAAGTGAGAGATCAAACAAGACAC +TGAAGAAGAAAAGAAGATTGGCGGTAAAAGGGATGATCTTCAAAGAACAG +AGAGCTCCGACACTTTTTGTTTTTCATCTCTTTTTTCACTGGTTGGTGGT 
+CAGCGAAGAGGAATAAGAAGAAGAATGATGGCAAGAAAAAGAGAACAGCT +GATCGATTCTTGTTCCTGAATAGATCAATGACGTACCATGAAAGAGTGTG +AGTAGGTGGATAGTTGATGAAATTGATGGCAGAGGGTAGATCATTGTAGA +AAAGAACGATCAACTTGAAGTAAACACAAAGTAATACGATAGAGATGATT +CCTTAGCAAAAATAGTACCGGTGTCTACTGAACAGATCACGATTAATCTT +TTTTATAGAGAACTTCTGTTGCTTAGAGACATTGTTTTTTAGAAGAAGTT +GAAATATCAATTGAGATTCTCGTCAACTTCCTGATGAACACAAATAACTG +GGCATTCACCAAAGGGCACCGCGATCAGTTCAGGGTTTCACCTAGACCGG +GTTATTCAAAACTTCGAAAAATTCGGATAACCTGTAATTTGTCGATTTTC +CGAACTGGCCGATAATCAACGTGCCGAACGATAATCACCGCTCATTCCTG +ATTGAATACACGAGAAGCTGCAAATTAAAATACCTAAATTTCGAGGTTTC +CAGAGTTTAGCCAACATCTAAGCTTAGACAAAAAAAAGCCTTTGCGAGTG +TTTCTGGAATGGATTATTTTCTCTCAGCTTCAGTAAAAGATCCTTGGTTA +CTATTTCTTGACCATCTCACGTTTAACTCCCAAAAATCATCGAACACAAC +TAATTTCTCAACAAACCTATTTACTTTATTTCGCCTTGCTTTGTTATTTA +TTTCACTAAAAAACGAACCTCAATGATTCACGTGTGATCGGTGACCAATG +AGGCTGGCTGGCACGTGTACCAGGCGAAAATTTTCAGTGAGAAACACAGA +GAACAGTGTATTCGGTAAATAAAATTAAACTTATTAAACTTTTTTTGCTG +CTGATTCACGTGAATAATACACCAATGTTTGAGGTGAGCAGACTGATGAT +ATCATTATTAATAGCAAGCTAATTACTAATACATTTTTAGACGTAATTCA +ATCTAACTATTTCTATAGGAATATTTTTTCAACTTGCGAAATAGTTCCAT +AATAATAAAACAACTAGAATTGGGCCATCAAGTAGTTACAAAAACTGTGA +CTGCCATAAAAGGATTCAGGTAGGTAGGTGGCGCCATGCGTTTCAATTGA +ATATTACTGAGTCAACTTCACATCTACAAAAATCACAGCATGATCATATC +TAGTAAAGAAATCATAATAACAGCTGAGAGTGTTCAGATTTGCATAAATG +ATGACATCAACATCTGATCTAGTCCAAACAATCACACTTTCATTTCCTTC +ATTCTATGAATAATCTTCTCTTTAAAAAGAACCTGTTCTTCAGGACCTCC +ATCTATATTGTCGTCTTCTCGTATGTTCCCAAATGTTATGATAATACTAT +CTTCTATGACATCTTTTGGTGTGGGTGCATAATGACGTGGCAACCGTACT +CAGTGTGCACCAAACTACGAAAACAACGAACAAACAAAAGTGAGTGTTAT +CTTATTATCGCATAACGAATTCTTGTCGAATGGCATGATATGGGGACCTG +GAAATTTAATTTGAAAATCTATCCATCAACTGAAAGAAAAAAATGAATCA +GAGGGATTTGCGAAATTTCAAAATCAAAGAATGACCACGTGTCTTATTCA +ATCAAATCAGTGGACGTTTCTCATAGAATGCTGCAGAGTAGTTACGAAGG +AAATGTCTCAGTTTCTACCGCCAGCCTATCACAGGGATGGGTCTCACCAC +AACGGGCTTCGCAAAAATGGGTCTTGCAAGGATGGGTCCCGCAACAATCG +GTTCCGTAACAATAAGTCTCGTAAAAATGGGTCTCACAGCGAAGAACCGC +TTAAATCGTGTAACACGAGGCCCTTGCTTCCACGCCAAAATGCGGAAAAG 
+AAGTACTCGGCATTCAGAGGCAACTTCCTGCCATTTTATGGCAACTTCCT +GCCAAACTTTTGCACATTCCTGCCAACATTTAGATGTTGCTGAGCGTTTT +TTCCCACTTGCTGCCACTACGTGGCGACTTCCTGCCAACAACTCGGCAAC +TTCCTGCCAGCAATTTGGCAACTTATTTTAGTGGCACTAAATAAGTTGCC +AAAGTGTGGCAGGAAGTTGTCTCTGAATGCCGAGTTGGCAGGCACGTAGG +CATTTAAGGGGGAAGGTGCCTGTCTGCCCTAGAAGACTTACTAAATAACT +ATGAAAAGCATTAACATTTAGCGGCCACGCATAACACTATAAACCATTTC +AGTTTGCTTACAGAGAATAATCAAGAAAAGCGGAACGTTATCTTGAAACT +ATTGAACTATCTCCATGTGATGCGTTTTCTAGCACTTGACTTAATTTATC +ATTCCGTTTCTGTCGCTTGTGATAAGAAAATTCTCAAATTTAATCGCCAA +TTTAATTTTTTCGCAGATGGTAATCTGTTTGGACTTTCGTTCCAGTTCGG +TTCACGTGAAACTCATGTTCAAGTTTTTATTTTTTAATTTTCAGATGAGA +GCTACGTAATCATATTTTTGTTTTGTTTGCATTGGTATGGTTCTTGATCT +ACTCGTATGGGGGTAAAAACGTGTGTCTAGTTGACATCAGTCGAGTGATA +AGATGGAGATGAATAGTTTGGGAAGAAGATGGGGATCGGAAAAAAGCTCA +AGAAATGGGTTTCAACTTTTTGAAGTTTTAATTTGGCAGCGTGAACATTT +GCAAGGTATATGAAACTTCAAGAAGGAGAAGTGTAGGAAACAATGTGGAG +ACGCTGTTATAAATTACGTTTTGAAAAAGTCTCCGGTTTTCCGTCAAGCA +ACTTAGATATACATATGTATACTGCTTCACTTTGAAAAATCAATATGTCG +CAAAATAAAATAAAAATAAGCCGCGAAGAAAAATAAAAATAAGCCGCGTA +GGCCCGTAACCGTCTTTCTACCTTACTTATCGGAGTACGCCTAACTCGTG +CCTACGTGCCTACTGCCAAAACAAATTTGGCACGAGACAGGCCTTTTCAC +ATGTAGTATTTTCGTAACATGGATGCGTTATCCGATATATGGAAACTTTA +GAGCTTTTATTCATTTAAAGTCAGTTTCTAATTCTGATCAGTATCGCCTT +GTGCTGAACAACGAATTGTTAATTACATAACCACATCTCGTACTTCTGAT +CTATTGATACTTCTTTTTTGTTTTGTTTCCATTTTTAAAAAGTTGCTCGT +GTTTTTTGTACTTTTCCGTTTTCATAGTCTTTACCATCAGTTGTCATGGA +AATTGTTTTCAAAGTTTGTGGATTTCAACCCGGAAGTCTATCAATTGTTT +GCCACTCTGTTTTGTCAATGTATTTCTAGGTGCACCTTGCAGACATGTAC +TCTGCGACCGAACCTGTCACCTGACAAAGTACCGTAACTTATTGTTTTCA +CAAACACTTAAAGATGCCTGACCTGGAACGACGATCACTATCATCAAACG +GTAATGATCGAAATTAGTTGAAAACATGATTTGACCGCAGATCGAACTGA +TGGCGCCAGGGAAACACCGATAGAAATGTTTTTCAAAAGTCACATGTGCT +AAAAATAACTTTTATTTGGTCTTTCTTTTTCCTTTTGTCACATCAACCAT +TACATGGGCGGTCAATATCTGTCTTCTTATCTATTCAGTCTCACGTTAAA +TACACGTGTCAATTTATGTAAAAGTCGGGCGGTCCTGACTTTTGATGATC +ATGACATGTTTTCGATCAGTCGGAAACTGGGGGGAAAAAGAGAAAGACAG +AGACGTGTTTACCACTATGCAGCTGCAATTCGGTTTTCTCTCGAACATTC 
+CATGACGAGAAGAGACGCAGAACATTCAATCAAGAGGGGTCTTTGAAGAT +CATCGTGTGCTTCTTTTATGTGTGTTTGTGTACGTGTTTGTGTGTTACTA +AAAGTTGACTGAGGGACGGGAGACAGAGAGAGCTCAATGAGCAATGACGG +ATGAGGGATTGGTTCTTAAAGTTGGTTATTTGTGGAACTTCAAGGTAGTT +GTGATGTCTAAAGATTAAGGTTAAGAAACCAAATATTTTAAGTTTAAGAT +AAATAAATAAAAACTTTTCAGGCATCGTCCGTAATTATCGGTAATTCGGA +ATGATTTTAGATCTCAAGACCAGAAAAATTTTTGCGAAATATTCAAAATT +TAATCGATAAAAATTTTAGGTGGAAAATTCAATTTCAATTTTTAAAAATT +TTGGACCGGAAATTGAACTGCAAACTCTACGAAATGGCCGATTGCACCAT +GTTGTTCGGACATTTTTAATTAAAAATTAGTATCAAAATTTTTTTCAAAT +GAATTTAAAAAAATTACCTGATTTATTTTAAAATCCCATTAGTCTCAGCT +AGCACGTTTTAAAAAGTACTCAGAACTGTTCTGAAAATTGTAAATTACTG +GTAAACAAAATTCGACAATTCCATATTCTACTAGTGGCAACTTTCAAATC +ATTTTCGAGCATTTTCTTAAATATTTCAGCTAAACAATGGGATTTTCGGA +TTAATCAAATAATTTTCTATATTTTGACATTAATTTTCGATGAAAAATAT +AGGAACTGCATGGTGCAGTGTGTCATTTTGCAGAATCGCCGAAAATAACT +GTCGCACATTCTTGATTGAAACTTCATTTTTTTAATTGGCGCGAAATTCA +AATTTTAATTTTTAAATATTTTAGGCGGAAACCTCAGACTTCAACTTTCG +AACCATTTTGACTGAAAAATCATACTTTAATTTTCAGAAATTAATGAAAT +CAGCATTATTGTAGATGTTTCGGCGCACACCGTTGGCCTGTGGGTTGGCC +GGTAGGCATATAAGTGCCTATATGAGATGTTGATCTAGAAATTAAAAATA +GACAAATTTCTAATCTAGGATATTTAACTGTTCAGTTTTGTTTGTCATAT +TTTCTGCTTTCAAATGTTTTCAATTTCTTCTCCTCTTTTCATATTTTTCA +TTTGCACTTCACATCAAAGTTTTATTGTCAGAGGACCTAGTAAATAAATA +CATTTTTCATTCTCTCACATGCTGGTGTTTGATGTTCAGTTAAAAATGTT +TTTGATTCTGCAGGAAAAAAAAGAAGAAAAAGTCAACATACAGTACCGGT +TCATTTGATTTCATCGTTTTGGAATTACAGAAAAATACTGGAACTAAATA +AATAAATTTAAAAACAAAATAGAAAAAAATCAGATTTTCAAAACAATATA +ATTCCTATTAGCATTATAAACTATTTTAAATGGAGCTATTCGTGAATTGT +CTTTTGAATTTGCAGCAAGTATCCAAAGATCACAAAGAGCTGTTCGACCA +CTATTCTTTGGATCTTCTTCATCCCACATTCCACGATAAACACATTCTGG +CCTGAAAAATAGGTTTTTTTTAAATGAACATTATTGTTCCGATTTTTATA +CCTCCTCTCATCATTTGAACAATTCATAAGTTGGAAAACAAAATAATTGA +CACCCATTTCTTTCATTATTTTGTGAACTTCCGCGATTGGTTTTTTACTA +AACATAGAATACACTTTCAGAGTTCGTTCACTGAAACAAACAAGGTTTAA +TACTTTCTAGAGAAAATTAACAAATTACCGTATTCCAACATGTTCATAAT +GAGGGTGATTGACAATTGGCCTAAGAGTTGTGAGCTTCACATTTGCCATT +ACTGGCATTGTTCCAGCGAACACAGCGTCTGAAAAAAGTTATTTTGAAGC 
+TAATATGGAAGGCAGGCATGTCTTCAGATGTACGTGCCTGCCTACCACTT +CGGTGGTAAGTTTGTATATTACCTTGTTTGGTGTTATGTTGAATCCAGTC +AAACAACATTTCTTGATCAGGATTACTGTATTCTCCCTTGACATTCAACT +GTTGACGGATATTTGGAATTCCTGAAGGATGAATAAATATTTAGTAATGC +AGTGAAAGAATTCAAACCTCTATAAAATAATATTGCGATCACTCCAACAA +GAGCAGAAACTCGTATAGTTTTTGATATTCGATCTCCCCCCAGCAGCTTG +GAGTTTGCAAAAAGTGCAGCAACAATACATAAATGAGGTGTCATGAATAG +TTTGAGACGCATTATCAGGAATGCCATGACAGTTGAACAGCAGAGTTGGA +CCACATTATATAGAATCTGAAATGCCAAATTGTAACTTTTTAGTTTCAAA +AGTTATAGTATTGAAGCTGAAAACAATAATCTTAAAATTTTAAAATTTTA +AACAAAGCTGTATTTTATTTTTCTATTTTTTTAATAATATTTTTTAAATA +ACCACCGATCTGCCAGATTCAAAATAGTTGTTTTAAAATTTGGAAATTAC +CTCTCCATTTTCTCCAATTTCTTCACTATTCCTCCATAATAAATTCGTAT +TTTTCACAAAGTTAAAAACGAAAGTGACAAGAGAAATGAGGGCAAGTGGA +ATAAGAAGTGTTCCACACAATTTCTCAATTGTTGAATATTGAATAAAATC +GAATTCAGCTGAACAAGTATACAGTCTCGTGTGAAAGTTGGCAAAACTTG +TGAATTTTGAACGAAGAATATCAAAGATATGAGCCTGAAATCATTAATTG +AAACACAATACAATAAAAACGATAACCAACATCATCTTCAATTCCAAGTC +CTTTTGACAATCCAATTTTAAGCCCAAGAGTAATTGATGCAAAGATGATA +GCCAGGAAAAGAACATAAGCTGGACGGAATTTAAGATTTGAGAGTAGAGG +AGAGATGTATATGATCATCTGGAAGCTTTTTGAAATTTATTTTTATCTAG +TTTTTTTTTTTTTGGATTTGAGTTTTTACCCGAAAATTGTATAAGAGACA +TAAACTATTTTTAGTTTCTCTATTCAATAGAAAACTTACCCCCAATGCCA +AAATACTTGGAAAATACAATGCAGTAATCATCATCTCGTTACCAAAAAGC +AGCAAAAATCCAATGAGAAATGAAATGATATGAGAATGAATTACTGTTTT +AGCAGTAGAGAATGGAATCAGATCCAGTGAAAACGCTAAGAATATTGAAC +AAATTTGTGTGAAAAATGCAAATTGGGTGAATTGCCAGAATAAAAGAGCT +GGAACTGCCATGGAAGTTAAAAGAAGAATCATTGAGTGACCAGACTTTTT +ATATCTGGAAAAATGAGTTTTGAGAACATCATATGCCAAAGAAAATTAGC +ATTGCTACGTCTCGCATTTTCGTCATTTTTTTTGTCATAACTTACTTGAT +AACAAATGTCAAAATCGCAATATGCCCGATAATAAACGGAAAAGCAAAGC +TCTCCCGAAGAGGAGGTGTCCATTGAACTCGAGTTGCTTCTCCGTGATTA +AATGCAAAACATAATACTGACAAGAATCCTCCGAATATTGAATCACTGTA +ATATATATATTTAGTTAAAAGTAATAATTTTTGTGTCGCTCACCTAACAA +GAACACCCAAATAAAAAATGGATGAAGCTACAGTTCCAGCCACAATGAAA +ACTCCGGTTATATAGAAATAATGAGGATTTCCAATTCCTTCACAACTTTC +AACAGGCCGAAGTTCTCCTCGATTAACTTGCCAGCATAGTTCGATTTGCC +AATTCGCTGATTTGGCAAATGCTCGGAATGGGCGGTACAGAAATGCAAGG 
+ATTACCTAAAATAAGCTTGGAAGGATTTTAGGTATGGAGGAAATTGAGAG +AAAATGTAATACAGGTGAATGTATTCTTGTGGTAATTAGAAGTTTGCCAA +AGGTTTTTCCGACGGAAAACATTACAATGTGAAAACTTTGCGACTCACCT +CTGGATATAAATTAAATCGATTTAGTGTGTTAATCTCGTGTCCATGTTCT +GTCACAGTATCATGTGTTATCTCCTGAACACCCTCAAGAAATGATGGTGC +ATTGATTATTGTTTTATAGTATGAATAGTAGAGACCCTGAAAAAACAAGA +AATTTTGAATTTGTATTTTTAAAGAAAACTGTACCATTTCCGTTCGATAT +GCCATCTCGCGTTCAAAGTCCGCCAAATGGGAGAAATGTTTGTCGTTTTC +GAATAATGTGTAGACATGTTGGTAGTTGATGTATCCAACGAGGAGTCCTG +AAAATAAAAAAAACTGTATTGAAATTCGTTTGAGCGTAAAATAATTTGTC +GCAAGTTTGATAAAGATCCGAGAAGACTATTTCTGTAACTTGCATGATTG +TCTGCCAACTAATTATTTTCTGAATTTTTCTGTCAGGTCACATGCCATAT +CCGGGTGGAACAGGACCACTTTAGAGATTGATACGGCTCTAGTTTAATGA +AGAGCAAAAAATCTGCAGGTAGGTCAGTAGGTAGGTGTTGTAAGCAGGCA +GACATTTTGAACCCTACATGGACACCCCATTTCAAATAACTACTATTAAA +ACATAGTTTTCTTAAAAAAGTATTTAATTTGTTTCCCCACCCAATTGCTC +TTAATAATCCTCATTGGTTCTCCTAGTTTTACTCTTTTGTTCCCATTTCC +CTTTTTTTGCAATTCAAAACACATCCAGTTTTGAGAGAGAGGTCTCTCTC +TCTGTCCCTCTCTACCTCCCCCTCTCTCTGTAATTATCGACTTTGGGGAC +GAAATGTCAGTTCATTTGTGGAAATAGTTTATGAATCGGAGAGACTTTAG +ACACTTTGAACAAACCGTGTCGTCGTCAGTGACGAGACACAAACACAGGA +AGTTTCGTTGAATATGTTGATGTTTCTGTTGACGTTTCGTTCCCTAGTTT +TTAGAGATTGAGAGCATCTAAGGATTAAGGTTCAATGTTTCAAGATTTAA +AGTTTTGAAACATAAGTAACAGAGTAAATGATATGATTTAGATAATTTTC +TTATTTTTTAATCTGGCAAGCACGCTCAACTAACAAAACACGAATCCGAC +AATCAGTCAAACATCTTAAACTTTTTAAAAAATTGTTCATTCTTTATAAG +AGCGAATTTCAAATTTAAAAAAAACTTTAATTAAGCTTCAGGTCAAGCAA +TTAGGCGTTATTATTAATTCTGGCAAGTTTCCGTTTTTCAGATATAATCA +TTTCAATTTCGATTCTTTCTTCAAAGTGTCTGGAAAAAATGCTCTTTTTT +AATAATTTCGCCGAATCTAATAGTTCTAAAATTTTATGTTGAAACGATCA +ATTCTATAACAGTATATTCAAAAATAACCTCACTAAACTTGAATTTTTTC +CAAAAAAAAGCATTCAAAGTGAGCAAATAGTTTTGGTAATACAGGTGGCT +ACAAATTTTCTGTCAAAATGTTCAATACACAAAGTGTGAGCAAGAGCAGA +ACCAGTTTTCAAACATATTGCTCTCAGTTCTCACTTTCATTTTTGTTTGT +ATTAGAGGCTCATTGAGCAATAGCAACTTGAACTTTACTTACTGTTTGTA +ATAGCTTAACTGTTCACATTTTTACTAAAACTTTGCAACCTATAGGTATA +CCTAATAATTGGGTTTTTCAATTTTGTATGAGAAATCACATCCCGAAACT +GAATGGAAACTTTCTGATTAAAATGGATATCACTTCAAACATAGTGTCTT 
+TTAATATTGTCTAAGCCTGTATGACAGTGATAATTTCTAATAAAGAGGCG +CAGAGAAATTAGATAATGACCGATAATGATGAGATGAATGAATGTAATGA +TGTGGGAAAATAGATAATAGAATAAGAAGTGTGGGCATAATGATTTAGAT +GATAGAAGCGTTCAGTCTAGGAATCATTTCAAGTCAATATCACTAATTAT +CTCATTTATTTTTGTTCCTTTGATTTATTCGTTTTGGTAGGGCCGTTTCA +TCTTAAAGCGTATATTCAAAACAATTAAAAAATCGTTTTTGAAGTCTTCC +AAGTAAAAATAAATATCTGCTTTGTGCCTATATTGCGCACCTATCATTTA +ATTTCTTAAATGGGGCGTAGCAATTTTGGACTTCTGCTTCAATATCTTCA +AAATGATCCCAATAGACCGAATTTCATAATGTGACTCCTCGAAAATTTCT +TATGAAGATACAACATTTTAACACTGTTTTCTTTCATAGTGTCCAACGCC +TGCCTTTTTCCTACACTAGTTTTTTTTTCAATATCCTTGCAAGTGCACCT +GCAAGAAAAGTTTAACAAAAGTTCTTAAACTTAAAAAAGTAGGTGGTAGG +CAGGCACGTAGGTGCGCAGGTATGTAGCCAATAGATTCTCAAAATATAAT +TATAAAAACTCTGCATTTCTTTTTATTCATCCTAAAAGCGCATTCTACTC +AAAACCCAGTCACATGCTCTATTCACAAAAGTCAATTTTTTTTCATCTGC +TCTCCGCATAAACTCTTGCTTCCAATTTCCAGATGCATTTAATGTCACGT +GCGTTCACTTTTTTCCCTTCAAGCTTCCAGAACAAAAAAGTCTTGTGATG +TTTCGCTCAGAAATTTGTTTGAATAGATGATAATTGGATTTCGGTTGAAT +TATTTTTGCCATCGCATGTCCTTTCTTCAATTCTCAATTTCCAGTTTTCG +TGTGATTCTCCAAAAGTTTCAAACATCATCCGAATTTTCTTGTTTTTCTA +TTATGTCACTGATTTCCTATTTATTTTTCATATTTTTAAAGTTTTTTTTA +AGGGAAAATGAAAACGCCTGCAATTTTATTACTCCTATTCTTATTTATGA +TGAGTTTGGCACAAAGTGATGAGGAACAAAGTGGAAAGGAACCTCCCGAA +AAAGATGATGTTCTAGTTATTCTGGTTGGTTTTTGCCAATGTTACCATGT +CAGAAATGCATGCATGCCTACTTTCGGCACATCTTCGTTTTCTTTTCCCG +TATTCTATGTTTTAATTTTCAGGCAGAAGACAATTCGACACTATTAAACA +ATAACCAAACAATGTATGACCCTTCTTCCGAAGAAAAAGAAAGTGACAAA +CCAAAATTCAATTTATTGGAGACATATTTACCACTTTTCATATTTGTACT +CGATCTACCAACAGAGGATAGAGAAACTTTGAAAGCCTATGTAAAAGATA +AAACTTATGAAAAACTGAAATATGTAATCGATTCAAAAGTTAAAATGAAT +CGAAATGAGAAAAATGTCTTAGGAAGAATCAATGAATCCTTGGTACAATC +AATCGATAATTTGGAACATTTTGATGAGATTACTGTAGATGTGGTGAATG +GTTATCACGTAAGTTTGACATTTCAATGGAAAATCATCTCTCTTTAATCT +TTAAAATGTTTTATTTTCAAAAAAAACCTTAATATGACAATACCAAATTT +TCCTTCTATAACTAAATTTCTAGAATGCCTCCGTATACAACGCAATCCGT +TCAGTTTGGTTGAAACAATTACGTCCATTTCTGCCGCTAATTGATCAAAT +GGCCATTCAGCAGTATTTTGCACAAAGGTTTCAAACATCAAGCGTCTATC +AAAAACTATGTTTATTTTCCAGAAAATCTTCTGAAACTGAGAAAATGTCA 
+ATTTGGGCAAGATTATGGGAAAATCTGAGTGTTTGGATGTATGGAAAGAA
+AAAATTAAACGAATGTTATGCAATGGAGCCAAAATGTGTTCGACACGCAT
+TGGAATTGCTTAATTTTGAACAAAGGATTGATTTGGATTTAGCTGCTTAT
+GAGAACAAGTTTGACAGTGTTGATGGAATTATTAGGGAAAGAGTGAGTCA
+ATTTAGATATTTAAAAAATGTCAAAAGCCCCAAAACGAATTAATTTTCAG
+CTTGGCGAAAATCGGAAAAGCAGTGAACTCGACGAATGGTTGCACAGAAA
+TCGCCCACCAAAAGCGCTACAAGCTATTTTAGATGAGCGAAGTGACATTG
+AGGAGGAGTAAGATATTTATATTGAATAGATAAACGTTAGATTTTTAGAG
+CTCTTCAACGCCTTCGAGATAACGGAGTTCTGAGCTCACTCAAACATTAT
+TACAAAGCAGTTATAGAGTCAAGAAGTCATGAAGAGCAAGAGGTAACCAA
+TGTTTTAGATAGTTCTTGTCAATTTTCTTACGCATTTTTCGATATTCTGC
+TAATGGGTGCAAAGTTCTAAATTGTTTTTAATTGTAGCTTTCTATATTAA
+ACTTTCAGGACATTCGTCATTTCTTCGACATAATGAATGACACATTTGCT
+CGTTGCTTTGACCCACTACGAGGCCAGTATCATGATTTCTAGAAAAACCC
+TCTTTTTGACTTCTTCCTCCAT
diff -r 000000000000 -r cd00b4fe6552 test-data/r17.bob
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/r17.bob Mon Dec 22 09:08:31 2014 -0500
@@ -0,0 +1,1 @@
+ 22801 22817 F22B7 921215/f22b7.seq
diff -r 000000000000 -r cd00b4fe6552 test-data/r17.des
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/r17.des Mon Dec 22 09:08:31 2014 -0500
@@ -0,0 +1,13 @@
+# r17.des
+#
+# descriptor of the consensus binding site for the phage R17 coat
+# protein: a stem with a single-base A bulge and a 4-base loop
+#
+
+
+h1 s1 h2 s2 h2' h1'
+
+h1 0:0 NNNN:NNNN
+h2 0:0 NN:NN
+s1 0 A
+s2 0 AUYA
diff -r 000000000000 -r cd00b4fe6552 test-data/trna.bob
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/trna.bob Mon Dec 22 09:08:31 2014 -0500
@@ -0,0 +1,5 @@
+ 11092 11167 F22B7 921215/f22b7.seq
+ 28644 28556 F22B7 921215/f22b7.seq
+ 28643 28556 F22B7 921215/f22b7.seq
+ 28635 28556 F22B7 921215/f22b7.seq
+ 28630 28556 F22B7 921215/f22b7.seq
diff -r 000000000000 -r cd00b4fe6552 test-data/trna.des
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/trna.des Mon Dec 22 09:08:31 2014 -0500
@@ -0,0 +1,19 @@
+# trna.des
+#
+# Generalized descriptor of a tRNA cloverleaf. Doesn't
+# find them all though.
+#
+
+h1 s1 h2 s2 h2' s3 h3 s4 h3' s5 h4 s6 h4' h1' s8
+
+h1 0:2 NNNNNNN:NNNNNNN
+h2 0:1 *NNN:NNN*
+h3 0:1 NNNNN:NNNNN
+h4 0:1 NNNNN:NNNNN
+s1 0 TN
+s2 0 NNNN**********
+s3 0 N
+s4 0 NNNNNN*
+s5 0 NN********************
+s6 0 TTC****
+s8 0 NCCA
diff -r 000000000000 -r cd00b4fe6552 tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Mon Dec 22 09:08:31 2014 -0500
@@ -0,0 +1,21 @@
+
+
+
+
+
+ http://selab.janelia.org/software/rnabob/rnabob.tar.gz
+ make
+
+ rnabob
+ $INSTALL_DIR/bin
+
+
+ $INSTALL_DIR/bin
+
+
+
+
+Compiling rnabob requires a C compiler (typically gcc)
+
+
+
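Reviewer note (not part of the changeset): one quick way to sanity-check the expected outputs in test-data/*.bob against their descriptors is to compare each hit's span with the match lengths the descriptor allows. The sketch below is a rough illustration in Python, not rnabob's own logic. The helper names (`side_bounds`, `descriptor_bounds`, `hit_span`) are hypothetical, and it assumes that the two leading numbers in a .bob line are inclusive start and end coordinates, with start greater than end meaning a reverse-strand hit. It ignores mismatch counts and base-pairing entirely and only counts positions, treating each `*` as an optional position and `[n]` as shorthand for n asterisks, as the tool help describes.

```python
import re

# Descriptor bodies copied from test-data/r17.des and test-data/trna.des
# in this patch (comment lines omitted).
R17_DES = """\
h1 s1 h2 s2 h2' h1'

h1 0:0 NNNN:NNNN
h2 0:0 NN:NN
s1 0 A
s2 0 AUYA
"""

TRNA_DES = """\
h1 s1 h2 s2 h2' s3 h3 s4 h3' s5 h4 s6 h4' h1' s8

h1 0:2 NNNNNNN:NNNNNNN
h2 0:1 *NNN:NNN*
h3 0:1 NNNNN:NNNNN
h4 0:1 NNNNN:NNNNN
s1 0 TN
s2 0 NNNN**********
s3 0 N
s4 0 NNNNNN*
s5 0 NN********************
s6 0 TTC****
s8 0 NCCA
"""

# (start, end) pairs copied from test-data/r17.bob and test-data/trna.bob.
R17_HITS = [(22801, 22817)]
TRNA_HITS = [(11092, 11167), (28644, 28556), (28643, 28556),
             (28635, 28556), (28630, 28556)]


def side_bounds(pattern):
    """Return (min_len, max_len) for one side of a sequence constraint."""
    # Expand the [n] gap shorthand into n asterisks.
    pattern = re.sub(r"\[(\d+)\]", lambda m: "*" * int(m.group(1)), pattern)
    optional = pattern.count("*")      # each '*' matches 0 or 1 base
    fixed = len(pattern) - optional    # every other symbol matches 1 base
    return fixed, fixed + optional


def descriptor_bounds(text):
    """Return (min_len, max_len) of a full match for a descriptor."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    topology = lines[0].split()        # first non-blank, non-comment line
    patterns = {}
    for ln in lines[1:]:
        fields = ln.split()
        name, constraint = fields[0], fields[2]
        left, _, right = constraint.partition(":")
        patterns[name] = left
        if right:                      # paired element: right side is name'
            patterns[name + "'"] = right
    lo = hi = 0
    for element in topology:
        a, b = side_bounds(patterns[element])
        lo += a
        hi += b
    return lo, hi


def hit_span(start, end):
    """Length of a hit, assuming inclusive coordinates on either strand."""
    return abs(end - start) + 1


if __name__ == "__main__":
    for label, des, hits in (("r17", R17_DES, R17_HITS),
                             ("trna", TRNA_DES, TRNA_HITS)):
        lo, hi = descriptor_bounds(des)
        for start, end in hits:
            span = hit_span(start, end)
            print(label, start, end, span, lo <= span <= hi)
```

Under those assumptions, the r17 descriptor admits only 17-base matches, which agrees with the single expected hit at 22801..22817. A span outside the computed bounds would point to a stale expected-output file rather than prove a tool bug, since this check knows nothing about pairing or mismatches.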