Repository 'get_orfs_or_cdss'
hg clone https://toolshed.g2.bx.psu.edu/repos/peterjc/get_orfs_or_cdss

Changeset 3:6a14074bc810 (2013-07-29)
Previous changeset 2:324775a016ce (2013-04-23) Next changeset 4:d51819d2d7e2 (2013-07-29)
Commit message:
Uploaded v0.0.8, automated Biopython dependency handling via ToolShed; MIT license; reST markup for README file.
added:
test-data/sanger-pairs-forward.fastq
test-data/sanger-pairs-interleaved.fastq
test-data/sanger-pairs-mixed.fastq
test-data/sanger-pairs-reverse.fastq
test-data/sanger-pairs-singles.fastq
tools/fastq/fastq_paired_unpaired.py
tools/fastq/fastq_paired_unpaired.rst
tools/fastq/fastq_paired_unpaired.xml
removed:
test-data/Ssuis.fasta
test-data/get_orf_input.Suis_ORF.nuc.fasta
test-data/get_orf_input.Suis_ORF.prot.fasta
test-data/get_orf_input.fasta
test-data/get_orf_input.t11_nuc_out.fasta
test-data/get_orf_input.t11_open_nuc_out.fasta
test-data/get_orf_input.t11_open_prot_out.fasta
test-data/get_orf_input.t11_prot_out.fasta
test-data/get_orf_input.t1_nuc_out.fasta
test-data/get_orf_input.t1_prot_out.fasta
tools/filters/get_orfs_or_cdss.py
tools/filters/get_orfs_or_cdss.txt
tools/filters/get_orfs_or_cdss.xml
diff -r 324775a016ce -r 6a14074bc810 test-data/Ssuis.fasta
--- a/test-data/Ssuis.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,33460 +0,0 @@\n->Streptococcus_suis\n-ATGAACCAAGAACAACTTTTTTGGCAACGATTTATTGAATTGGCAAAGGTAAATTTTAAG\n-CCATCTATTTATGATTTTTATGTCGCTGATGCAAAATTACTCGGAATCAACCAGCAAGTT\n-GCCAATATTTTCTTAAATCGTCCATTTAAAAAAGATTTCTGGGAAAAAAACTTCGAAGAG\n-TTAATGATTGCCGCTAGTTTTGAAAGCTACGGAGAGCCTCTTACCATCCAATATCAATTT\n-ACAGAGGATGAACAGGAGATTAGGAATACTACAAACACAAGAAGTTCAATAGTTCACCAG\n-GTACAGACACTTGAGCCGGCTACTCCTCAAGAAACTTTTAAACCGGTTCATTCTGATATA\n-AAATCCCAGTACACCTTTGCTAATTTTGTACAAGGAGACAATAATCACTGGGCAAAGGCT\n-GCAGCTTTAGCTGTATCTGATAACCTAGGTGAGCTCTACAATCCATTATTCATTTTTGGT\n-GGTCCTGGTCTTGGAAAAACTCATATTTTAAATGCGATTGGAAATAAGGTTCTAGCCGAT\n-AATCCCCAGGCAAGGATAAAATATGTCTCATCGGAAACATTCATCAATGAATTTTTAGAA\n-CACCTCCGTCTCAATGATATGGAAAGTTTCAAAAAAACCTATCGCAATCTGGACTTACTT\n-CTAATTGATGACATTCAGTCTCTCCGTAATAAAGCAACAACACAGGAAGAATTTTTCCAT\n-ACTTTTAATGCGCTTCATGAAAAAAATAAGCAGATTGTACTCACAAGCGACCGTAATCCC\n-GATCACTTAGACAATTTGGAAGAAAGACTAGTAACACGTTTCAAATGGGGGTTAACCAGT\n-GAAATCACTCCACCTGATTTTGAAACACGTATCGCAATTTTACGTAACAAGTGCGAGAAC\n-CTGCCTTACAACTTTACAAATGAGACGCTATCCTATCTAGCTGGGCAATTTGATTCGAAC\n-GTACGTGACCTTGAAGGTGCCTTAAAAGATATCCATTTGATAGCCACTATGCGTCAACTG\n-TCTGAGATAAGTGTCGAGGTTGCTGCTGAGGCTATTCGATCAAGAAAACAAACAAATCCA\n-CAAAACATGGTTATTCCTATTGAGAAAATCCAAACCGAAGTGGGAAATTTCTACGGTGTC\n-AGCTTGAAAGAATTAAAAGGTTCTAAGCGTGTTCAACATATCGTTCACGCGCGACAAGTT\n-GCTATGTTTTTAGCACGTGAAATGACAGACAATTCCCTTCCAAAAATTGGGAAAGAATTT\n-GGTAATCGAGACCATACAACCGTTATGCATGCATACAATAAAATAAAAACTCTCCTCTTG\n-GATGATGAGAATTTAGAAATAGAGATTACCAGTATAAAAAATAAACTTCGTTAACCTGTG\n-TATAACTTTTTTAAAAAACTCTGTTTTTTCCACAAGTTGTGAACAAGTTAATTTCCGCAG\n-TTTTATTGGTCTTTCATCACTTTTCCACAGAATACACAGAGACTACTATTACTATTAACC\n-TTATAGATAATAAATAAAGGAGAATCCATGATTCAATTTTCTATTAATAAAAATATATTT\n-CTACAAGCACTTAGTATTACTAAACGGGCAATCAGTACAAAAAATGCTATTCCAATTCTT\n-TCAACAGTAAAAATTACAGTAACTAGTGAAGGAATCACTTTAACTGGTTCAAATGGACAA\n-ATCTCGATAGAACATTTTATTTCTATTCAAGATGAAAATGCAGGGCTTTTGATCAGTTCT\n-CCAGGTTCCATTCTCTTAGAAGCTGGTTTCTTTATTAATGTCGTATCCAGTATGCCGGAT\n-TTGGTCCTTGACTTCAATGAAATTGAACAAAAGCAAATCGTTTTGACAAGTGGTAAGTCT\n-G
AAATCACATTAAAGGGAAAAGAAGCAGAACAGTATCCTCGTTTACAGGAAGTTCCAACT\n-TCAAAACCATTGGTGTTAGAAACCAAAGTATTAAAACAAACAATTAATGAAACAGCATTT\n-GCAGCTTCTACACAAGAAAGTCGTCCTATTCTTACGGGTGTTCATTTTGTTTTAACAGAA\n-AATAAAAATCTAAAAACTGTTGCAACAGATTCACACCGTATGAGCCAACGGAAATTGGTC\n-CTTGATACCTCTGGTGATGATTTTAATGTTGTCATTCCAAGTCGTTCTCTCCGTGAATTT\n-ACTGCAGTTTTTACAGATGATATTGAAACAGTAGAAGTCTTCTTTTCAAATAATCAAATC\n-CTTTTTAGAAGCGAGCATATTAGCTTCTATACACGCTTATTAGAAGGTACCTACCCTGAT\n-ACCGACCGCTTAATTCCAACTGAGTTTAAAACAACTGCAATTTTTGATACTGCAAATCTT\n-CGTCACTCGATGGAGCGTGCTCGTCTTCTTTCAAATGCAACCCAAAATGGTACAGTAAAA\n-CTAGAAATTGCTAATAATGTTGTATCGGCTCATGTAAATTCTCCAGAAGTTGGACGTGTG\n-AATGAGGAATTAGATACTGTAGAAGTATCAGGTGAAGATTTAGTAATCAGCTTTAACCCA\n-ACTTACTTGATAGAAGCATTGAAAGCCACAACTAGTGAACAAGTGAAAATTAGCTTTATC\n-TCTTCTGTCCGTCCATTTACATTGATTCCAAATAATGAAGGGGAAGATTTTATTCAATTG\n-GTTACACCAGTTCGTACCAACTAAATAATATTAAGAACGGCTAAACTAGCCGTTTTTATG\n-TTATACTAAAAAATAGCACCTAGCTTATTTTTATATATTTAGTGATGGGGAATAAATGAC\n-GTTATATATATTAGCTAATCCTAATGCTGGTAGCCATACTGCTGAACATATCATATTCAA\n-AATAAAAGAAAGTTATCCACAGCTTGCAGTTAACATTTTTATGACAGTTGGTCCTGAGGA\n-TGAAAAAAGTCAAATAGAGGCTATTTTAAAGGAGTTTGTCAGTAGTGAAGATCAATTAAT\n-GATTTTAGGCGGAGACGGCACACTATCTAAAGCTTTGCGTTTTTGGCCAGCTAGTCTACC\n-GTTTGCTTATTATCCAACAGGATCTGGAAATGATTTTGCTAAGGCAATGAATATAACATC\n-GCTATATAGAAGTGTAGATGCCATTTTAGAGAGAAAAACAAGTCGGATATATGTTTTAAA\n-CAGTTCATACGGAACGGTTGTAAACAGTATGGATTTTGGCTTTGCAGCTCAAGTTATCAA\n-TGGTTCAACGAATTCAATTTTGAAAAAAATTCTGAACAAGGTAAAACTTGGGAAGTTAAC\n-TTATCTATTCTTTGGTATTAAAACATTATTTTCAAAACAAGCTATAAACTTAGAATTAAC\n-TCTTGATGAAAAATCTTATCAGTTAGATAATCTCTTTTTTATTTCTGTAGCAAATAGTCT\n-TTATTTTGGTGGAGGAATCATGATATGGCCAACAGCAAGTGCTAAAAAGAAGGAAGTAGA\n-TATTGTTTACTTCAAAAATGGAAATTTCTACCAACGTCTACAATCATTGTTAGCCTTATT\n-AACGAAGAGGCATGAATCTTCTCATACGATTCAGCATTTAACAGGGGTAGATGTAGTTTT\n-AAAATCAAAAGAAAAATTATTATTGCAAATAGATGGAGAGACATGCACTGCAAATGAGGT\n-AACGTTAACCTATCAGGAAAGAAGTATGTATCTTTAAGGAGGAAGTATGTACCAATTAGG\n-AACCTTTGTCGAAATGAAAAAGCCCCATGCCTGTGTCATCAAATCGACCGGAAAGAAGGC\n-TAATAAATGGGAGGTTATCCGTCTAGGAGCGGATATTAAAATCCGCTG
TACCAACTGTGA\n-CCATGTCGTTATGATGAGCCGGCATGATTTTGAACGAAAAATGAAACAAGT'..b'CTCTACCAACTGAGCTA\n-TGGCGGAAGAAATAGTCCGTACGGGATTCGAACCCGTGTTACCGCCGTGAAAAGGCGGTG\n-TCTTAACCCCTTGACCAACGGACCATTTTTAGAACAATAACTAGTATAATACATGTGACT\n-TTGTTTGTCAATACATTTTTTGATTTTTTATTGTATTGACAGAGTGCTTTGTTTAATGTA\n-AAATAAAATGGTTAAGGTTCCATAGCTCAGCTGGATAGAGCATTCGCCTTCTAAGCGAAC\n-GGTCGCAGGTTCGAATCCTGCTGGAATCATTTAGACCTACCTCGAGTAGGTCTTTTTTCT\n-TGCCATAATTCATAATTAATATATAACACTGGCAAAATCAGACCAATAAGGGCATATTCT\n-TCAAATTGGAAGGATAGGTGAGTAGATATGATGACACCTAGCATAAACCCTATAATGGTC\n-AATAAGATGTTTCTACCTGTTTTTCTAAGTTCTGAATCTTTTTCAATAACTCCTTTAAAC\n-CAGAGATAAGCAGCATTTTTGACATTCCCTGTCATCATCACATTGGCATACGGAGCACCT\n-CGTAACCTTCTAAATGTTTCTACTTGAATAGAGGCTACGAAGGCTAGACTAGCAATTGTA\n-AAAGACGCAGGCATTATAGGTGAGAGAATGATAGTTAGTAAAATAAGAACTAACATCATT\n-ACACTACTACCAAAGTGCCAAGACCATGTTTGTTTTTCAAAATACCTTCTTGCTAAGTAG\n-GTAAAAAATTGTCCGAATACAAAAAATAAAATGGGAATGGAAAAATTAACTACCTGCGCA\n-AAATCACCTTTAGCTAAAAAATAAGCTAGGGAAATAACATTTCCAGATTGTACGCCAGCA\n-AAGCGACCACCCTGAGTCACAAAAGTAAAGGCATTTAAATAACCACTGATAAACGTTAAT\n-GAACAAGCAATTCTCAATCCCTCAAAAACACGATACTCTTTTTGATTCATTTTCACTCCT\n-TGTTTCACGTGAAACTACTTATGATATGGGCTTCCCTGCTGAATCATAAATGCACGATAA\n-ATCTGCTCGATGAGAACTAATTTCATTAGTTGATGGGGAAGTGTCAACTGTCCAAAACTC\n-ATCAACAAATTAGCTCTTTTTTTAATACAAGAATCGAGACCCAAACTACCACCGATGATA\n-AAAGTTATATCTGAATACCCATTTACTGCAATGTCAGATATCCTTTGACTAAATTCTTCC\n-GATGGAAATTGTTTCCCTTCTATCGCTAAGGCAATGACAAAATCTCGCTCTCCAATTTTA\n-GACATAATTCTATCGGCTTCTTTTTTTAATATTTGTTCATTCTCTGCCTGACTGGCTTTA\n-TCTGGTGTTTTTTCATCAGGAAGCTCAATCATATCCAACTTAGTAAATCGTCCCAATCGT\n-TTACTATATTCTGCAATACCTTCTTTGAGGTACTTTTCTTTCAATTTTCCAACGGTAATC\n-AATTTTATTTTCATAAAATAATTGTAACATATCCACAAGCATACGACAGAAAATATTTTT\n-AGAAAATCAGGATATGGCTACAGTTTTTCACATAATTCACAGAGTTATCCACAGGTTGTG\n-GATTGATTTTTGAAAACTTTAAGTTATAATTAAGAAAGAAATAGTACTCTTAAGGAAAAT\n-TAAAGAAATGGAAAGGATTCCTTATATGAAAAAATATTTGAAATTTGCGATTTTATTTGT\n-AATTGGATTTTTTGGGGGTCTTATCGGGGCCTTGTCAGCCTCTTTCTTCCAGCCACAGGT\n-GCAACAAGCAAATTCTGCTATCACTAGTGTCAGCAATGTTCAATATAATAATGAAACTTC\n-CACCACAAAAGCTGTAGAG
AAAGTACAAAATGCTGTTGTGTCTGTTATTAATTACCAAAA\n-ATCAGCCAACAATAGTCTTGGTGTTATCTTTGGAAATATTGAATCATCTGACGAACTAGC\n-TGTTGCTGGAGAGGGGTCTGGGGTTATCTATAAAAAATATGGTCAATATGCCTATATTGT\n-GACAAATACGCATGTTATTAATAACGCAGAAAAGATTGATATCCTTTTAGCATCTGGAGA\n-AAAAATTAGCGGTGAACTTGTTGGTTCCGATACATATTCTGATATAGCTGTTATAAAAAT\n-ATCAGCAGATAAAGTCACTGCTGTTGCTGAATTTGCTGATTCCGATACAATTAAAGTTGG\n-AGAAACTGCTATCGCAATTGGTAGTCCTCTAGGTAGCGTCTACGCCAATACAGTTACCCA\n-GGGTATTATTTCTAGCTTAAGTCGGACAGTTACTTCACAATCAAAAGATGGACAAACAAT\n-CTCAACTAACGCTATTCAAACTGATACAGCTATCAACCCTGGAAACTCTGGCGGACCGTT\n-AATCAATACCCAAGGACAAGTGATAGGCATTACCTCTAGCAAAATTACCTCAAGTTCTGC\n-AAATAGCTCAGGCGTGGCTGTAGAAGGGTTGGGATTTGCTATTCCTGCAAATGATGCCGT\n-AGCTATTATCAATCAGCTTGAAAAAACTGGACAAGTTAGCCGACCTGCTCTTGGAGTTCA\n-TATGGTTAACTTGACGACCTTGTCAACTAGTCAATTAGAAAAAGCTGGATTATCAAATAC\n-GGAATTAACATCCGGTGTAGTAATTGTCTCTACACAAAGTGGGCTACCTGCAGATGGAAA\n-ATTAGAAACTTTTGATGTTATTACTGAGATTGACGGAGAAGCTATTCAAAATAAGAGTGA\n-CCTCCAGAGCGCTCTCTACAAACATCAAATTGGAGATACAATCACTGTAACTTATTACCG\n-CAATAATCAGAAACAAACTGTTGACATTAAGTTGACACATTCTACAGAAGAACTTAGCGA\n-ATAATTGACAAATGAGACTTTACACAATTGTAAAGTCTCATTTTTTTTGCTAGAATAAGG\n-ATATATGGAAGAATTACGTACACTAAATATTTCAGAAATCCATCCCAATCCCTATCAGCC\n-AAGAATTCATTTTGATGAAAAGGAGCTACTTGAGCTCGCTCAATCTATTAAGGAAAATGG\n-CTTAATTCAACCGATTATTGTAAGAAAATCTTCTATTATCGGATACGAATTATTAGCTGG\n-AGAAAGAAGGTTGCGAGCCAGTCAATTAGCTGGACTGACTACAATACCAGCAGTGGTAAA\n-AGAACTGACTGATGATGATTTACTCTATCAGGCTATCATAGAGAATCTGCAGCGTTCTAA\n-CTTAAATCCGATAGAAGAAGCAGCCTCTTATCAAAAATTGATTAGTAGAGGGTTAACACA\n-TGATGAAGTTGCTCAAATCATGGGAAAATCAAGACCATATATCAGTAATTTATTGCGCCT\n-ACTAAATCTATCATCTCAGACTAAACAAGCTGTAGAAGAAGGAAAAATTTCACAAGGGCA\n-CGCGCGACAATTGGTGTCATTTTCAGAAGAAAAGCAAGCCGAATGGGTTCAACTCATTTT\n-ATCAAAGGATTTAAGTGTGCGTACGCTTGAAAAATTAATAGCTGCAAATAAGAAAAAACA\n-CACTAAGCTTAAACAACGCGACCAATTTTTAAAAGAACAGGAAGATTCACTCAGTAAAAC\n-TCTTGGAACAGCTACAAAAATTATCAAGAAGAAAAACGGGAGCGGAGAAATTCGGATTAG\n-CTTTAATGACCTCGATGAATTCGAAAGAATTATCAACAATTTTAAATAGACTTGTTTACA\n-ATTTATTTTTATAAACACTCTTTTCCACACTAAAATCATTACAAAAAGTCAGGACCAGCA\n-AGG
GTTCTGACTTTTATTCACATCTTGTGGAAAACTTTTCTTAACAGTGTGGATTTTAAA\n-AATTATCTGTGGAAAACTTTTGTTTTTTATGGTACACTATTCTAACGAATATAATGTGAA\n-AGGGGGAAAAT\n'
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.Suis_ORF.nuc.fasta
--- a/test-data/get_orf_input.Suis_ORF.nuc.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,41831 +0,0 @@\n->Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n-ATGAACCAAGAACAACTTTTTTGGCAACGATTTATTGAATTGGCAAAGGTAAATTTTAAG\n-CCATCTATTTATGATTTTTATGTCGCTGATGCAAAATTACTCGGAATCAACCAGCAAGTT\n-GCCAATATTTTCTTAAATCGTCCATTTAAAAAAGATTTCTGGGAAAAAAACTTCGAAGAG\n-TTAATGATTGCCGCTAGTTTTGAAAGCTACGGAGAGCCTCTTACCATCCAATATCAATTT\n-ACAGAGGATGAACAGGAGATTAGGAATACTACAAACACAAGAAGTTCAATAGTTCACCAG\n-GTACAGACACTTGAGCCGGCTACTCCTCAAGAAACTTTTAAACCGGTTCATTCTGATATA\n-AAATCCCAGTACACCTTTGCTAATTTTGTACAAGGAGACAATAATCACTGGGCAAAGGCT\n-GCAGCTTTAGCTGTATCTGATAACCTAGGTGAGCTCTACAATCCATTATTCATTTTTGGT\n-GGTCCTGGTCTTGGAAAAACTCATATTTTAAATGCGATTGGAAATAAGGTTCTAGCCGAT\n-AATCCCCAGGCAAGGATAAAATATGTCTCATCGGAAACATTCATCAATGAATTTTTAGAA\n-CACCTCCGTCTCAATGATATGGAAAGTTTCAAAAAAACCTATCGCAATCTGGACTTACTT\n-CTAATTGATGACATTCAGTCTCTCCGTAATAAAGCAACAACACAGGAAGAATTTTTCCAT\n-ACTTTTAATGCGCTTCATGAAAAAAATAAGCAGATTGTACTCACAAGCGACCGTAATCCC\n-GATCACTTAGACAATTTGGAAGAAAGACTAGTAACACGTTTCAAATGGGGGTTAACCAGT\n-GAAATCACTCCACCTGATTTTGAAACACGTATCGCAATTTTACGTAACAAGTGCGAGAAC\n-CTGCCTTACAACTTTACAAATGAGACGCTATCCTATCTAGCTGGGCAATTTGATTCGAAC\n-GTACGTGACCTTGAAGGTGCCTTAAAAGATATCCATTTGATAGCCACTATGCGTCAACTG\n-TCTGAGATAAGTGTCGAGGTTGCTGCTGAGGCTATTCGATCAAGAAAACAAACAAATCCA\n-CAAAACATGGTTATTCCTATTGAGAAAATCCAAACCGAAGTGGGAAATTTCTACGGTGTC\n-AGCTTGAAAGAATTAAAAGGTTCTAAGCGTGTTCAACATATCGTTCACGCGCGACAAGTT\n-GCTATGTTTTTAGCACGTGAAATGACAGACAATTCCCTTCCAAAAATTGGGAAAGAATTT\n-GGTAATCGAGACCATACAACCGTTATGCATGCATACAATAAAATAAAAACTCTCCTCTTG\n-GATGATGAGAATTTAGAAATAGAGATTACCAGTATAAAAAATAAACTTCGTTAA\n->Streptococcus_suis|ORF2 length 385 aa, 1158 bp, from 1507..2664 of 
Streptococcus_suis\n-ATAATAAATAAAGGAGAATCCATGATTCAATTTTCTATTAATAAAAATATATTTCTACAA\n-GCACTTAGTATTACTAAACGGGCAATCAGTACAAAAAATGCTATTCCAATTCTTTCAACA\n-GTAAAAATTACAGTAACTAGTGAAGGAATCACTTTAACTGGTTCAAATGGACAAATCTCG\n-ATAGAACATTTTATTTCTATTCAAGATGAAAATGCAGGGCTTTTGATCAGTTCTCCAGGT\n-TCCATTCTCTTAGAAGCTGGTTTCTTTATTAATGTCGTATCCAGTATGCCGGATTTGGTC\n-CTTGACTTCAATGAAATTGAACAAAAGCAAATCGTTTTGACAAGTGGTAAGTCTGAAATC\n-ACATTAAAGGGAAAAGAAGCAGAACAGTATCCTCGTTTACAGGAAGTTCCAACTTCAAAA\n-CCATTGGTGTTAGAAACCAAAGTATTAAAACAAACAATTAATGAAACAGCATTTGCAGCT\n-TCTACACAAGAAAGTCGTCCTATTCTTACGGGTGTTCATTTTGTTTTAACAGAAAATAAA\n-AATCTAAAAACTGTTGCAACAGATTCACACCGTATGAGCCAACGGAAATTGGTCCTTGAT\n-ACCTCTGGTGATGATTTTAATGTTGTCATTCCAAGTCGTTCTCTCCGTGAATTTACTGCA\n-GTTTTTACAGATGATATTGAAACAGTAGAAGTCTTCTTTTCAAATAATCAAATCCTTTTT\n-AGAAGCGAGCATATTAGCTTCTATACACGCTTATTAGAAGGTACCTACCCTGATACCGAC\n-CGCTTAATTCCAACTGAGTTTAAAACAACTGCAATTTTTGATACTGCAAATCTTCGTCAC\n-TCGATGGAGCGTGCTCGTCTTCTTTCAAATGCAACCCAAAATGGTACAGTAAAACTAGAA\n-ATTGCTAATAATGTTGTATCGGCTCATGTAAATTCTCCAGAAGTTGGACGTGTGAATGAG\n-GAATTAGATACTGTAGAAGTATCAGGTGAAGATTTAGTAATCAGCTTTAACCCAACTTAC\n-TTGATAGAAGCATTGAAAGCCACAACTAGTGAACAAGTGAAAATTAGCTTTATCTCTTCT\n-GTCCGTCCATTTACATTGATTCCAAATAATGAAGGGGAAGATTTTATTCAATTGGTTACA\n-CCAGTTCGTACCAACTAA\n->Streptococcus_suis|ORF3 length 104 aa, 315 bp, from complement(1707..2021) of Streptococcus_suis\n-ACACCCGTAAGAATAGGACGACTTTCTTGTGTAGAAGCTGCAAATGCTGTTTCATTAATT\n-GTTTGTTTTAATACTTTGGTTTCTAACACCAATGGTTTTGAAGTTGGAACTTCCTGTAAA\n-CGAGGATACTGTTCTGCTTCTTTTCCCTTTAATGTGATTTCAGACTTACCACTTGTCAAA\n-ACGATTTGCTTTTGTTCAATTTCATTGAAGTCAAGGACCAAATCCGGCATACTGGATACG\n-ACATTAATAAAGAAACCAGCTTCTAAGAGAATGGAACCTGGAGAACTGATCAAAAGCCCT\n-GCATTTTCATCTTGA\n->Streptococcus_suis|ORF4 length 293 aa, 882 bp, from 2756..3637 of 
Streptococcus_suis\n-ATGACGTTATATATATTAGCTAATCCTAATGCTGGTAGCCATACTGCTGAACATATCATA\n-TTCAAAATAAAAGAAAGTTATCCACAGCTTGCAGTTAACATTTTTATGACAGTTGGTCCT\n-GAGGATGAAAAAAGTCAAATAGAGGCTATTTTAAAGGAGTTTGTCAGTAGTGAAGATCAA\n-TTAATGATTTTAGGCGGAGACGGCACACTATCTAAAGCTTTGCGTTTTTGGCCAGCTAGT\n-CTACCGTTTGCTTATTATCCAACAGGATCTGGAAATGATTTTGCTAAGGCAATGAATATA\n-ACATCGCTATATAGAAGTGTAGATGCCATTTTAGAGAGAAAAACAAGTCGGATATATGTT\n-TTAAACAGTTCATACGGAACGGTTGTAAACAGTATGGATTTTGGCTTTGCAGCTCAAGTT\n-ATCAATGGTTCAACGAATTCAATTTTGAAAAAAATTCTGAACAAGGTAAAACTTGGGAAG\n-TTAACTTATCTATTCTTTGGTATTAAAACATTATTTTCAAAACAAGCTATAAACTTAGAA\n-TTAACTCTTGATGAAAAATCTTATCAGTTAGATAATCTCTTTTTTATTTCTGTAGCAAAT\n-AGTCTTTATTTTGGTGGAGGAATCATGATATGGCCAACAGCAAGTGCTAAAAAG'..b'GCAACCATTGATGGTAAACCTATCAAAATCCAAAAAGCGCAAGATGGT\n-TTTATGAAAGTGGATGTAAGTCCAGGTCAAACTAAACTAGTTTTAACCTTTGTACCAAAT\n-GGTTTCTATCTAGGTTTACTGATTTCTTTTGGTGCAGTTTTTGTATTTTTCTCCTATCAA\n-TTCATTGGATACTATTATTCTAAGAACCGAGAATACTAA\n->Streptococcus_suis|ORF2907 length 235 aa, 708 bp, from complement(2003907..2004614) of Streptococcus_suis\n-TTTCACGTGAAACAAGGAGTGAAAATGAATCAAAAAGAGTATCGTGTTTTTGAGGGATTG\n-AGAATTGCTTGTTCATTAACGTTTATCAGTGGTTATTTAAATGCCTTTACTTTTGTGACT\n-CAGGGTGGTCGCTTTGCTGGCGTACAATCTGGAAATGTTATTTCCCTAGCTTATTTTTTA\n-GCTAAAGGTGATTTTGCGCAGGTAGTTAATTTTTCCATTCCCATTTTATTTTTTGTATTC\n-GGACAATTTTTTACCTACTTAGCAAGAAGGTATTTTGAAAAACAAACATGGTCTTGGCAC\n-TTTGGTAGTAGTGTAATGATGTTAGTTCTTATTTTACTAACTATCATTCTCTCACCTATA\n-ATGCCTGCGTCTTTTACAATTGCTAGTCTAGCCTTCGTAGCCTCTATTCAAGTAGAAACA\n-TTTAGAAGGTTACGAGGTGCTCCGTATGCCAATGTGATGATGACAGGGAATGTCAAAAAT\n-GCTGCTTATCTCTGGTTTAAAGGAGTTATTGAAAAAGATTCAGAACTTAGAAAAACAGGT\n-AGAAACATCTTATTGACCATTATAGGGTTTATGCTAGGTGTCATCATATCTACTCACCTA\n-TCCTTCCAATTTGAAGAATATGCCCTTATTGGTCTGATTTTGCCAGTGTTATATATTAAT\n-TATGAATTATGGCAAGAAAAAAGACCTACTCGAGGTAGGTCTAAATGA\n->Streptococcus_suis|ORF2908 length 180 aa, 543 bp, from complement(2004615..2005157) of 
Streptococcus_suis\n-CCATATCCTGATTTTCTAAAAATATTTTCTGTCGTATGCTTGTGGATATGTTACAATTAT\n-TTTATGAAAATAAAATTGATTACCGTTGGAAAATTGAAAGAAAAGTACCTCAAAGAAGGT\n-ATTGCAGAATATAGTAAACGATTGGGACGATTTACTAAGTTGGATATGATTGAGCTTCCT\n-GATGAAAAAACACCAGATAAAGCCAGTCAGGCAGAGAATGAACAAATATTAAAAAAAGAA\n-GCCGATAGAATTATGTCTAAAATTGGAGAGCGAGATTTTGTCATTGCCTTAGCGATAGAA\n-GGGAAACAATTTCCATCGGAAGAATTTAGTCAAAGGATATCTGACATTGCAGTAAATGGG\n-TATTCAGATATAACTTTTATCATCGGTGGTAGTTTGGGTCTCGATTCTTGTATTAAAAAA\n-AGAGCTAATTTGTTGATGAGTTTTGGACAGTTGACACTTCCCCATCAACTAATGAAATTA\n-GTTCTCATCGAGCAGATTTATCGTGCATTTATGATTCAGCAGGGAAGCCCATATCATAAG\n-TAG\n->Streptococcus_suis|ORF2909 length 413 aa, 1242 bp, from 2005223..2006464 of Streptococcus_suis\n-GTTATAATTAAGAAAGAAATAGTACTCTTAAGGAAAATTAAAGAAATGGAAAGGATTCCT\n-TATATGAAAAAATATTTGAAATTTGCGATTTTATTTGTAATTGGATTTTTTGGGGGTCTT\n-ATCGGGGCCTTGTCAGCCTCTTTCTTCCAGCCACAGGTGCAACAAGCAAATTCTGCTATC\n-ACTAGTGTCAGCAATGTTCAATATAATAATGAAACTTCCACCACAAAAGCTGTAGAGAAA\n-GTACAAAATGCTGTTGTGTCTGTTATTAATTACCAAAAATCAGCCAACAATAGTCTTGGT\n-GTTATCTTTGGAAATATTGAATCATCTGACGAACTAGCTGTTGCTGGAGAGGGGTCTGGG\n-GTTATCTATAAAAAATATGGTCAATATGCCTATATTGTGACAAATACGCATGTTATTAAT\n-AACGCAGAAAAGATTGATATCCTTTTAGCATCTGGAGAAAAAATTAGCGGTGAACTTGTT\n-GGTTCCGATACATATTCTGATATAGCTGTTATAAAAATATCAGCAGATAAAGTCACTGCT\n-GTTGCTGAATTTGCTGATTCCGATACAATTAAAGTTGGAGAAACTGCTATCGCAATTGGT\n-AGTCCTCTAGGTAGCGTCTACGCCAATACAGTTACCCAGGGTATTATTTCTAGCTTAAGT\n-CGGACAGTTACTTCACAATCAAAAGATGGACAAACAATCTCAACTAACGCTATTCAAACT\n-GATACAGCTATCAACCCTGGAAACTCTGGCGGACCGTTAATCAATACCCAAGGACAAGTG\n-ATAGGCATTACCTCTAGCAAAATTACCTCAAGTTCTGCAAATAGCTCAGGCGTGGCTGTA\n-GAAGGGTTGGGATTTGCTATTCCTGCAAATGATGCCGTAGCTATTATCAATCAGCTTGAA\n-AAAACTGGACAAGTTAGCCGACCTGCTCTTGGAGTTCATATGGTTAACTTGACGACCTTG\n-TCAACTAGTCAATTAGAAAAAGCTGGATTATCAAATACGGAATTAACATCCGGTGTAGTA\n-ATTGTCTCTACACAAAGTGGGCTACCTGCAGATGGAAAATTAGAAACTTTTGATGTTATT\n-ACTGAGATTGACGGAGAAGCTATTCAAAATAAGAGTGACCTCCAGAGCGCTCTCTACAAA\n-CATCAAATTGGAGATACAATCACTGTAACTTATTACCGCAATAATCAGAAACAAACTGTT\n-GACATTAAGTTGACACATTCTACAGAAGAACTTAGCGAATAA\n->St
reptococcus_suis|ORF2910 length 256 aa, 771 bp, from 2006519..2007289 of Streptococcus_suis\n-GGATATATGGAAGAATTACGTACACTAAATATTTCAGAAATCCATCCCAATCCCTATCAG\n-CCAAGAATTCATTTTGATGAAAAGGAGCTACTTGAGCTCGCTCAATCTATTAAGGAAAAT\n-GGCTTAATTCAACCGATTATTGTAAGAAAATCTTCTATTATCGGATACGAATTATTAGCT\n-GGAGAAAGAAGGTTGCGAGCCAGTCAATTAGCTGGACTGACTACAATACCAGCAGTGGTA\n-AAAGAACTGACTGATGATGATTTACTCTATCAGGCTATCATAGAGAATCTGCAGCGTTCT\n-AACTTAAATCCGATAGAAGAAGCAGCCTCTTATCAAAAATTGATTAGTAGAGGGTTAACA\n-CATGATGAAGTTGCTCAAATCATGGGAAAATCAAGACCATATATCAGTAATTTATTGCGC\n-CTACTAAATCTATCATCTCAGACTAAACAAGCTGTAGAAGAAGGAAAAATTTCACAAGGG\n-CACGCGCGACAATTGGTGTCATTTTCAGAAGAAAAGCAAGCCGAATGGGTTCAACTCATT\n-TTATCAAAGGATTTAAGTGTGCGTACGCTTGAAAAATTAATAGCTGCAAATAAGAAAAAA\n-CACACTAAGCTTAAACAACGCGACCAATTTTTAAAAGAACAGGAAGATTCACTCAGTAAA\n-ACTCTTGGAACAGCTACAAAAATTATCAAGAAGAAAAACGGGAGCGGAGAAATTCGGATT\n-AGCTTTAATGACCTCGATGAATTCGAAAGAATTATCAACAATTTTAAATAG\n'
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.Suis_ORF.prot.fasta
--- a/test-data/get_orf_input.Suis_ORF.prot.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,16670 +0,0 @@\n->Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n-MNQEQLFWQRFIELAKVNFKPSIYDFYVADAKLLGINQQVANIFLNRPFKKDFWEKNFEE\n-LMIAASFESYGEPLTIQYQFTEDEQEIRNTTNTRSSIVHQVQTLEPATPQETFKPVHSDI\n-KSQYTFANFVQGDNNHWAKAAALAVSDNLGELYNPLFIFGGPGLGKTHILNAIGNKVLAD\n-NPQARIKYVSSETFINEFLEHLRLNDMESFKKTYRNLDLLLIDDIQSLRNKATTQEEFFH\n-TFNALHEKNKQIVLTSDRNPDHLDNLEERLVTRFKWGLTSEITPPDFETRIAILRNKCEN\n-LPYNFTNETLSYLAGQFDSNVRDLEGALKDIHLIATMRQLSEISVEVAAEAIRSRKQTNP\n-QNMVIPIEKIQTEVGNFYGVSLKELKGSKRVQHIVHARQVAMFLAREMTDNSLPKIGKEF\n-GNRDHTTVMHAYNKIKTLLLDDENLEIEITSIKNKLR\n->Streptococcus_suis|ORF2 length 385 aa, 1158 bp, from 1507..2664 of Streptococcus_suis\n-IINKGESMIQFSINKNIFLQALSITKRAISTKNAIPILSTVKITVTSEGITLTGSNGQIS\n-IEHFISIQDENAGLLISSPGSILLEAGFFINVVSSMPDLVLDFNEIEQKQIVLTSGKSEI\n-TLKGKEAEQYPRLQEVPTSKPLVLETKVLKQTINETAFAASTQESRPILTGVHFVLTENK\n-NLKTVATDSHRMSQRKLVLDTSGDDFNVVIPSRSLREFTAVFTDDIETVEVFFSNNQILF\n-RSEHISFYTRLLEGTYPDTDRLIPTEFKTTAIFDTANLRHSMERARLLSNATQNGTVKLE\n-IANNVVSAHVNSPEVGRVNEELDTVEVSGEDLVISFNPTYLIEALKATTSEQVKISFISS\n-VRPFTLIPNNEGEDFIQLVTPVRTN\n->Streptococcus_suis|ORF3 length 104 aa, 315 bp, from complement(1707..2021) of Streptococcus_suis\n-TPVRIGRLSCVEAANAVSLIVCFNTLVSNTNGFEVGTSCKRGYCSASFPFNVISDLPLVK\n-TICFCSISLKSRTKSGILDTTLIKKPASKRMEPGELIKSPAFSS\n->Streptococcus_suis|ORF4 length 293 aa, 882 bp, from 2756..3637 of Streptococcus_suis\n-MTLYILANPNAGSHTAEHIIFKIKESYPQLAVNIFMTVGPEDEKSQIEAILKEFVSSEDQ\n-LMILGGDGTLSKALRFWPASLPFAYYPTGSGNDFAKAMNITSLYRSVDAILERKTSRIYV\n-LNSSYGTVVNSMDFGFAAQVINGSTNSILKKILNKVKLGKLTYLFFGIKTLFSKQAINLE\n-LTLDEKSYQLDNLFFISVANSLYFGGGIMIWPTASAKKKEVDIVYFKNGNFYQRLQSLLA\n-LLTKRHESSHTIQHLTGVDVVLKSKEKLLLQIDGETCTANEVTLTYQERSMYL\n->Streptococcus_suis|ORF5 length 126 aa, 381 bp, from 3933..4313 of Streptococcus_suis\n-KKEEEMIMKQLAQQIRVLRTAKNLSQDELAEKLYISRQAVSKWENGEATPDIDKLVQLAE\n-IFGVSLDYLVLGKEPEKEIVVEQRGKMNGWEFLNEESKRPLTRGDVVLLIFLAVMLLGGL\n-FIKHYF\n->Streptococcus_suis|ORF6 length 377 aa, 1134 bp, from 4381..5514 of 
Streptococcus_suis\n-LESKKNMSLTAGIVGLPNVGKSTLFNAITKAGAEAANYPFATIDPNVGMVEVPDERLQKL\n-TELIIPKKTVPTTFEFTDIAGIVKGASKGEGLGNKFLANIREVDAIVHVVRAFDDENVMR\n-EQGREDAFVDPIADIDTINLELILADLESINKRYARVEKMARTQKDKDSVAEFAVLEKIK\n-PVLEDGKSARTVEFTDEEQKIVKQLFLLTTKPVLYVANVDEDKVADPEAISYVQQIRDFA\n-ATENAEVVVISARAEEEISELDDEDKGEFLEALGLTESGVDKLTRAAYHLLGLGTYFTAG\n-EKEVRAWTFKRGMKAPQCAGIIHSDFEKGFIRAVTMSYDDLMTYGSEKAVKEAGRLREEG\n-KEYVVQDGDIMEFRFNV\n->Streptococcus_suis|ORF7 length 115 aa, 348 bp, from complement(4450..4797) of Streptococcus_suis\n-VNGINISDWIHKGIFTALFTHDIFIVKGTHNVDNRINFADIGQEFISKSFTFRSTFYDTS\n-NISKFKSRWHCLFRDDEFGQLLQTLIGHFYHADVWINSCERVVCSFCSCLGNCVK\n->Streptococcus_suis|ORF8 length 115 aa, 348 bp, from complement(4491..4838) of Streptococcus_suis\n-RLLMLSRSAKINSRLMVSISAIGSTKASSRPCSRMTFSSSKARTTWTIASTSRILAKNLF\n-PSPSPLEAPFTIPAISVNSKVVGTVFLGMMSSVNFCRRSSGTSTMPTFGSIVAKG\n->Streptococcus_suis|ORF9 length 192 aa, 579 bp, from 5663..6241 of Streptococcus_suis\n-GEKMTRLIIGLGNPGDRYFETKHNVGFMLLDKIAKRENVTFNHDKIFQADIATTFIDGEK\n-IYLVKPTTFMNESGKAVHALMTYYGLDATDILVAYDDLDMAVGKIRFRQKGSAGGHNGIK\n-SIVKHIGTQEFDRIKIGIGRPKGKMSVVNHVLSGFDIEDRIEIDLALDKLDKAVNVYLEE\n-DDFDTVMRKFNG\n->Streptococcus_suis|ORF10 length 1166 aa, 3501 bp, from 6235..9735 of 
Streptococcus_suis\n-RIMNILDLLHKNKQINQWQSGLNQSTRQLLLGLSGTSKSLIMATAYDCLAEKIMIVTATQ\n-NDAEKLVADLTAIIGSENVYNFFTDDSPIAEFVFASKERTQSRIDSLNFLTDSTSSGILV\n-ASIVACRVLLPSPETYKGSKIQLEVGQEIEVDKLVKNLVNIGYKKVSRVLTQGEFSQRGD\n-ILDIFDMQSETPYRIEFFGDEIDGIRIFDVDSQKSLENLDEISISPASDIILSSEDYSRA\n-SQYIQTAIEQSTLEEQQSYLREVLADMQTEYRHPDLRKFLSCIYEQSWTLLDYLPKSSPL\n-FLDDFHKIADKQAQFEKEIADLLTDDLQKGKTVSSLKYFASTYAELRKYKPATFFSSFQK\n-GLGNVKFDALYQFTQHPMQEFFHQIPLLKDELTRYAKSNNTVVIQASSDVSLQTLQKNLQ\n-EYDIHLPVHAADKLVEGQQQVTIGQLASGFHLMDEKLVFITEKEIFNKKMKRKTRRTNIS\n-NAERIKDYSELAVGDYVVHHVHGIGQYLGIETIEISGIHRDYLTVQYQNSDRISIPVEQI\n-DLLSKYLASDGKAPKVNKLNDGRFQRTKQKVQKQVEDIADDLIKLYAERSQLKGFAFSPD\n-DENQVEFDNYFTHVETDDQLRSIDEIKKDMEKDSPMDRLLVGDVGFGKTEVAMRAAFKAV\n-NDGKQVAILVPTTVLAQQHYANFQERFAEFPVNVDVMSRFKTKAEQEKTLEKLKKGQVDI\n-LIGTHRLLSKDVVFADLGLLVIDEEQRFGVKHKERLKELKKKIDVLTLTATPIPRTLQMS\n-MLGIRDLSVIETPPTNRYP'..b'\n-DTDTVMYSIIALMTITYIVNRMMSGTQSSRNVMIISQKSEEIKDYITKVADRGVTELPII\n-GGFTGVDKRMLMTTISIPEMQKLETAVLEIDETAFMVVMPASQVRGRGFSLQKDHKHYDE\n-DILIPM\n->Streptococcus_suis|ORF2902 length 565 aa, 1698 bp, from 1998923..2000620 of Streptococcus_suis\n-FQCNSLKIQVLSSTIKLIDRNRGETMLTVSDVSLRFSDRKLFDDVNIKFTAGNTYGLIGA\n-NGAGKSTFLKILAGDIEPSTGHISLGPDERLSVLRQNHFDYEDERVIDVVIMGNEQLYSI\n-MKEKDAIYMKEDFSDEDGVRAAELEGEFAELGGWEAESEASQLLQNLNISEDLHYQNMSE\n-LTNGEKVKVLLAKALFGKPDVLLLDEPTNGLDIQSINWLEDFLIDFENTVIVVSHDRHFL\n-NKVCTHMADLDFGKIKIFVGNYDFWKQSSELAAKLQADRNAKAEEKIKELQEFVARFSAN\n-ASKSKQATSRKKMLDKIELEEIIPSSRKYPFINFKSEREIGNDLLTVENLKVVIDGETIL\n-DNISFILRPGDKTALIGQNDIQTTALIRALMGDIEYEGTVKWGVTTSQSYLPKDNTRDFD\n-TNESILDWLRQFASKEEDDNTFLRGFLGRMLFSGDEVNKPVNVLSGGEKVRVMLSKLMLL\n-KSNVLVLDDPTNHLDLESISSLNDGLKAFKESIIFASHDHEFIQTLANHIIVISKNGVID\n-RIDETYDEFLENAEVQAKVQELWKA\n->Streptococcus_suis|ORF2903 length 115 aa, 348 bp, from complement(1999705..2000052) of Streptococcus_suis\n-PIRAVLSPGRRIKLILSRIVSPSITTFKFSTVKRSLPISRSDLKLINGYLRLEGMISSNS\n-ILSNIFLREVACLDLEALAEKRATNSCSSLIFSSAFALRSACSLAASSLDCFQKS\n->Streptococcus_suis|ORF2904 length 110 aa, 333 bp, from 1999974..2000306 
of Streptococcus_suis\n-KLLLMVKRFLTISALSCAQVTRLLLLVKTTSKQLLSFVLLWAILNMKVLSSGVSLLVNPT\n-YQKTILVTLIQTNLSLIGSVNLPARKKMTIPSCAVSWDVCSSRVMRLTNL\n->Streptococcus_suis|ORF2905 length 117 aa, 354 bp, from 2000502..2000855 of Streptococcus_suis\n-QTISSSFLKTVLSTESTKLMMNSWKMLKYKQKYKNFGKHNKKRLGLLPSLSSQSSCQHLS\n-AVVDCQICSCFTLQIWPLRLLRTKFALSPTSNCLPDSLSCAGVGVKQSGNRLFQLNN\n->Streptococcus_suis|ORF2906 length 872 aa, 2619 bp, from 2000888..2003506 of Streptococcus_suis\n-PVKFFPTSFSFKSMKKIFTKTSIYYLLSFLIPLTIISIVLAFQGIWWGSDTTILASDGFH\n-QYVIFNQTLRNTLHGDGSLFYTFSSGLGLNFYALSSYYLGSFLSPIVFFFDLQSMPDAIY\n-LVTIVKFGLTGLSTYFSLKGIHKNLKEEWALLLATSFSLMSFSTSQLEINNWLDVFILLP\n-LVLLGLHRLLKKQGPILYYITLTCLFIQNYYFGYMVAIFLTLWTLVQLSWIDSQRIKRFI\n-NFTIVSILSALSSMFMLLPTYLDLKTHGETFTKIVNLKTEDSWYLDFFAKNLVGSFDTTK\n-FGSIPMISVGLVPLILALLFFTLKEIKPTVKLSYALFFTFIISSFYLQPLNLFWQGMHAP\n-NMFLYRYAWALSITVIYLAAETLVRLRQVSIKNFTLIVSFLLICFTSTFIFRDHYEFLTD\n-VNFLLTLEFLIAYFILFVAMIRYKSSLKWINIVLLFFTFLELGLHSHYQVQGISDEWHFP\n-SRSNYEEKLTDIDSIVKSTKTTTDSFYRIERLLPQTGNDSMKFNYNGISQFSSIRNRASS\n-SVLDKLGFRSDGTNLNLRYQNNTIIADSLFGVKYNLATTDPNKFGFTLNQSQSTINLYEN\n-SFNLGLALLTEGIYKDVNFTNLTLDNQTNFLNQLTGLSQKYYHTLSDVVSQNTVELSNRM\n-TVNKVDNEDAAKATFLVNIPANSQVYLNLPNLTFSNENQKKVVITVNNQSSEFTLDNAFS\n-FFNVGSFTTDVQVQVNVYFPENNQVSFDKPQFYRLDLLAFQQAISILQEKQVVTKTDGNK\n-VTVDFVTDKESSLLLTLPYDKGWNATIDGKPIKIQKAQDGFMKVDVSPGQTKLVLTFVPN\n-GFYLGLLISFGAVFVFFSYQFIGYYYSKNREY\n->Streptococcus_suis|ORF2907 length 235 aa, 708 bp, from complement(2003907..2004614) of Streptococcus_suis\n-FHVKQGVKMNQKEYRVFEGLRIACSLTFISGYLNAFTFVTQGGRFAGVQSGNVISLAYFL\n-AKGDFAQVVNFSIPILFFVFGQFFTYLARRYFEKQTWSWHFGSSVMMLVLILLTIILSPI\n-MPASFTIASLAFVASIQVETFRRLRGAPYANVMMTGNVKNAAYLWFKGVIEKDSELRKTG\n-RNILLTIIGFMLGVIISTHLSFQFEEYALIGLILPVLYINYELWQEKRPTRGRSK\n->Streptococcus_suis|ORF2908 length 180 aa, 543 bp, from complement(2004615..2005157) of 
Streptococcus_suis\n-PYPDFLKIFSVVCLWICYNYFMKIKLITVGKLKEKYLKEGIAEYSKRLGRFTKLDMIELP\n-DEKTPDKASQAENEQILKKEADRIMSKIGERDFVIALAIEGKQFPSEEFSQRISDIAVNG\n-YSDITFIIGGSLGLDSCIKKRANLLMSFGQLTLPHQLMKLVLIEQIYRAFMIQQGSPYHK\n->Streptococcus_suis|ORF2909 length 413 aa, 1242 bp, from 2005223..2006464 of Streptococcus_suis\n-VIIKKEIVLLRKIKEMERIPYMKKYLKFAILFVIGFFGGLIGALSASFFQPQVQQANSAI\n-TSVSNVQYNNETSTTKAVEKVQNAVVSVINYQKSANNSLGVIFGNIESSDELAVAGEGSG\n-VIYKKYGQYAYIVTNTHVINNAEKIDILLASGEKISGELVGSDTYSDIAVIKISADKVTA\n-VAEFADSDTIKVGETAIAIGSPLGSVYANTVTQGIISSLSRTVTSQSKDGQTISTNAIQT\n-DTAINPGNSGGPLINTQGQVIGITSSKITSSSANSSGVAVEGLGFAIPANDAVAIINQLE\n-KTGQVSRPALGVHMVNLTTLSTSQLEKAGLSNTELTSGVVIVSTQSGLPADGKLETFDVI\n-TEIDGEAIQNKSDLQSALYKHQIGDTITVTYYRNNQKQTVDIKLTHSTEELSE\n->Streptococcus_suis|ORF2910 length 256 aa, 771 bp, from 2006519..2007289 of Streptococcus_suis\n-GYMEELRTLNISEIHPNPYQPRIHFDEKELLELAQSIKENGLIQPIIVRKSSIIGYELLA\n-GERRLRASQLAGLTTIPAVVKELTDDDLLYQAIIENLQRSNLNPIEEAASYQKLISRGLT\n-HDEVAQIMGKSRPYISNLLRLLNLSSQTKQAVEEGKISQGHARQLVSFSEEKQAEWVQLI\n-LSKDLSVRTLEKLIAANKKKHTKLKQRDQFLKEQEDSLSKTLGTATKIIKKKNGSGEIRI\n-SFNDLDEFERIINNFK\n'
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.fasta
--- a/test-data/get_orf_input.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,17 +0,0 @@
->alpha three forward CDS using table 1
-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-NNNNNNNNNNNNNNNNATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNN
-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
-NNNNNNNNNNNNNNNNNTAANNTAGMNTGANNNNNNNNNNNNNNNNNNNNN
->beta three forward CDS using table 11
-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-NNNNNNNNNNNNNNNNNGTGNATANATTNNNNNNNNNNNNNNNNNNNNNNN
-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
-NNNNNNNNNNNNNNNNNNTAANNTAGNNTGANNNNNNNNNNNNNNNNNNNN
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t11_nuc_out.fasta
--- a/test-data/get_orf_input.t11_nuc_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,36 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-ATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTN
-NNNNNNNNNNNNNNNNTAANNTAG
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-ATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNN
-NNNNNNNNNNNNTAA
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-ATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNN
-NNNNNNNNTAANNTAGMNTGA
->beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
-GTGNATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNN
-NNNNNNNNNNNNNNNNTAANNTAG
->beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
-ATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNN
-NNNNNNNNNNNNTAA
->beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
-ATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNNN
-NNNNNNNNTAANNTAGNNTGA
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t11_open_nuc_out.fasta
--- a/test-data/get_orf_input.t11_open_nuc_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,39 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-ATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTN
-NNNNNNNNNNNNNNNNTAANNTAG
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-ATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNN
-NNNNNNNNNNNNTAA
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-ATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNN
-NNNNNNNNTAANNTAGMNTGA
->beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
-GTGNATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNN
-NNNNNNNNNNNNNNNNTAANNTAG
->beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
-ATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNN
-NNNNNNNNNNNNTAA
->beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
-ATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNNN
-NNNNNNNNTAANNTAGNNTGA
->beta|CDS4 length 25 aa, 75 bp, from 334..408 of beta three forward CDS using table 11
-NTGANNNNNNNNNNNNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
-TTTTTTTTTTTTTTT
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t11_open_prot_out.fasta
--- a/test-data/get_orf_input.t11_open_prot_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,20 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGG
-GGGFFFFFFFFFFFFFFFFXXXXXXXX
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-MXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGGG
-GVFFFFFFFFFFFFFFFFXXXXXX
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-MXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFFXXXXXXXXX
->beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGG
-GGVFFFFFFFFFFFFFFFFXXXXXXXX
->beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
-MXXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGG
-GFFFFFFFFFFFFFFFFFXXXXXX
->beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
-MXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFXXXXXXXXXX
->beta|CDS4 length 25 aa, 75 bp, from 334..408 of beta three forward CDS using table 11
-MXXXXXXXFFFFFFFFFFFFFFFFF
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t11_prot_out.fasta
--- a/test-data/get_orf_input.t11_prot_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,18 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGG
-GGGFFFFFFFFFFFFFFFFXXXXXXXX
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-MXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGGG
-GVFFFFFFFFFFFFFFFFXXXXXX
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-MXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFFXXXXXXXXX
->beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGG
-GGVFFFFFFFFFFFFFFFFXXXXXXXX
->beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
-MXXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGG
-GFFFFFFFFFFFFFFFFFXXXXXX
->beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
-MXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFXXXXXXXXXX
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t1_nuc_out.fasta
--- a/test-data/get_orf_input.t1_nuc_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,18 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-ATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTN
-NNNNNNNNNNNNNNNNTAANNTAG
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-ATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNN
-NNNNNNNNNNNNTAA
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-ATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNN
-NNNNNNNNTAANNTAGMNTGA
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t1_prot_out.fasta
--- a/test-data/get_orf_input.t1_prot_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,9 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGG
-GGGFFFFFFFFFFFFFFFFXXXXXXXX
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-MXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGGG
-GVFFFFFFFFFFFFFFFFXXXXXX
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-MXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFFXXXXXXXXX
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-forward.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-forward.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,288 @@\n+@WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+TTACCCGTCGGCGCCGAAAGAGCCGAAGGCTTTGTGACTGAGGCCGGACACTGTGCTGTTAAGCTGGACATTGCCCGACCTGTCGAGTGCGCCGCTCGCCGAAATTCGTTATCGCGTAAATTTATTTATTTATTTTTATTTTTTTAAATAAAAATGACGACTAATTTGTAAGGGCATAACAACAA\n++WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+!,,,./644,,,-0377<:Q777<BB<<60,+.,+,.4.,))))//15>>550007:66>>==7@71/--0:<CDBB;;49/***/***22,/+)))11===798:3.,,1488?133??BKKMODFB?BDB7447B?:8--.E:F?B77?BKKC<<322B:..<41,46>>B<<::::5116..\n+@WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+CGTGCCAGTTCTAAACTGGTCGTTCAGCGCCAACCGAAGTGCATACCCTGACGAGCATACACGCAGCTGAAGCGCTCCACAAGCAGCTCTCACCACTAGTCCACGCACCACCCCGCAAGGAGACGGCACGCAGCCACGGGCAAAAGCCGCCTGTTTCACACAACAGCCCGGCTGACCCGACCTTTAGAGCCAATTCTTTTCCCGAAGTTACGAATCTAATTTGCCGACTTCCCTTACCTACATTATTCTATCGACTAGAGGCTGTTCACCTTGGAGACCTGCTGCGGATATCGGTACGATCAGGCAGGAGATTCATATCGCTTCCCTCGCATTTTCAAGGGCCGTGTGGAGCGCACGAGACACCACAGGAACCGCGGTGCTTTACGGGCGCAACATCCCTATCTCAGGCTGAGCCACTTCCAGGCACGCACGCCCTAAACCAGAAAAGAGAACTCTGGCTCGGACTCCACACGACGTCTGCGAGTTCATTTGCGTTACCGCGCGAAACAGTTCTTGCGAACCGTCATTTCCCTGGCCTGGCGTGGGAATGTTAACCCACTTCCCTTTCGGCAACCGGATGGACAAACTGCGCAAGCACAGCAAAGTCTTCATCCGTAGTGTGTGACGGCATTAGCCGGTGC\n++WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+!<>AIHHCCCCCCCCIIIINNNNNTTTYYYYYYYYYYTTTTIIIIHHNIIIFDKFDDINNNTTTNIIIIINTTTTTTTYYYYYYTNNNNNTTYNIIIIIINNYYYYYYYYYYYYYYYYYTNNNNNTTTTTTYYYYYYYYYYYYYYYYYTLLJJJNNTTTTYYYYYYYYYTNNJNJLLTYYYYTONJJJOOYYYYYYYYYYYYYTTTTLOJJJJOOYYYYYYYYYTTTTTTYYYTTTTTTYYYYYYYYYYYYYYYYLJJJJJTYYYTLLLTOTJJJJJKKOYYYYTJNJJJOOTOOIIIILKYYYYTINDDDEEOSYYYYYYYYYYYYYYYYYYYYYYTTLTTTTTTTINIIIOYTKB888>>KMYYIIFIIITKYYYYKKKTOTYYYYYYYYYYYYYYYYYYYKIDDDD>>444>BKLKIIGGDIOYYYYIYYYQIIII@@7507>43--/<<IAAIIII>559==A@IIB>>===KMQM??/33?BIIQQIIFCCFCCFIIICIHA?@F>:>:>>=3...08AIIIMIQQQQCCCCQC:>=:6:>:>>IICA>>>>IFCCC>:>AA>99>;>AACAA>>>::7;7AIII>>>:>>IAI>833688949>@C>:>A;98777=;>99::>4755057132+\n+@WTSI_1055_1a09.p1kpIBF bases 1 to 
497\n+CGAGCTCGGTACCCGGGGATCCCACCGTTTGGAGGGTGAATTCGCGCTGGAAAAAGGTTTTCCATGCAAAAAATGGAACTTCTTCAGCGTCCAAAGCTTTAGTCAGCCAGCAAAGTGTTGGCATTTCATCGAATGGAAATGGTTCAATAAGTAGCGGCAGCCCCAACGTTTTTGAGAAGTTTTGTGGCGTTTTCTCTGAAGGGGTAAAGTCAGGCGAATTGCTGGAAAAGGTGCCATTGGGTGATTTGGAAGTTGTTCTGTTGATGAACCTTTCATGTTCTAGGCGTTTGTGAAGGAATTTGCTGACAATTTGCTCCGAATCCAAAAGGACGTTGAGCGCTGTGATCGGACCATCAAATTCTATTCCAAACGGGACAATTTGGATGCTCTCCAACGGATAATTTGCACTTACATTTATCGTCGGCTGAAGTTGGACATTGAGGACGGTTACGTGCAGGGAATGTGCGATTTGGTCGCTCCTCTTTTGGTGGTGTTT\n++WTSI_1055_1a09.p1kpIBF bases 1 to 497\n+!989>>CCCCCCIIYOICCCCOIYHHA8339>><@75.444N@IDHHHDDNTTYYYYYTTTIIIIINYYYYYTTTTTTTNNNHHHIHHIIIIOQIDKDDDFHIIITYYYYYYYTTTTTYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTTNNNNNNTTTYYYYYYYTTTTYTYYYSSSYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOOKJJNOTTYYYYYYYTTTTTTTTTTYYYYYYYYTTTTTTYYYYYYTTNNLLLLLLYYMOKKKOYYYYYYYYYYYYYYYYYYTTTTTIIIIIITYYLIIIIIFFDDDFYYYYYYYTTTTTTYYYYYYQQMMMYYTTOKKKIIIIIIIKKNNNDDDNNNNTYTTOOKKKINNIIKQONN?N2::NHTQOKKKKFFFFFFMMIIIICBAAIII>>>>>>AAAB=?FBO>88+,+//><IIII<33/++/0<<4\n+@WTSI_1055_1a10.p1kpIBF bases 1 to 512\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACGGCAAGAGACCAATCTGGTTTTGCAATGTAACATGCCAATTAATCATCAGCATTTTTCACATAAGTGATGGGATGACGGTTGGGGGGGGGGGAAATAAATGCATGTCGATCAGTGCATAGAAGCGAAAGAAATCGTAGAAATTTGCAGATGAAAATTTTGCAGTGGTAATTTGACCGTACCGAAAAGGAATGAGAGCTATTTACCTGTGGGAATGGGTGTAAAATGGAAACTAAATTGCGCGAGGGACAGTTTTGATTGGACGATATCTCCAGCGCAAAGGTCACATGACCAGCCGCTTGGAGATTGTTCGGGTAAGCGAGACAAAATACGAACAATCGGAGTTATTTGTACAACAACAACACATTGATTAAGTGATGGGAGAAAAAAAAAAGAAGGAATAATATGGCTTTGTGCATTTTTCTAAAGGTCTTAAAAATCAA\n++WTSI_1055_1a10.p1kpIBF bases 1 to 
512\n+!.6<:::60.1441+21441++AAAAEHHHHHHHHHHDBB4+,+<<IDCCCCCCCCCHITIIDDDCOOQH@@//)))059><10''*45EHMOFEDCCCCCDIIINTTIINNNNTTTTTTYYYYTIIIDDDDDHHHHHNOKKKKMOOTINNNNNYYYYQPPPPKKKLOKMMMKIINIIIKIIIIIFKIIIITOYSSYYYYYTTTLOKKKKKYYYYYYKLMMOOMSSYSLOKKFBBBFKKKSSYYYSSMMSSYYYSSSSMSSSSSMYYYYMOKKKKSSYYSKKKKKKSYYYPSSSSMMFIIOJJSYYYSSSMLOLIIIIIIYYYLLTLOIIIFFFKKMYYYYYYYYYTTTTTOOKKIINNNNTYYYYYYOFFFFFFIOYYYYYYYYYYQQKKKKKMMTTTTYYYIIIFFFFFFFDMMQQYYKKKKKKMKKKQQYQOKKKMOYYYA;777;>CIIIH@>>CA=94++,69ICCCC@>>743323::@@BIMII"..b"ATCTTTCGCCACTTCCCGCCTCCCCCCCCCCTTTTGACCACCTGCCATTGTTGTCGTTGAGCAACCGAATTTGACTCTTCACCCGTCGACTGCTGGGCGTTCGCTGTTCCGCCATGAATTGGCGCCATTCTCTTTGGCCCTAAAAGTGAACCGGTTACCAACTACTAAAGTGTCCGATTCGCTCCCGAACCTGCCGAGTCTGGACAGAGGCCGGAATTTTTGGGAATGCCATCAATCCCGGAGCATTTTTGAAGCTGCTCTCGACATGAGTACCGGCTCCATTAAAATTATCCCCCTCCAAACCGACCACAATCACACGCCCCCACTCGTCCCTGCGCAACGTCGTCTCTTCGTCGTCCACCTCCGCCTCGTCCGTTCTCGCCCATTCCCTTTTCTCGTC\n++WTSI_1055_1f20.p1kpIBF bases 1 to 491\n+!89><<<536::6001:41--<A?>CCCFCDDDDIIIYQKKGGNNNDCCCCCDDDDDHTNIDDDDTTIDA>9449;>@DHHHHHINNNNHDEEFHHNNNIIIIIIYYTIIIIITYYYYTTTTNNTTTTTIIIIIIFF>>2...@NNNTTTTTYYYYYYYYYYTTTTTNNLTTNYYYTTNNNLLLTTTTTTTTTTTTTTTYYYYTTTTTTYTNNNNILLNNNNNNNTTTTTTTTTTTTTTTYYYTTLTTTTTTTTTTTTTYYYYYYYYYYYYYYYYYTTTTTTYYTTTTTTYYTTNNNNIIKYYKKTTTTYYYYTTTTTTYYYYYYYYYYKKKKKKYYYYYTTTTTTYYYYIIIIBB=>7<<>>>CII??36-1(((()*+48ACIAA?4/)))'/***,++,539<>>>>>BD777777>>>>>>>>91/))01<::8=891,*117444,+,12777.,+44>440/0977-//-10++048:30---+\n+@WTSI_1055_1f21.p1kpIBF bases 1 to 456\n+TAAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACCTGGAGCAAACTGGTTGTGTCGTGGTCAGGGTACCGCCATTCCGTGAGATATGGTAGGTAAATGCGACCGGGATTATCCACAACTTTGGACGGCCTAATTCGCATACATGGAGTCGGCTTCACATAGCAATAGGGGCCTACGTTGGGATGATTTTCCAGAAAGTAAATGGCTACGGGAATGTTGTACACAGCTCCCTTAAGCTTTATGTATTAAACAAACAAACAAAGACCATACAGCCCACCTTATACAAGATGGGAATGGTCCCCGAAAAGGAAAGGCAATATTTCGGCATTCCGTCAGGGAAAACAAAATTCACAACGTCGGGCTGAAGATCTATAAAATTGTTGAGCGCAGTGAGTAAATCATCCTTCGTACTATCCTC\n++WTSI_1055_1f21.p1kpIBF bases 1 to 
456\n+!.348<<<<<4014:3.08::;<<ECCCIIIHCCBCCCDIYMMKKBNNNHDDDDDDDDDINYOIDDHHTTIDDAA<<<>BDDDDDDDDDIIHHHHIINNNIFDHHHIINIFFIINITTKFFIIIIIIIIIIIIIOOMMQQ8.))*25IHMQQQIIIIIIIIIITNNNNIIKYYYTTTTTTTTTTTTYNNIIIINNTTTTTNNIIITTTTTTTTNNNNTTYYYYYTTTTTTYYYTTTTTNTTTTTYYKKFFKKYYYYYYYYYTTTTTTTTTTTTYTTTTTTYTTTTTTTTLIFDDFJJJFFFIIJLOKFFMSSSYYYYYSSFB;??IIKKKKKKKKKLLKFFDDDDMDDDDB;789;AFNDBB;;BOMMMKKIDDDED@D@@8=@ENEBBBBBD;85//6?@@>77<@DFM?82228>D>>77273BB==97330/.--/8@75-,,/,,,0/53,\n+@WTSI_1055_1f22.p1kpIBF bases 1 to 370\n+CGACCAATGCTCGGTCCGTCACGTAGAGCAATCCGTTTGAGCGATCCACACGAAAATCTTAAGCGCAAAAAAGATTAATATTAATTATTTAACCATCTAATTATTTTAAAAATTTGCCGAAATAGTATCCGATCAAATCGGTTCTGACAATTTTACATTATCTGTTAGCCGTGCCAAAGTCTCTCTCTCACATTCGGTGGCAGCCGGTTGTCGTTGTCCAAGCACAAATTCTACGCTGCCATTATTGCCTTCGTCTCTGTCGCGTGCCAAAAAGCGTCCGATGGCGGTGCCAGCCGGCATATTGTCCAGTAGCCGAATGTGCGTGTCCTGGCGATCCCACAGGATCAGTGGCCGATTATCATTTTTGTC\n++WTSI_1055_1f22.p1kpIBF bases 1 to 370\n+!89A>887>>:>68>AHHIIDCCCCCDNNYYTTTTTTTYTTTTTTYYYYYYYYYYNNHHHDF=@=>9BQQYYYIIIIIITTTTTTTTTTTTTNTTNNNNNTTTTTTTTTTYYYYYYYYYYTTTTTTYTTYTTTYTNNNNLNNNNNNNTLLYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTTTOOKKKOYYYYYYYYYYYYYYYYYYYYYYTTTTLKKTTTTYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYYYYYYKKMMTTTTTTTYOKIIIGKKYYYYYOIIIOAQ==<:77:<IIIABBBCDO>>988>?FKYYPFBB,,.8>FAA:6698<>>D>>::33:4>>66,,,<<Q93+-\n+@WTSI_1055_1g01.p1kpIBF bases 1 to 584\n+CAAATCCTACTGGCCGGACAAAAGAAGCGGCCAAACAACGTGCTCTTCACAAGACGATCACCACCAAAAACATTCACACATGCTCAACGAGACATTGCTTGCAGGATGGCAAGTGCAGGAAGCACTTTCCGGTGCATTAGTTTACACTGACTATGTAACCTATTGTTAATTCCCTGTAGAAACCGTTTGAGTACGACACTGTGTACTCTGAAAATGCCTACCCTCGCTACAAGCGCCGCCCACCTCCGCCTTCACTCCAAGAAGCCCAGCAGAGTCCGGAATTATACGGGCGCGAAATGCAATACAAGGACCAGCGTGGCAAACTAATTCGCAAGGACAACTCTCACGTCGTGGCTTTCAGTCCATTTCTGTCAAGCAAATATGTCGCTCAGTAAAATTAATACTTTTTGTGACAAAATTGCTAACTTTTTTGCAGCATTAACGTCGAGTTTGTCGCGGGAGAAGGATGTATAAAGTACTTATGCAAGTACATGATGAAAGGAGCGGACATGGCCTTTGTCCAAGTCACGGATGCCAACACGGGCCAAAGTGCGCTGAACTACGACGAACTGCAGCAAATTCG\n++WTSI_1055_1g01.p1kpIBF bases 1 to 
584\n+!333;>HCDHHIIIYIIINTTYYYYTTTTTTYYYYYYNIIIIIININNTONB81+++04HQYTTTTTTTNIIINNTTNTTTTTTTTYYYTTTTTYTTTTTTYYYYYYYYYTTTTTTYYYYYTIIIIIITTTTTTTTTNNNNNNTNNTTTNNNNNNNNNNNNNNNNTTTTTYYTNNJJJJLYYYYYYYYYTTTTTTYTNNNNNNTYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTNNNNNNTTYYYTNNNNNTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYTKKKTNNIIINTYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYTTTTTTOIICBBOQQQQQQC;<88:>>>CIFOYYYYYYQQQQQQQQQCCQQQQHCBAA:AAAAIIA>;A>AAAIC>>AAAACA>>>>III>::>AAACCCIIIA:;==<IIIIIQQAA<:::IA==::8::CQIIIIAA>>CI92\n"
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-interleaved.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-interleaved.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,576 @@\n+@WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+TTACCCGTCGGCGCCGAAAGAGCCGAAGGCTTTGTGACTGAGGCCGGACACTGTGCTGTTAAGCTGGACATTGCCCGACCTGTCGAGTGCGCCGCTCGCCGAAATTCGTTATCGCGTAAATTTATTTATTTATTTTTATTTTTTTAAATAAAAATGACGACTAATTTGTAAGGGCATAACAACAA\n++WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+!,,,./644,,,-0377<:Q777<BB<<60,+.,+,.4.,))))//15>>550007:66>>==7@71/--0:<CDBB;;49/***/***22,/+)))11===798:3.,,1488?133??BKKMODFB?BDB7447B?:8--.E:F?B77?BKKC<<322B:..<41,46>>B<<::::5116..\n+@WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+TGATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGGTACCCGACGTCCGATATCGCGAAAAATGATGTATCTAGATTTGTCAGGAAACGTCCCCGAGTCTGTTCGACAAACAAACGTTATTCCGAACTCCCAACAACAGTATTTGATTGTGTAAAAATCTCTTGGCCTGATTACTATACTTTAGACATTTTTAGTGCCTGTATTGGAGGTATTTTAGGAACTTTTGGAACGAGCTTTTATCGATTTAGGGAACTAAAAAACCGTTCCATATTCATTAGATGCTATTATTTAAAATCCGAGTCTGATTTGCGAT\n++WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+!41>;D>AA>;;=;;>>AA@@CDDAA>>>ADINIIHHDD>::79:>>FIICCCHHHHCCCCCCCCCHHHHIEA>9..''))**,,++''+)**.,,,-,00..0B+..33010701+++-1B1.,??KMOYYQQQQ<<61,))01<:CAIIIIIYYYYTYTTTTYYYYYTTTTNNKKKKYYYYYYYYYYYYPMMOKTTTTYTTTTTYNINNINTNTIIIIIIIIINNYYYYYYYTTOLKKKIIIINNNOKKKKKFFKKYYYYYYYYYYSSMMMQMYYYYYTTTTLLPIDDDDDDFFFFFFMMKKLNIDFFKQQMMMMMMMMHHFF>A>>:779=5<488>>7745/00::300+++0-\n+@WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+CGTGCCAGTTCTAAACTGGTCGTTCAGCGCCAACCGAAGTGCATACCCTGACGAGCATACACGCAGCTGAAGCGCTCCACAAGCAGCTCTCACCACTAGTCCACGCACCACCCCGCAAGGAGACGGCACGCAGCCACGGGCAAAAGCCGCCTGTTTCACACAACAGCCCGGCTGACCCGACCTTTAGAGCCAATTCTTTTCCCGAAGTTACGAATCTAATTTGCCGACTTCCCTTACCTACATTATTCTATCGACTAGAGGCTGTTCACCTTGGAGACCTGCTGCGGATATCGGTACGATCAGGCAGGAGATTCATATCGCTTCCCTCGCATTTTCAAGGGCCGTGTGGAGCGCACGAGACACCACAGGAACCGCGGTGCTTTACGGGCGCAACATCCCTATCTCAGGCTGAGCCACTTCCAGGCACGCACGCCCTAAACCAGAAAAGAGAACTCTGGCTCGGACTCCACACGACGTCTGCGAGTTCATTTGCGTTACCGCGCGAAACAGTTCTTGCGAACCGTCATTTCCCTGGCCTGGCGTGGGAATGTTAACCCACTTCCCTTTCGGCAACCGGATGGACAAACTGCGCAAGCACAGCAAAGTCTTCATCCGTAGTGTGTGACGGCATTAGCCGGTGC\n++WTSI_1055_1a05.p1kpIBF bases 
1 to 642\n+!<>AIHHCCCCCCCCIIIINNNNNTTTYYYYYYYYYYTTTTIIIIHHNIIIFDKFDDINNNTTTNIIIIINTTTTTTTYYYYYYTNNNNNTTYNIIIIIINNYYYYYYYYYYYYYYYYYTNNNNNTTTTTTYYYYYYYYYYYYYYYYYTLLJJJNNTTTTYYYYYYYYYTNNJNJLLTYYYYTONJJJOOYYYYYYYYYYYYYTTTTLOJJJJOOYYYYYYYYYTTTTTTYYYTTTTTTYYYYYYYYYYYYYYYYLJJJJJTYYYTLLLTOTJJJJJKKOYYYYTJNJJJOOTOOIIIILKYYYYTINDDDEEOSYYYYYYYYYYYYYYYYYYYYYYTTLTTTTTTTINIIIOYTKB888>>KMYYIIFIIITKYYYYKKKTOTYYYYYYYYYYYYYYYYYYYKIDDDD>>444>BKLKIIGGDIOYYYYIYYYQIIII@@7507>43--/<<IAAIIII>559==A@IIB>>===KMQM??/33?BIIQQIIFCCFCCFIIICIHA?@F>:>:>>=3...08AIIIMIQQQQCCCCQC:>=:6:>:>>IICA>>>>IFCCC>:>AA>99>;>AACAA>>>::7;7AIII>>>:>>IAI>833688949>@C>:>A;98777=;>99::>4755057132+\n+@WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+CTGTGTACAAAGGGCAGGGACGTATTCAGAGCGAGTTGATGACTCGCCCCTACAAGGAATTCCTCGTTCACGGACAATAATTGCAATGTCCGATCCCAATCACGGCAAATTTTCACCGGTTTACCAACCCCTTTCGGGGAAGGACAAGCACGCTGATTTTGCCAGTGTAGCGCGCGTGCAGCCCCGGACATCTAAGGGCATCACAGACCTGTTATTGC\n++WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+!>>>>>DDIFKOOTTTNDDDHHFTTOOKKKYYTTNNNIYYNNNNNNYTIIIIITIFNIDDKKKNNIIIFIITTTTNNNNNINIINGIKMYYYYYOTTTTTYKKLMMMYYYQOOAAAAIQ;7:<<<A>=AAQA>><<<>7::77::7>>IIIAAAA>:>A=>>5:88::=BIIIIIIIII>>7;9733999=8370---128999::14.,0,,0442+\n+@WTSI_1055_1a09.p1kpIBF bases 1 to 497\n+CGAGCTCGGTACCCGGGGATCCCACCGTTTGGAGGGTGAATTCGCGCTGGAAAAAGGTTTTCCATGCAAAAAATGGAACTTCTTCAGCGTCCAAAGCTTTAGTCAGCCAGCAAAGTGTTGGCATTTCATCGAATGGAAATGGTTCAATAAGTAGCGGCAGCCCCAACGTTTTTGAGAAGTTTTGTGGCGTTTTCTCTGAAGGGGTAAAGTCAGGCGAATTGCTGGAAAAGGTGCCATTGGGTGATTTGGAAGTTGTTCTGTTGATGAACCTTTCATGTTCTAGGCGTTTGTGAAGGAATTTGCTGACAATTTGCTCCGAATCCAAAAGGACGTTGAGCGCTGTGATCGGACCATCAAATTCTATTCCAAACGGGACAATTTGGATGCTCTCCAACGGATAATTTGCACTTACATTTATCGTCGGCTGAAGTTGGACATTGAGGACGGTTACGTGCAGGGAATGTGCGATTTGGTCGCTCCTCTTTTGGTGGTGTTT\n++WTSI_1055_1a09.p1kpIBF bases 1 to 
497\n+!989>>CCCCCCIIYOICCCCOIYHHA8339>><@75.444N@IDHHHDDNTTYYYYYTTTIIIIINYYYYYTTTTTTTNNNHHHIHHIIIIOQIDKDDDFHIIITYYYYYYYTTTTTYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTTNNNNNNTTTYYYYYYYTTTTYTYYYSSSYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOOKJJNOTTYYYYYYYT"..b'IINNHDFKOOOKKMQMMPPYYYTTTTTTYTTTTNNNNNNKFCCQQYYMMFF<<79?A8335:<:6-2+++\n+@WTSI_1055_1f22.p1kpIBF bases 1 to 370\n+CGACCAATGCTCGGTCCGTCACGTAGAGCAATCCGTTTGAGCGATCCACACGAAAATCTTAAGCGCAAAAAAGATTAATATTAATTATTTAACCATCTAATTATTTTAAAAATTTGCCGAAATAGTATCCGATCAAATCGGTTCTGACAATTTTACATTATCTGTTAGCCGTGCCAAAGTCTCTCTCTCACATTCGGTGGCAGCCGGTTGTCGTTGTCCAAGCACAAATTCTACGCTGCCATTATTGCCTTCGTCTCTGTCGCGTGCCAAAAAGCGTCCGATGGCGGTGCCAGCCGGCATATTGTCCAGTAGCCGAATGTGCGTGTCCTGGCGATCCCACAGGATCAGTGGCCGATTATCATTTTTGTC\n++WTSI_1055_1f22.p1kpIBF bases 1 to 370\n+!89A>887>>:>68>AHHIIDCCCCCDNNYYTTTTTTTYTTTTTTYYYYYYYYYYNNHHHDF=@=>9BQQYYYIIIIIITTTTTTTTTTTTTNTTNNNNNTTTTTTTTTTYYYYYYYYYYTTTTTTYTTYTTTYTNNNNLNNNNNNNTLLYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTTTOOKKKOYYYYYYYYYYYYYYYYYYYYYYTTTTLKKTTTTYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYYYYYYKKMMTTTTTTTYOKIIIGKKYYYYYOIIIOAQ==<:77:<IIIABBBCDO>>988>?FKYYPFBB,,.8>FAA:6698<>>D>>::33:4>>66,,,<<Q93+-\n+@WTSI_1055_1f22.q1kpIBR bases 1 to 496\n+CTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCGCATGAGGAATCGGAAGAGAATAATAACAAGAAAATGACAGATAAAAAGAGTGGAATTGAAGTAGAAGAGAAAAAGGGTAGAGTTGTAACAGAAGAGAAGAAAGTTTTAAATGAAGCGGAAGAAAAGAAGGACGAAGATCAGACGGAAGAGAAGAAAGAAAATGAAAAAGAAGTTAAAAGAAATAATGCGGAAGAGAAGAAGAAATTGGATGAAACTGAAGAGAAGCCGGATGAGGAAAGGGGAGAAAAGAAGAGCAGAGCTGAAGTGGAATTGGAAGAAACAACGAAGAAGAATAATGGACTTAAATATGTTTGGAAGCATCAAAATGAATCGGATGTAAAGAAGTACGAAAACATAATGGAAAGTATGGACGAAAAGAAAATGGAAGAGAAGGAGCTCGTGGACAATTACAGTAATATTTTGTTTGGAA\n++WTSI_1055_1f22.q1kpIBR bases 1 to 
496\n+!399>>>>CHHHHBDDDEIIINNTIIFDA>AAAADDDDDDDDDHHHDDHDIIIIIINNNOOBB+++89DFIKKFFINNTTYYYTTTLLLKKKOOTTOLYLLOLTTTTTTTYYYYYYYYYYYYYTIIIDDDFFKOTYYYYYYYYYYYYYTTTLLJTTTYYYYYYYYYYYYTTTNJJLTTLLTTTTYYYYYYYYYTNNNNNTLLMKNNNNNNTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTLLKKKYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTNNNNJJLNNNNNNNNNTTTTTTNNNNNTYTNNNLNNNTTTTTNNLLTTTTTTTTYYYYYYYYYTTNLLLLLLNNNTLYYYYYYYYYYYYYYYTTTTTTYYYYYYYTNNNNNTTTNNNILOOTINNNNNTTTTMYMMMYIIINFFIIIGINIIIIKLLTOKKKMGGDFFFGFFFFFFFFFNNNIN?CCMQ<<3<<D<<+,.66>>F=;>:5.\n+@WTSI_1055_1g01.p1kpIBF bases 1 to 584\n+CAAATCCTACTGGCCGGACAAAAGAAGCGGCCAAACAACGTGCTCTTCACAAGACGATCACCACCAAAAACATTCACACATGCTCAACGAGACATTGCTTGCAGGATGGCAAGTGCAGGAAGCACTTTCCGGTGCATTAGTTTACACTGACTATGTAACCTATTGTTAATTCCCTGTAGAAACCGTTTGAGTACGACACTGTGTACTCTGAAAATGCCTACCCTCGCTACAAGCGCCGCCCACCTCCGCCTTCACTCCAAGAAGCCCAGCAGAGTCCGGAATTATACGGGCGCGAAATGCAATACAAGGACCAGCGTGGCAAACTAATTCGCAAGGACAACTCTCACGTCGTGGCTTTCAGTCCATTTCTGTCAAGCAAATATGTCGCTCAGTAAAATTAATACTTTTTGTGACAAAATTGCTAACTTTTTTGCAGCATTAACGTCGAGTTTGTCGCGGGAGAAGGATGTATAAAGTACTTATGCAAGTACATGATGAAAGGAGCGGACATGGCCTTTGTCCAAGTCACGGATGCCAACACGGGCCAAAGTGCGCTGAACTACGACGAACTGCAGCAAATTCG\n++WTSI_1055_1g01.p1kpIBF bases 1 to 584\n+!333;>HCDHHIIIYIIINTTYYYYTTTTTTYYYYYYNIIIIIININNTONB81+++04HQYTTTTTTTNIIINNTTNTTTTTTTTYYYTTTTTYTTTTTTYYYYYYYYYTTTTTTYYYYYTIIIIIITTTTTTTTTNNNNNNTNNTTTNNNNNNNNNNNNNNNNTTTTTYYTNNJJJJLYYYYYYYYYTTTTTTYTNNNNNNTYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTNNNNNNTTYYYTNNNNNTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYTKKKTNNIIINTYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYTTTTTTOIICBBOQQQQQQC;<88:>>>CIFOYYYYYYQQQQQQQQQCCQQQQHCBAA:AAAAIIA>;A>AAAIC>>AAAACA>>>>III>::>AAACCCIIIA:;==<IIIIIQQAA<:::IA==::8::CQIIIIAA>>CI92\n+@WTSI_1055_1g01.q1kpIBR bases 1 to 
350\n+TATGACTGATTACGCCAGCTATTTAGGTGAGACTATAGAATACTCACGCTAGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGATTGCTTTTTGGCTCGCATACTGCAGCCTGGGGAAGTAGTTGACGTTTTGAAGAATTGAGGGAAGTTGACGTGAAACGGCAACGCGGAGCAGGTCGGAAATCGCTTCGCTATCAGAGCCAAGCAACGAAATGGCGATTGCGCTTAAAAAACATTGGTTTGCTTAAAACATCAATGGTCTTCACCGGTAGAAGCAGTCGCCTAGACCAACGTTGTTGACGCAACGAATGGTGTTTTGCTGCTGGGCAGACGTGGGCGGAGTGCTA\n++WTSI_1055_1g01.q1kpIBR bases 1 to 350\n+!..+---77CBI>7---77>>>DACCCHHHIDDDDCCIHHAA84)))%%%))+,32>>HHHHCCCCCCCCCHIIIIINN<B.,,,+++2.22OBNDHHHHHIIDDDDIIYTNNNNNTTTIIIIIITTTTKKYYYYYYYYYYQOB84-,,.<>FIIIIINNNIIIKKMSSSIIIIIIIIIIIILTOOIIIIIFLLLLLLYYSKKLKKKPMSSYSYSSMSS?KKKKFFFIIFKKKKKKKKSMMMSKKIDDDKKKFDDFFFBBDD=DDMMMKDDDDDDKKFFCCKKKKKFFFKKKKFMMMMMKKKKKKKK734:4B<??B@DC=<871<1314/--,,+++++.-5:97--,\n'
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-mixed.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-mixed.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,800 @@\n+@WTSI_1055_1a03.p1kpIBF bases 1 to 312\n+TTGTTGAACAGCAAAAAGGTCAAGAATATGGATGTTCTCGCCATGATTTTTGTGCCATAGGCGCGCATTCACAAGGTCCATCAGTCGNTCAGCCTGCCGCAACACCACCACCAGCCGCAGCAACAACAACAGCACCAGCAGCAGCTGATCCAATCGCATGTGCCACAGAATAACACCCAAAATCAATTAGCGACGGCCGCCCTCCAGCCGGTTCAGCAGCAGAAACAGCACGAAAAATGGGATCCGATCAAAGAATTTGGGCTGCAAAAGGACGAAATGGCGTTGAAGTCACCGCCCAGCAATGTTTGTGT\n++WTSI_1055_1a03.p1kpIBF bases 1 to 312\n+!96CBHOOTTTYYYQMK???OOTYTTTNNNYYYYNIIIFFIIIIIIIYOOOMAA62.((((*,9@MIIIIO?A3007OOOMMII::%%%::AEHIIIQYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTOOKKKKKYMMYYYKIINNNTYYNIIIINYYYYTOLKKKOOKKKKOLTTYYYYSSSSYYYYSSSSSSMMSOOTLLLONIDDDNOTTYQQMMMMPBB9>BDOOTTQMMMMQMMMQQE:666QQYYPMMDDDADDM@B<FDBBDKKKKKKKKIGKINIFFFKDGGIDB?2/\n+@WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+TTACCCGTCGGCGCCGAAAGAGCCGAAGGCTTTGTGACTGAGGCCGGACACTGTGCTGTTAAGCTGGACATTGCCCGACCTGTCGAGTGCGCCGCTCGCCGAAATTCGTTATCGCGTAAATTTATTTATTTATTTTTATTTTTTTAAATAAAAATGACGACTAATTTGTAAGGGCATAACAACAA\n++WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+!,,,./644,,,-0377<:Q777<BB<<60,+.,+,.4.,))))//15>>550007:66>>==7@71/--0:<CDBB;;49/***/***22,/+)))11===798:3.,,1488?133??BKKMODFB?BDB7447B?:8--.E:F?B77?BKKC<<322B:..<41,46>>B<<::::5116..\n+@WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+TGATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGGTACCCGACGTCCGATATCGCGAAAAATGATGTATCTAGATTTGTCAGGAAACGTCCCCGAGTCTGTTCGACAAACAAACGTTATTCCGAACTCCCAACAACAGTATTTGATTGTGTAAAAATCTCTTGGCCTGATTACTATACTTTAGACATTTTTAGTGCCTGTATTGGAGGTATTTTAGGAACTTTTGGAACGAGCTTTTATCGATTTAGGGAACTAAAAAACCGTTCCATATTCATTAGATGCTATTATTTAAAATCCGAGTCTGATTTGCGAT\n++WTSI_1055_1a04.q1kpIBR bases 1 to 
359\n+!41>;D>AA>;;=;;>>AA@@CDDAA>>>ADINIIHHDD>::79:>>FIICCCHHHHCCCCCCCCCHHHHIEA>9..''))**,,++''+)**.,,,-,00..0B+..33010701+++-1B1.,??KMOYYQQQQ<<61,))01<:CAIIIIIYYYYTYTTTTYYYYYTTTTNNKKKKYYYYYYYYYYYYPMMOKTTTTYTTTTTYNINNINTNTIIIIIIIIINNYYYYYYYTTOLKKKIIIINNNOKKKKKFFKKYYYYYYYYYYSSMMMQMYYYYYTTTTLLPIDDDDDDFFFFFFMMKKLNIDFFKQQMMMMMMMMHHFF>A>>:779=5<488>>7745/00::300+++0-\n+@WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+CGTGCCAGTTCTAAACTGGTCGTTCAGCGCCAACCGAAGTGCATACCCTGACGAGCATACACGCAGCTGAAGCGCTCCACAAGCAGCTCTCACCACTAGTCCACGCACCACCCCGCAAGGAGACGGCACGCAGCCACGGGCAAAAGCCGCCTGTTTCACACAACAGCCCGGCTGACCCGACCTTTAGAGCCAATTCTTTTCCCGAAGTTACGAATCTAATTTGCCGACTTCCCTTACCTACATTATTCTATCGACTAGAGGCTGTTCACCTTGGAGACCTGCTGCGGATATCGGTACGATCAGGCAGGAGATTCATATCGCTTCCCTCGCATTTTCAAGGGCCGTGTGGAGCGCACGAGACACCACAGGAACCGCGGTGCTTTACGGGCGCAACATCCCTATCTCAGGCTGAGCCACTTCCAGGCACGCACGCCCTAAACCAGAAAAGAGAACTCTGGCTCGGACTCCACACGACGTCTGCGAGTTCATTTGCGTTACCGCGCGAAACAGTTCTTGCGAACCGTCATTTCCCTGGCCTGGCGTGGGAATGTTAACCCACTTCCCTTTCGGCAACCGGATGGACAAACTGCGCAAGCACAGCAAAGTCTTCATCCGTAGTGTGTGACGGCATTAGCCGGTGC\n++WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+!<>AIHHCCCCCCCCIIIINNNNNTTTYYYYYYYYYYTTTTIIIIHHNIIIFDKFDDINNNTTTNIIIIINTTTTTTTYYYYYYTNNNNNTTYNIIIIIINNYYYYYYYYYYYYYYYYYTNNNNNTTTTTTYYYYYYYYYYYYYYYYYTLLJJJNNTTTTYYYYYYYYYTNNJNJLLTYYYYTONJJJOOYYYYYYYYYYYYYTTTTLOJJJJOOYYYYYYYYYTTTTTTYYYTTTTTTYYYYYYYYYYYYYYYYLJJJJJTYYYTLLLTOTJJJJJKKOYYYYTJNJJJOOTOOIIIILKYYYYTINDDDEEOSYYYYYYYYYYYYYYYYYYYYYYTTLTTTTTTTINIIIOYTKB888>>KMYYIIFIIITKYYYYKKKTOTYYYYYYYYYYYYYYYYYYYKIDDDD>>444>BKLKIIGGDIOYYYYIYYYQIIII@@7507>43--/<<IAAIIII>559==A@IIB>>===KMQM??/33?BIIQQIIFCCFCCFIIICIHA?@F>:>:>>=3...08AIIIMIQQQQCCCCQC:>=:6:>:>>IICA>>>>IFCCC>:>AA>99>;>AACAA>>>::7;7AIII>>>:>>IAI>833688949>@C>:>A;98777=;>99::>4755057132+\n+@WTSI_1055_1a05.q1kpIBR bases 1 to 
219\n+CTGTGTACAAAGGGCAGGGACGTATTCAGAGCGAGTTGATGACTCGCCCCTACAAGGAATTCCTCGTTCACGGACAATAATTGCAATGTCCGATCCCAATCACGGCAAATTTTCACCGGTTTACCAACCCCTTTCGGGGAAGGACAAGCACGCTGATTTTGCCAGTGTAGCGCGCGTGCAGCCCCGGACATCTAAGGGCATCACAGACCTGTTATTGC\n++WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+!>>>>>DDIFKOOTTTNDDDHHFTTOOKKKYYTTNNNIYYNNNNNNYTIIIIITIFNIDDKKKNNIIIFIITTTTNNNNNINIINGIKMYYYYYOTTTTTYKKLMMMYYYQOOAAAAIQ;7:<<<A>=AAQA>><<<>7::77::7>>IIIAAAA>:>A=>>5:88::=BIIIIIIIII>>7;9733999=8370---128999::14.,0,,0442+\n+@WTSI_1055_1a07.p1kpIBF bases 1 to 574\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACCGGTACGGAGGGAAATTTGATCAT"..b'GATGCTTCAACGAAAACTGATCAGGCGAACTGAAAGGGTGTAAAAAAGATAAAAGAAATTGTAAACGCAGCACATTGTCAAGCAAAGCAACCCAAAAAAATCGATTTTGAGTATAGTCAAAAAGGGTTACCCGTCAATGATGATCTGTTGCTGTTTGTTTGATACTCCTCCTTTCAATTTGCGATTGTTGTTGTTGCAATTGGCACGCGAA\n++WTSI_1055_1f24.q1kpIBR bases 86 to 670\n+!88BHIQQQYYYITTTTIIINNIIIIKKKYYYYIIIIFFYOMTTTYYIIIIAA99//.1<BKKOOTYYYYTTTTNNTTINNNTTYTTNNNIIITTYTTTTTTTTYYYYYIIIIIOYYYYYYYYYYYTTTTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTOTLLYYYYYYYYYTTTTTTTTTTTTTTTTYYYYYYYYYYTTTTTTYYTNNNNNTYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOKKKOOYYYYKK???KQMMMPPPPQMMKKKMPYYYKKKKKKKKKKMMYYYYYYYYYYYYYYYYYYYYYYYYYYYYYQQQQQI51)%%)4<QQQQQQYYYYTTKTTTTTTTYYYYYYYNNNNNNYYYKKKKGGNNNNYYYYYYYYYYYQMMMMQOKKGIIKKKKYQYYYYYYYYTOOLKKIIIIIOYQQQQQQBA>:;AABAACCCIIIOIIBBIIIII:77<><AAIIIOQQIE=>>>CA>AAABBIIIIIII:00882389667>BAAA?A>77:<844>A?;4++0966.+4492000--4922./..++\n+@WTSI_1055_1g01.p1kpIBF bases 1 to 
584\n+CAAATCCTACTGGCCGGACAAAAGAAGCGGCCAAACAACGTGCTCTTCACAAGACGATCACCACCAAAAACATTCACACATGCTCAACGAGACATTGCTTGCAGGATGGCAAGTGCAGGAAGCACTTTCCGGTGCATTAGTTTACACTGACTATGTAACCTATTGTTAATTCCCTGTAGAAACCGTTTGAGTACGACACTGTGTACTCTGAAAATGCCTACCCTCGCTACAAGCGCCGCCCACCTCCGCCTTCACTCCAAGAAGCCCAGCAGAGTCCGGAATTATACGGGCGCGAAATGCAATACAAGGACCAGCGTGGCAAACTAATTCGCAAGGACAACTCTCACGTCGTGGCTTTCAGTCCATTTCTGTCAAGCAAATATGTCGCTCAGTAAAATTAATACTTTTTGTGACAAAATTGCTAACTTTTTTGCAGCATTAACGTCGAGTTTGTCGCGGGAGAAGGATGTATAAAGTACTTATGCAAGTACATGATGAAAGGAGCGGACATGGCCTTTGTCCAAGTCACGGATGCCAACACGGGCCAAAGTGCGCTGAACTACGACGAACTGCAGCAAATTCG\n++WTSI_1055_1g01.p1kpIBF bases 1 to 584\n+!333;>HCDHHIIIYIIINTTYYYYTTTTTTYYYYYYNIIIIIININNTONB81+++04HQYTTTTTTTNIIINNTTNTTTTTTTTYYYTTTTTYTTTTTTYYYYYYYYYTTTTTTYYYYYTIIIIIITTTTTTTTTNNNNNNTNNTTTNNNNNNNNNNNNNNNNTTTTTYYTNNJJJJLYYYYYYYYYTTTTTTYTNNNNNNTYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTNNNNNNTTYYYTNNNNNTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYTKKKTNNIIINTYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYTTTTTTOIICBBOQQQQQQC;<88:>>>CIFOYYYYYYQQQQQQQQQCCQQQQHCBAA:AAAAIIA>;A>AAAIC>>AAAACA>>>>III>::>AAACCCIIIA:;==<IIIIIQQAA<:::IA==::8::CQIIIIAA>>CI92\n+@WTSI_1055_1g01.q1kpIBR bases 1 to 350\n+TATGACTGATTACGCCAGCTATTTAGGTGAGACTATAGAATACTCACGCTAGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGATTGCTTTTTGGCTCGCATACTGCAGCCTGGGGAAGTAGTTGACGTTTTGAAGAATTGAGGGAAGTTGACGTGAAACGGCAACGCGGAGCAGGTCGGAAATCGCTTCGCTATCAGAGCCAAGCAACGAAATGGCGATTGCGCTTAAAAAACATTGGTTTGCTTAAAACATCAATGGTCTTCACCGGTAGAAGCAGTCGCCTAGACCAACGTTGTTGACGCAACGAATGGTGTTTTGCTGCTGGGCAGACGTGGGCGGAGTGCTA\n++WTSI_1055_1g01.q1kpIBR bases 1 to 
350\n+!..+---77CBI>7---77>>>DACCCHHHIDDDDCCIHHAA84)))%%%))+,32>>HHHHCCCCCCCCCHIIIIINN<B.,,,+++2.22OBNDHHHHHIIDDDDIIYTNNNNNTTTIIIIIITTTTKKYYYYYYYYYYQOB84-,,.<>FIIIIINNNIIIKKMSSSIIIIIIIIIIIILTOOIIIIIFLLLLLLYYSKKLKKKPMSSYSYSSMSS?KKKKFFFIIFKKKKKKKKSMMMSKKIDDDKKKFDDFFFBBDD=DDMMMKDDDDDDKKFFCCKKKKKFFFKKKKFMMMMMKKKKKKKK734:4B<??B@DC=<871<1314/--,,+++++.-5:97--,\n+@WTSI_1055_1g02.p1kpIBF bases 1 to 523\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACGACAAATTCACGGAAGCGTCTCGCACTTTGTGCCGAGGACTGCTGCACAAGGAGCCCACTCTGAGGTTGGGCTGTCGCCGGGTCGGCCGGCCTGAGGACGGCGCGGAAGAGCTGAAGGCACACGCGTTCTTCACACAACCGGACCAGAAGACAGGCAGGGAGCCAATTCCGTGGAGGAAGATGGAGGCCGGCAAGGTGGACGACATTCCCTTCTGAACTGCTAGAGAGGACTTGTAGGAATTCCGTCCTTCAGCTGACACCTCCATTTTGTCCGGACCCCCATTCGGTGTATGCCAAAGATGTGCTGGACATCGAGCAGTTCAGCACTGTCAAGGGAGTTCGTCCGCTTCCACCAAACTTTTCCTACCTGCTGAACCATTAGGTTCGACTTGACGCGACTGACAACTCCTTCTACGACAAGTTCAACAGCGGGTCCGTGTCCATACCTTGGC\n++WTSI_1055_1g02.p1kpIBF bases 1 to 523\n+!08<=AAA:28::87;<::>ACECEIIIIIIIIIIINIKBB>C>QQYNHHHHDDHDHIITIDCCCCOONNNNGDFDDINMINNNNNIHHHHHIINNIIINNNNTYTIIIIDDIIIIYYYTTTTTTYIIIDDDGGITYYSKKKIDNNNNTTNNNNNTYYYTLLLLLLLLLLLYYTYJJJJJNTTTTTTTTTTYYOLLLTTOOOTTTTTTTYNNNNNJJJLLLLLLYYYYYYYYYYSSYYONNNNNNLLTTTTTTTYYYYYYYYYYYYYYYYTMMKKKYYYYYYYYYYYYYTTTTTOOLIILLLLTTLNLLLLLLYYYYYYTTTLLLTTTTTTTYYYYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYNIIIIITYYTTTLTTNIIFFFMYYYYYYYOOLKKOOTIFIFIINTTTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNNNNTYYYYYYYYYYTTTNNNNNNNNTNIIFFFKYYOOOOOIIIA<:77:<<>>>>IOOIHHHDDEIQMMII<924595/4\n'
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-reverse.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-reverse.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,288 @@\n+@WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+TGATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGGTACCCGACGTCCGATATCGCGAAAAATGATGTATCTAGATTTGTCAGGAAACGTCCCCGAGTCTGTTCGACAAACAAACGTTATTCCGAACTCCCAACAACAGTATTTGATTGTGTAAAAATCTCTTGGCCTGATTACTATACTTTAGACATTTTTAGTGCCTGTATTGGAGGTATTTTAGGAACTTTTGGAACGAGCTTTTATCGATTTAGGGAACTAAAAAACCGTTCCATATTCATTAGATGCTATTATTTAAAATCCGAGTCTGATTTGCGAT\n++WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+!41>;D>AA>;;=;;>>AA@@CDDAA>>>ADINIIHHDD>::79:>>FIICCCHHHHCCCCCCCCCHHHHIEA>9..''))**,,++''+)**.,,,-,00..0B+..33010701+++-1B1.,??KMOYYQQQQ<<61,))01<:CAIIIIIYYYYTYTTTTYYYYYTTTTNNKKKKYYYYYYYYYYYYPMMOKTTTTYTTTTTYNINNINTNTIIIIIIIIINNYYYYYYYTTOLKKKIIIINNNOKKKKKFFKKYYYYYYYYYYSSMMMQMYYYYYTTTTLLPIDDDDDDFFFFFFMMKKLNIDFFKQQMMMMMMMMHHFF>A>>:779=5<488>>7745/00::300+++0-\n+@WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+CTGTGTACAAAGGGCAGGGACGTATTCAGAGCGAGTTGATGACTCGCCCCTACAAGGAATTCCTCGTTCACGGACAATAATTGCAATGTCCGATCCCAATCACGGCAAATTTTCACCGGTTTACCAACCCCTTTCGGGGAAGGACAAGCACGCTGATTTTGCCAGTGTAGCGCGCGTGCAGCCCCGGACATCTAAGGGCATCACAGACCTGTTATTGC\n++WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+!>>>>>DDIFKOOTTTNDDDHHFTTOOKKKYYTTNNNIYYNNNNNNYTIIIIITIFNIDDKKKNNIIIFIITTTTNNNNNINIINGIKMYYYYYOTTTTTYKKLMMMYYYQOOAAAAIQ;7:<<<A>=AAQA>><<<>7::77::7>>IIIAAAA>:>A=>>5:88::=BIIIIIIIII>>7;9733999=8370---128999::14.,0,,0442+\n+@WTSI_1055_1a09.q1kpIBR bases 1 to 558\n+TGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCACCCAAAAAAAGTTTAAAAATTCGGAATGCGCTGTTTTCTTGGGTAAATATAAAGTAGGGTCCGGATTTATATTGTCTAAAACGCGAATTGACTTAAAAGATTGACCAAAAAAAGCCTAAAGTCCAAACTCTAATCAATAGAATAAAATGTTGGCAGAAATTTACGTCATGCAAAGGGTGTGCCAAATGGTTGATTTTGTGATTTTGATTTAATACAGAGGGTGCGAGATCAACTGAAATTTTGAGTAAATGCCGAGAGACTTTTTGTTTTTCAATTGTAATTTGAAGTTGGCCCTCTCTCCCCCCGACCGACAGTGGTACTCGGATAATCAGCCGAACAAACAAATATTCGTAGTGTTAAACAGAAGGGAAAGATGTAAGGTAACATTGGATTAGTTTGATGATGAGGCACTGAATTAAGGACAACTTGGTTATTATTATACATCCATGTGATTGTGAAGATTAAAGATGTTCTGGGACCAGGATGCCTTTGGAGAGGTTT\n++WTSI_1055_1a09.q1kpIBR bases 1 to 
558\n+!=>>>>>>>DIIIHHDHB99-//66@DIHHHHHHHHHDDCCCCCDHHIIDID@D>C=@KKYYYYKKTIIIIIIYNNIFFFIIMTIDDDDDHHHHDDKFFFIIDDHHHDDDHHHINNINIYYIIONNNINLNNNNNTYYYYYYYTNLLLLLLOOYYYYYYYYYYYYYYYYTTTTTTTTTTTTNNLLLLLLTTNNNJJJNNTTTYYYMMLOOKYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYTTTTTTYYYYLTMTTTTNNNNLLTTTTTTTTTTLLTTTNNNJLLTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTYYTTTTTTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTNNNNNNTONIIINNNNNNNKYYYIOINIIQMOOTNNNNNNNNNTTYYITIIIINNNNNIKKTTTTKTYYYYYYYYYYYYYLF@@@FBC>>=697038<<IIM88+++89I@QAI>::44--344;<><0056699:<9\n+@WTSI_1055_1a10.q1kpIBR bases 1 to 431\n+AAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCACGTAAAAATCGAAAACATAGAAAAAGAAGCAAAGACCGACCGAACCGGTGGGAGAAAGGCTGAATGGGGCATGATTGGGGGAGGGGGGGAAGGTGACGAACCGAACGAATAAATGACAGGACGAGTTTCTCTTTTCTTCTTGGGTTTACATGTGTTGCGTGACCTTCTGAAATGGGCATTCAATGGATGATTGGGAGGGGGGGGGGGGAAGAAGGCCGACCACAGGTTGAATTTCGACTTTCTTCTAATTTTGCCCAACTTTCCCGATGGGGAAGGGCCTATGACCATTCGGTTTTGCAATGAAATCTGCCAATTAAACATCGTCCTTTTTCGTATCTGTGATGGTATGTCGATGGGGTGCG\n++WTSI_1055_1a10.q1kpIBR bases 1 to 431\n+!9;75;;>;>>ACCCC@CCAADNNNNNIIF>>4::>>FFFDDDHHHHHHHDHDDDHHHHINHIIDD>42-55DFIIIIILYYKIIFIIINNYYYYYYYYYLTINNITTYYYYLONIIIILYYYTIIFIFFIMMSSSYSKKLKKOOTTTYYYYYYYSSSYYYLJJJJJTYYYYYYYLTTTTLYYYTTTMOLYYYYYYLLLNLIJIIIILLLYYTTOLKJJKKKTYYYSGGLLLLNLLKKFMJSSSMPMSSMMMSSYYYYSSMKKKKJJMMPSSMB>,,+++>9DDKKKF@@888F=?DFSK==19/99OFB11,,.,,/,.<E99,,,/9:?FB:0//002613../--,,,,.,,,,,-/0910/+-,0..,++..4+;+++4-,,,4./,//66B?54-,,.,,,,48+++2++,,+,,:6=1859/.,\n+@WTSI_1055_1a11.q1kpIBR bases 1 to 301\n+CGAAGGAAAGGCGGCGGAGAAAGTTTCGTCGTTGGCGGAAAAGCCGATGAAACGCGGGGGACGAACGAAGTTTGTGTTTTTTTTAAAAATCTTTTCTCGACGGTTTCCAGGGAATTGGCCAAGTCCATGGACAAAACCAATGCCAACGGTCCTTCGTCCGCTGATTCATCGACTTCGTGTCCCGGGAGCGCGGAGTGCCGCGGCATCCGCCTCAACAGAAAGGGCGTCAGCCGTCGTCCAACCATCACAGCGCCGTTCCCAAAAGCCGTGCCCCCTCGCGCAGTCGTTCGCCTCCACGGT\n++WTSI_1055_1a11.q1kpIBR bases 1 to 
301\n+!DDDDEIOTNNNNNTFDDHHHITINNNNNNNNNNNNIITYTTNNIILOYYTTTTTYQKDFFFFKOLLFIIIINTTTTYYYYHHAADHSYYYYYYYYYYYYOTTTTNTTTNITTTTTTNNNLJLLNJJKJNLJTTYY"..b'DDDDDDDDDITIIHDAA==8??FFDHHIIFYYYYYYYYYYYYYNNIIIINNTTTYTTTYTOOLYYYIIIILLYYYYKKYYYYYYYONNNNNTYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYNTTTTLYYYYYYJLLJLLYYYYYYYTTTTNNTJNNTLLTTTTTJTTTTTTTTYYYYTTTTTTYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTOOYKKLIIIIIIYLMKKKOOTTTTTYYYTNIIIIITTTYYYYYYYYYYYYYTTTTTTYYYYYYYYYTTTKKKNIIITTYYYYYIIGIGB@=@@FFNNIIKKKMHFFQIIFDDDDDKMKTIIIIOOKIIIIIOOMCBAAAAAQABEHIEIAA::0++1569>>>6///-\n+@WTSI_1055_1f20.q1kpIBR bases 1 to 451\n+AGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCACGAAATGTGTGTGATATTTTAATGAATAAACTTCTTTTTTAATATCATATTAATAATTATTGTATCGTTTTACAACTTTCTATTCATATACTTTTCATCATCATCCCATCCGGTATCACTGCTCCTCCTCCTGCGCCCACCGGCCATCAGTCACTTTCGTGTCATTCCGTCGACAGTGTGGTGGTGGTAGTCAAAATTTGTTGACGGAAAGCCTCCAAAAATTGTTGAAATTGGCCAGCCGTGAGGCCCATTGCCATCCGCGGGTGGCATTTGAACTGTCCGCCCCAGTTTGGTGCCATGGCGGACGCCGCATTCGTCGCGTTGCCAGCCGATCCTCAGCAAAGCCGCTTGGCCCACCGCCGGTGGGCATGTGCCGTTGTCGA\n++WTSI_1055_1f20.q1kpIBR bases 1 to 451\n+!;>>>>>>>>>DDCC@CCDDDFFIINNNGEA=>@FFFFFHHHHHHHHHDDHDDDDDHHIIDFDDFDEDDKFIIIIINNIFFNNNIIIIIIYNTTTIINIIIIIOHHDDDDNIIIIIIIHHHHHHHHGDFIIFFINHLLLLLNNNNLNNNNNJNNNLLLLNLNNNNNTTTTTNNNNNNYLNNNNLNJJLNTTTYYYTTTMLOYYTTTTTLYYTTTNJNTTTYYYYTTTTTTYYYYYYYSSSONNNNNTYYYYYTTTTTTTTTTYYYYYYYYYYYYLTTOOLFFIOOOOOOOOYTTTTTTTKTTTTYYTNIIIIIITTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYPPPPPQOIGGGNNIIIIT?<5..8A82,+-..140011199>AAAA;;:<<A>>>@BAADDFDIKIIOIBBIIEII>:338:<II@B77/6-20;;IOA@;;91,\n+@WTSI_1055_1f21.q1kpIBR bases 1 to 336\n+GATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAACAAGGATGCGTCTGCTTGTATAACCGGTAATCAAAAATGTGCAAATAATAAAATTGAGTGCATTTACAGGGAAACCGATCGTTGCTGGCGGTATTCATGGACGTGTTTCGGCCACGGGCCGTGGAATTTGGAAAGGGTTGGCGGTCTTCGTCAACGACAAGAACTACATGAGCAAATTGGGACTGACGACTGGATTTAAGGGGAAAACGTTCATCGTCCAAGGATTCGGTTTGTTTAGGGGAAAGGCATTGAAGGGG\n++WTSI_1055_1f21.q1kpIBR bases 1 to 
336\n+!1>;CCCIFCCA>>>>A;>>ADDDDDDDDDDFIIINNNNNDDDDDDFFKIHHHHHHHHDDDDDDDDDDHFFINNKKPPPPOTNNNNIHHHDDDDDDHHHIIINIIIIITYYYYYYYYYYYTNNNNNTYYTTTTTTTTOLLIJJLLNTTYJNNNNNTTTTTNNNNNNYTTTTTNTLLKKYYOTTNNNNNNNTTYYSSPSSSSSSYYYOTOOOYYYYYYYYYTIIIIIITMOKIKNNNNIITNNOLKKMQKKOOTTYQQKKKKLKKKIINNHDFKOOOKKMQMMPPYYYTTTTTTYTTTTNNNNNNKFCCQQYYMMFF<<79?A8335:<:6-2+++\n+@WTSI_1055_1f22.q1kpIBR bases 1 to 496\n+CTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCGCATGAGGAATCGGAAGAGAATAATAACAAGAAAATGACAGATAAAAAGAGTGGAATTGAAGTAGAAGAGAAAAAGGGTAGAGTTGTAACAGAAGAGAAGAAAGTTTTAAATGAAGCGGAAGAAAAGAAGGACGAAGATCAGACGGAAGAGAAGAAAGAAAATGAAAAAGAAGTTAAAAGAAATAATGCGGAAGAGAAGAAGAAATTGGATGAAACTGAAGAGAAGCCGGATGAGGAAAGGGGAGAAAAGAAGAGCAGAGCTGAAGTGGAATTGGAAGAAACAACGAAGAAGAATAATGGACTTAAATATGTTTGGAAGCATCAAAATGAATCGGATGTAAAGAAGTACGAAAACATAATGGAAAGTATGGACGAAAAGAAAATGGAAGAGAAGGAGCTCGTGGACAATTACAGTAATATTTTGTTTGGAA\n++WTSI_1055_1f22.q1kpIBR bases 1 to 496\n+!399>>>>CHHHHBDDDEIIINNTIIFDA>AAAADDDDDDDDDHHHDDHDIIIIIINNNOOBB+++89DFIKKFFINNTTYYYTTTLLLKKKOOTTOLYLLOLTTTTTTTYYYYYYYYYYYYYTIIIDDDFFKOTYYYYYYYYYYYYYTTTLLJTTTYYYYYYYYYYYYTTTNJJLTTLLTTTTYYYYYYYYYTNNNNNTLLMKNNNNNNTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTLLKKKYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTNNNNJJLNNNNNNNNNTTTTTTNNNNNTYTNNNLNNNTTTTTNNLLTTTTTTTTYYYYYYYYYTTNLLLLLLNNNTLYYYYYYYYYYYYYYYTTTTTTYYYYYYYTNNNNNTTTNNNILOOTINNNNNTTTTMYMMMYIIINFFIIIGINIIIIKLLTOKKKMGGDFFFGFFFFFFFFFNNNIN?CCMQ<<3<<D<<+,.66>>F=;>:5.\n+@WTSI_1055_1g01.q1kpIBR bases 1 to 350\n+TATGACTGATTACGCCAGCTATTTAGGTGAGACTATAGAATACTCACGCTAGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGATTGCTTTTTGGCTCGCATACTGCAGCCTGGGGAAGTAGTTGACGTTTTGAAGAATTGAGGGAAGTTGACGTGAAACGGCAACGCGGAGCAGGTCGGAAATCGCTTCGCTATCAGAGCCAAGCAACGAAATGGCGATTGCGCTTAAAAAACATTGGTTTGCTTAAAACATCAATGGTCTTCACCGGTAGAAGCAGTCGCCTAGACCAACGTTGTTGACGCAACGAATGGTGTTTTGCTGCTGGGCAGACGTGGGCGGAGTGCTA\n++WTSI_1055_1g01.q1kpIBR bases 1 to 
350\n+!..+---77CBI>7---77>>>DACCCHHHIDDDDCCIHHAA84)))%%%))+,32>>HHHHCCCCCCCCCHIIIIINN<B.,,,+++2.22OBNDHHHHHIIDDDDIIYTNNNNNTTTIIIIIITTTTKKYYYYYYYYYYQOB84-,,.<>FIIIIINNNIIIKKMSSSIIIIIIIIIIIILTOOIIIIIFLLLLLLYYSKKLKKKPMSSYSYSSMSS?KKKKFFFIIFKKKKKKKKSMMMSKKIDDDKKKFDDFFFBBDD=DDMMMKDDDDDDKKFFCCKKKKKFFFKKKKFMMMMMKKKKKKKK734:4B<??B@DC=<871<1314/--,,+++++.-5:97--,\n'
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-singles.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-singles.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,224 @@\n+@WTSI_1055_1a03.p1kpIBF bases 1 to 312\n+TTGTTGAACAGCAAAAAGGTCAAGAATATGGATGTTCTCGCCATGATTTTTGTGCCATAGGCGCGCATTCACAAGGTCCATCAGTCGNTCAGCCTGCCGCAACACCACCACCAGCCGCAGCAACAACAACAGCACCAGCAGCAGCTGATCCAATCGCATGTGCCACAGAATAACACCCAAAATCAATTAGCGACGGCCGCCCTCCAGCCGGTTCAGCAGCAGAAACAGCACGAAAAATGGGATCCGATCAAAGAATTTGGGCTGCAAAAGGACGAAATGGCGTTGAAGTCACCGCCCAGCAATGTTTGTGT\n++WTSI_1055_1a03.p1kpIBF bases 1 to 312\n+!96CBHOOTTTYYYQMK???OOTYTTTNNNYYYYNIIIFFIIIIIIIYOOOMAA62.((((*,9@MIIIIO?A3007OOOMMII::%%%::AEHIIIQYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTOOKKKKKYMMYYYKIINNNTYYNIIIINYYYYTOLKKKOOKKKKOLTTYYYYSSSSYYYYSSSSSSMMSOOTLLLONIDDDNOTTYQQMMMMPBB9>BDOOTTQMMMMQMMMQQE:666QQYYPMMDDDADDM@B<FDBBDKKKKKKKKIGKINIFFFKDGGIDB?2/\n+@WTSI_1055_1a07.p1kpIBF bases 1 to 574\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACCGGTACGGAGGGAAATTTGATCATCGCGGAAGTGCTCGTTTTGATTATCTTGGTGTATGGCGTCTGTGACCTTCTTTTTCGCTGGATGGGCATCGGGGCGTACGCCTGGGGTTCGCGCTCGAGCCCCAAAATCGCCCTCACTTTCGATGACGGGCCCAGCGAACACACCCGGTCCTTGCTCGAGCTGCTGCACCGCCATGGGGTAAAAGCTACCTTTTTTGTCACCGGCGTTCAGGCCGAGCGGCACCCCGACTTGCTAGAAGCCCTGCGGGCCGATGGCCATCAGATCGAATCGCACGGCTACTGGCACCGCCAAGCGTTCTTCCTGTGGCCTTGGCAAGAAGCGCGGCACATCCAACGGGTTCCGGGCAAACTATACCGCCCCCCCTATGGAGCCCACTCCCCCTTCACCCGGCTTCTTGCCCGGCTCCACGGCAAAGTGGTGGCGCTATGTGACCTCGAGTCCAAGGACTGGACCGACCGACCTGCCGAAGAACTGGCCG\n++WTSI_1055_1a07.p1kpIBF bases 1 to 574\n+!>>>AAA:9.4441+35:88;;CHIIIIIIDDDCCCH>Q35-+*46?>CHHHHHHHHIIYOHHHHHTTYTHA72-35>:>DAKHHHQQTTNIIFIGNYNNNNIIIIIINTTYYFFFDDHIINTIIIIIITIIIIIIDDDDDDIIOTNTLIIIKLOYYYYYYYYTTTYYNNNNIIINNNNTIIINNLNIIINYYYYYYSSMMSYYTTMMKLLLNNTTTTTTTTLLKLLYTTJNLJLTYYYLLLLKLLTLLKKKLLTTTTTYYYYYYYTTTTTTTTOOKYLOTTTTYYYYYYTTTTTTYYYYYYYYYYTNNNNNTIFFFIFIIIIIOKKKYKKTOOKYYYYYYYTOOTIIOLKIINNNNNTYYYYYYYYYYYYYYYYYMTTTTTYTTTTIIOIIIQIIMIII:99>AAAIIBBIOOYYYYOKCCDAAFFFIOD@@>>>A<<926<QIQQQQMIIIIFDFFFDDFDDDAA===BGKKKKO943>>@B;BB?:?IMYYQB..+2,448:?88888<877:<>A810))*.12889600<<9411799>83,,,84337:<7227470..---.//+,\n+@WTSI_1055_1a08.p1kpIBF bases 1 
to 397\n+TAAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACCTCCGAGAGCACTCGTGACGAATTGATTCCCCTGCTAAGCATCGAATGCGTAAAGTTAGGGCGTGCTCGTCGGCTTTATGAGAAGGGATTTCGCACCGTCGGATCGATAGCGAAAGCGGAGCCTCGCCAACTCATCGAAGCGTTAGGGGGCAAATTGAGCTGTTGCCAGTGCAGGAGGATGATCAGCAGTGCAAAGGTCCCCGGGGGTTGAGTAAATGGTGCTTAAAGGCCCCGTTCCGGAGACGATAAATTTATTCACTTTCAATTAGAGGCTTCAGAAGCTCAAAATTGTTCGAGTTTTTGTTCAGGCGATTATCCGCGATC\n++WTSI_1055_1a08.p1kpIBF bases 1 to 397\n+!.006<=AA83059:85;<::>CCECIIIIIIIIFIIINIBB1160BBKFDHHHHIIIIIIYOHHHHHTOID?:.-+,*,+.,/5.,*+06:IAA99,,,66??:,++002:0--,,170/442//.44<?33/74323/+****+28;=BBDDB<9...9<:32231644460.1.9/5055@@OB@9552B0492//../1@;99///BBFF11.9444///<BF@=666;@<@66140,,.03;;>>???M::2448HHKKMMMMMPYYOLKKKKYYYYYYYQQMHFKHMKLLKOOYYQMMKFKOOTYTDDDDDDQKKKKKKP?B<FFOIIDIOO?633:?AHII=:77:>IQQ?C?BOOO>=695BBNN1-,88553</..8888,,,425.\n+@WTSI_1055_1a15.p1kpIBF bases 1 to 312\n+GACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACGAAAATGTAATTTTTTTTCTTTTAATTTTGTCAACTTTTTTAGCAAAAGCATTGTATTTTAACTGTATATTGCGTTTTGGAGGCAGTCACTGGATTCAAGGGAGCAGACCAAGAAAAATTTTTACAAAGTTTCTAACCCTTTCAAGGTTTTGGACCAATTTCGTAACAAATTTCGCCAAAAAATGTGCATAATTTCTTTTACCACGCCTATCGGCATCAGTAAGTCGTCCCAGTAAAGCTAATA\n++WTSI_1055_1a15.p1kpIBF bases 1 to 312\n+!:AA<4+1441+38::4..A<<BHHHCIIIICHI></4++*=:I>AHHFHDHDHIITIDDDDDOOOOMM@=30++,89QQQQOIIIIDDHHHHYTNNNNHHHIIOIIIIFFYYYYYYYNIIIIIIIIIIHHCC>81**'''(*6:IMMOQOIIIFFFIIILNNTTTTYYYYYYTNNNNIIKKKYTTTIIIMKTKTYTIDDDDDDTTNNIKKIIIIOOYFFFFFDIINNADDIIIKKKOOTLIOONHDKDDKKFFAD>AADMMMYOOOOLKKDIIIMKE966<<KB?>B70////2:B1../004.,,,..,\n+@WTSI_1055_1a17.p1kpIBF bases 1 to 201\n+AGAATGCGGAACAGCTGACGCAAATACATGTAGTCAGGCGCCTCGTCAAAGCGCGACCCGCGGCAGTAGTTCAAGTACATGNAGAACTCAAAGGGGAAGCCCTTGCACAGCATTTCCACCGGTGTCGACATTTTCTTAAAATGTTCACAAAAATGAACTTTTAATCGTAAAAAGGAGACCAATTTCGGGAACTTGTATGT\n++WTSI_1055_1a17.p1kpIBF bases 1 to 
201\n+!<CIIIIIIIITHHIHHHNNNTNNNNIIIIIIIIIIITYOIIIIIOMMQ=6+(%(((,.<<QQIFIIFFIIIIIIHEB::%%%45BB64****4IQQQQOOOOOOYYYYYYYYYYYYYYTTTNNNTMOYYYYOTNNNNTTYYTTTTTTYYYMYYYMMYYOOOKKKKKCC???<::B9=BB"..b'TTTGACCCGGGGTCACAAGTCACCATGGTGAAAAGTTCCGTGGTACAAGCTCTAAGCCCGCAGAAAATAAAGGAAGGGCTATTGGAAGTGAGCGGCTTCCACAGCGAAATACCGGTGAAGCTGGACGCCCCACAATATAAACTACAAGTTGCCTTGGCAGACGGCCGTTCGGAGAACCTTGTAGCATATCGGGCCGATTGGATAGTCAGATCCATCCGCAGAGCTGAATGGAGCACCGGGAAAGTGGAAGCAGTTGACGATGAGCCGGATTTGCTCATTGGAATGCCAGAG\n++WTSI_1055_1f17.p1kpIBF bases 1 to 436\n+!08<;=<:404::4.25:9<>>ECIIIIIIDDDDDHIINIMKKKNNNHDDDDDDDDDIIYNIDDDDTDCCCAA;97699;IIITTTTTTNNNNNIIIIIIIIIIITYYYYTTTNNNTTTYYYYYYYYYYYYTTNNNNOO@8@BEIIITTTYYYYNNNNNTNNNNNNNLLLLLLLLNTYTTTTTTTNNNNNTTTYNNNNNTYYYYTTTTTTYYYYYYYYYTNNNJNOYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTTTTTTTTTTTYTTOMKLYYYYTTTTTIINNNNTYYYYYYTNIFFIIIIYYYYYYYYYYYYYYOOOIINKKKQQYYTTTTTTYYYYYOIDDDDDFTKMKGGINKKINNYKKKKMMMKKKKKKMIHH>>==:?;BAAQ=963;<<<<::;33,4./,591,,\n+@WTSI_1055_1f23.p1kpIBF bases 1 to 383\n+AAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACCCCCTATCCCCGCAGAGGTCCATCCAGGAGTCCCAAGAGCACATGGAGAGCACTTTCAAGGCGTTGCGTCGTCAGCTGCCGGTGACGCGCTCCAAGCTGAACTGGCTGAACTTCCATTCCTTCCGCATCACTCAGCAGGAGATGAAGCAGCCGCCCTCGGCCGGCCAGCAACAACAGTCCCAGTGATGGAGCAGTCCAAGAAGAGGAAGCGAGCGAATTTGGAGCATCGCCCATTCATTTCAATTAATACCTTTCCGATTTGTGTACTTTCCCCGACATTTTCGCCATCCAATTATGGCAAGTGAAAGTTT\n++WTSI_1055_1f23.p1kpIBF bases 1 to 383\n+!34:<<<<<;289:87;<::>AACCEDIIIFDDHHHINNTYYYKKNNNIIIIHDDDDDIIYNDDDDDTTYYYFDDAAADFKYMIFFDDDDHDDFIFFIIDDHHHITTTYINNIIIKKKOMIHHDHHIYYYYLYINNNNNOKFFFDENNNNNHGGLLLNLNNNNNYNNNLNJTTTNNNJLINNNNTYYYYYYYYYYYYYYNNNNNNYYYTTTTTYTTTTTTYYYYNLIIIIIIIIIYYYYYTTTTTTYYYYYLIIIIIIILTYYYYTTOOKLYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOOIFFIYYYYYYYYYYYYYYYYYYYYYYYYYYYYKKIIIITOYYYQQOIEEAACC>>@=;>5>AAAAAAIB94\n+@WTSI_1055_1f24.q1kpIBR bases 86 to 
670\n+TTGGCACGCAAAAGACGCAATTCTTCAGACGGATTTAAATTGGCAAGAATATCGAGCTAAATGGCAAATGTTTAAAATGGTAATCCCGGAGGAAGAAGACCACGGATTTTTTAACAAAAATGTAAATTTATTTCATGAATTTGTTGCAAAAACCAAAAGGTGCCAAAATATTGATTTACGAAAAGCGCTAACTTCTTCAGCCAAATGCCCTCTTCAAACCCACTTGATCAATCGTTGCACTCAGTGCTTTTTGATCGCCATTTTCTCCACGTCAGATTTAACCAGTCAATTTTGTCATTGGCTTCCTTTCAATGCGGTTGCTGCTTCAAAATCATCTCTTCCATTAAATTCGGGTAACGAGCCCAATGTTCTTGATGCTTCAACGAAAACTGATCAGGCGAACTGAAAGGGTGTAAAAAAGATAAAAGAAATTGTAAACGCAGCACATTGTCAAGCAAAGCAACCCAAAAAAATCGATTTTGAGTATAGTCAAAAAGGGTTACCCGTCAATGATGATCTGTTGCTGTTTGTTTGATACTCCTCCTTTCAATTTGCGATTGTTGTTGTTGCAATTGGCACGCGAA\n++WTSI_1055_1f24.q1kpIBR bases 86 to 670\n+!88BHIQQQYYYITTTTIIINNIIIIKKKYYYYIIIIFFYOMTTTYYIIIIAA99//.1<BKKOOTYYYYTTTTNNTTINNNTTYTTNNNIIITTYTTTTTTTTYYYYYIIIIIOYYYYYYYYYYYTTTTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTOTLLYYYYYYYYYTTTTTTTTTTTTTTTTYYYYYYYYYYTTTTTTYYTNNNNNTYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOKKKOOYYYYKK???KQMMMPPPPQMMKKKMPYYYKKKKKKKKKKMMYYYYYYYYYYYYYYYYYYYYYYYYYYYYYQQQQQI51)%%)4<QQQQQQYYYYTTKTTTTTTTYYYYYYYNNNNNNYYYKKKKGGNNNNYYYYYYYYYYYQMMMMQOKKGIIKKKKYQYYYYYYYYTOOLKKIIIIIOYQQQQQQBA>:;AABAACCCIIIOIIBBIIIII:77<><AAIIIOQQIE=>>>CA>AAABBIIIIIII:00882389667>BAAA?A>77:<844>A?;4++0966.+4492000--4922./..++\n+@WTSI_1055_1g02.p1kpIBF bases 1 to 523\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACGACAAATTCACGGAAGCGTCTCGCACTTTGTGCCGAGGACTGCTGCACAAGGAGCCCACTCTGAGGTTGGGCTGTCGCCGGGTCGGCCGGCCTGAGGACGGCGCGGAAGAGCTGAAGGCACACGCGTTCTTCACACAACCGGACCAGAAGACAGGCAGGGAGCCAATTCCGTGGAGGAAGATGGAGGCCGGCAAGGTGGACGACATTCCCTTCTGAACTGCTAGAGAGGACTTGTAGGAATTCCGTCCTTCAGCTGACACCTCCATTTTGTCCGGACCCCCATTCGGTGTATGCCAAAGATGTGCTGGACATCGAGCAGTTCAGCACTGTCAAGGGAGTTCGTCCGCTTCCACCAAACTTTTCCTACCTGCTGAACCATTAGGTTCGACTTGACGCGACTGACAACTCCTTCTACGACAAGTTCAACAGCGGGTCCGTGTCCATACCTTGGC\n++WTSI_1055_1g02.p1kpIBF bases 1 to 
523\n+!08<=AAA:28::87;<::>ACECEIIIIIIIIIIINIKBB>C>QQYNHHHHDDHDHIITIDCCCCOONNNNGDFDDINMINNNNNIHHHHHIINNIIINNNNTYTIIIIDDIIIIYYYTTTTTTYIIIDDDGGITYYSKKKIDNNNNTTNNNNNTYYYTLLLLLLLLLLLYYTYJJJJJNTTTTTTTTTTYYOLLLTTOOOTTTTTTTYNNNNNJJJLLLLLLYYYYYYYYYYSSYYONNNNNNLLTTTTTTTYYYYYYYYYYYYYYYYTMMKKKYYYYYYYYYYYYYTTTTTOOLIILLLLTTLNLLLLLLYYYYYYTTTLLLTTTTTTTYYYYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYNIIIIITYYTTTLTTNIIFFFMYYYYYYYOOLKKOOTIFIFIINTTTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNNNNTYYYYYYYYYYTTTNNNNNNNNTNIIFFFKYYOOOOOIIIA<:77:<<>>>>IOOIHHHDDEIQMMII<924595/4\n'
diff -r 324775a016ce -r 6a14074bc810 tools/fastq/fastq_paired_unpaired.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/fastq/fastq_paired_unpaired.py Mon Jul 29 09:28:55 2013 -0400
b'@@ -0,0 +1,241 @@\n+#!/usr/bin/env python\n+"""Divides a FASTQ into paired and single (orphan) reads as separate files.\n+\n+The input file should be a valid FASTQ file which has been sorted so that\n+any partner forward+reverse reads are consecutive. The output files all\n+preserve this sort order. Pairings are recognised based on standard name\n+suffixes. See below or run the tool with no arguments for more details.\n+\n+Note that the FASTQ variant is unimportant (Sanger, Solexa, Illumina, or even\n+Color Space should all work equally well).\n+\n+This script is copyright 2010-2013 by Peter Cock, The James Hutton Institute\n+(formerly SCRI), Scotland, UK. All rights reserved.\n+\n+See accompanying text file for licence details (MIT license).\n+"""\n+import os\n+import sys\n+import re\n+from galaxy_utils.sequence.fastq import fastqReader, fastqWriter\n+\n+if "-v" in sys.argv or "--version" in sys.argv:\n+    print "Version 0.0.8"\n+    sys.exit(0)\n+\n+def stop_err(msg, err=1):\n+   sys.stderr.write(msg.rstrip() + "\\n")\n+   sys.exit(err)\n+\n+msg = """Expect either 4 or 5 arguments: the FASTQ variant, then FASTQ filenames.\n+\n+If you want two output files, use four arguments:\n+ - FASTQ variant (e.g. sanger, solexa, illumina or cssanger)\n+ - Sorted input FASTQ filename,\n+ - Output paired FASTQ filename (forward then reverse interleaved),\n+ - Output singles FASTQ filename (orphan reads)\n+\n+If you want three output files, use five arguments:\n+ - FASTQ variant (e.g. sanger, solexa, illumina or cssanger)\n+ - Sorted input FASTQ filename,\n+ - Output forward paired FASTQ filename,\n+ - Output reverse paired FASTQ filename,\n+ - Output singles FASTQ filename (orphan reads)\n+\n+The input file should be a valid FASTQ file which has been sorted so that\n+any partner forward+reverse reads are consecutive. The output files all\n+preserve this sort order.\n+\n+Any reads where the forward/reverse naming suffix used is not recognised\n+are treated as orphan reads. 
The tool supports the /1 and /2 convention\n+originally used by Illumina, the .f and .r convention, the Sanger\n+convention (see http://staden.sourceforge.net/manual/pregap4_unix_50.html\n+for details), and the new Illumina convention where the reads have the\n+same identifier with the fragment number at the start of the description, e.g.\n+\n+@HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 1:N:0:TGNCCA\n+@HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 2:N:0:TGNCCA \n+\n+Note that this does support multiple forward and reverse reads per template\n+(which is quite common with Sanger sequencing), e.g. this which is sorted\n+alphabetically:\n+\n+WTSI_1055_4p17.p1kapIBF\n+WTSI_1055_4p17.p1kpIBF\n+WTSI_1055_4p17.q1kapIBR\n+WTSI_1055_4p17.q1kpIBR\n+\n+or this where the reads already come in pairs:\n+\n+WTSI_1055_4p17.p1kapIBF\n+WTSI_1055_4p17.q1kapIBR\n+WTSI_1055_4p17.p1kpIBF\n+WTSI_1055_4p17.q1kpIBR\n+\n+both become:\n+\n+WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR\n+WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR\n+"""\n+\n+if len(sys.argv) == 5:\n+    format, input_fastq, pairs_fastq, singles_fastq = sys.argv[1:]\n+elif len(sys.argv) == 6:\n+    pairs_fastq = None\n+    format, input_fastq, pairs_f_fastq, pairs_r_fastq, singles_fastq = sys.argv[1:]\n+else:\n+    stop_err(msg)\n+\n+format = format.replace("fastq", "").lower()\n+if not format:\n+    format="sanger" #safe default\n+elif format not in ["sanger","solexa","illumina","cssanger"]:\n+    stop_err("Unrecognised format %s" % format)\n+\n+def f_match(name):\n+   if name.endswith("/1") or name.endswith(".f"):\n+      return True\n+\n+#Cope with three widely used suffix naming conventions,\n+#Illumina: /1 or /2\n+#Forward/reverse: .f or .r\n+#Sanger, e.g. 
.p1k and .q1k\n+#See http://staden.sourceforge.net/manual/pregap4_unix_50.html\n+re_f = re.compile(r"(/1|\\.f|\\.[sfp]\\d\\w*)$")\n+re_r = re.compile(r"(/2|\\.r|\\.[rq]\\d\\w*)$")\n+\n+#assert re_f.match("demo/1")\n+assert re_f.search("demo.f")\n+assert re_f.search("demo.s1")\n+assert re_f.search("demo.f1k")\n+assert re_f.search("demo.p1")\n+assert re_f.search("demo.p1k")\n+assert re_f.search("'..b'les = 0, 0, 0, 0, 0, 0\n+in_handle = open(input_fastq)\n+if pairs_fastq:\n+    pairs_f_writer = fastqWriter(open(pairs_fastq, "w"), format)\n+    pairs_r_writer = pairs_f_writer\n+else:\n+    pairs_f_writer = fastqWriter(open(pairs_f_fastq, "w"), format)\n+    pairs_r_writer = fastqWriter(open(pairs_r_fastq, "w"), format)\n+singles_writer = fastqWriter(open(singles_fastq, "w"), format)\n+last_template, buffered_reads = None, []\n+\n+for record in fastqReader(in_handle, format):\n+    count += 1\n+    name = record.identifier.split(None,1)[0]\n+    assert name[0]=="@", record.identifier #Quirk of the Galaxy parser\n+    is_forward = False\n+    suffix = re_f.search(name)\n+    if suffix:\n+        #============\n+        #Forward read\n+        #============\n+        template = name[:suffix.start()]\n+        is_forward = True\n+    elif re_illumina_f.match(record.identifier):\n+        template = name #No suffix\n+        is_forward = True\n+    if is_forward:\n+        #print name, "forward", template\n+        forward += 1\n+        if last_template == template:\n+            buffered_reads.append(record)\n+        else:\n+            #Any old buffered reads are orphans\n+            for old in buffered_reads:\n+                singles_writer.write(old)\n+                singles += 1\n+            #Save this read in buffer\n+            buffered_reads = [record]\n+            last_template = template\n+    else:\n+        is_reverse = False\n+        suffix = re_r.search(name)\n+        if suffix:\n+            #============\n+            #Reverse read\n+            
#============\n+            template = name[:suffix.start()]\n+            is_reverse = True\n+        elif re_illumina_r.match(record.identifier):\n+            template = name #No suffix\n+            is_reverse = True\n+        if is_reverse:\n+            #print name, "reverse", template\n+            reverse += 1\n+            if last_template == template and buffered_reads:\n+                #We have a pair!\n+                #If there are multiple buffered forward reads, want to pick\n+                #the first one (although we could try and do something more\n+                #clever looking at the suffix to match them up...)\n+                old = buffered_reads.pop(0)\n+                pairs_f_writer.write(old)\n+                pairs_r_writer.write(record)\n+                pairs += 2\n+            else:\n+                #As this is a reverse read, this and any buffered read(s) are\n+                #all orphans\n+                for old in buffered_reads:\n+                    singles_writer.write(old)\n+                    singles += 1\n+                buffered_reads = []\n+                singles_writer.write(record)\n+                singles += 1\n+                last_template = None\n+        else:\n+            #===========================\n+            #Neither forward nor reverse\n+            #===========================\n+            singles_writer.write(record)\n+            singles += 1\n+            neither += 1\n+            for old in buffered_reads:\n+                singles_writer.write(old)\n+                singles += 1\n+            buffered_reads = []\n+            last_template = None\n+if last_template:\n+    #Left over singles...\n+    for old in buffered_reads:\n+        singles_writer.write(old)\n+        singles += 1\n+in_handle.close()\n+singles_writer.close()\n+if pairs_fastq:\n+    pairs_f_writer.close()\n+    assert pairs_r_writer.file.closed\n+else:\n+    pairs_f_writer.close()\n+    pairs_r_writer.close()\n+\n+if 
neither:\n+    print "%i reads (%i forward, %i reverse, %i neither), %i in pairs, %i as singles" \\\n+           % (count, forward, reverse, neither, pairs, singles)\n+else:\n+    print "%i reads (%i forward, %i reverse), %i in pairs, %i as singles" \\\n+           % (count, forward, reverse, pairs, singles)\n+\n+assert count == pairs + singles == forward + reverse + neither, \\\n+       "%i vs %i+%i=%i vs %i+%i+%i=%i" \\\n+       % (count,pairs,singles,pairs+singles,forward,reverse,neither,forward+reverse+neither)\n'
diff -r 324775a016ce -r 6a14074bc810 tools/fastq/fastq_paired_unpaired.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/fastq/fastq_paired_unpaired.rst Mon Jul 29 09:28:55 2013 -0400
@@ -0,0 +1,109 @@
+Galaxy tool to divide FASTQ files into paired and unpaired reads
+================================================================
+
+This tool is copyright 2010-2013 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below.
+
+This tool is a short Python script which divides a FASTQ file into paired
+reads, and single or orphan reads. You can have separate files for the
+forward/reverse reads, or have them interleaved in a single file.
+
+Note that the FASTQ variant is unimportant (Sanger, Solexa, Illumina, or even
+Color Space should all work equally well).
+
+This tool is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/fastq_paired_unpaired
+
+
+Automated Installation
+======================
+
+This should be straightforward: Galaxy should automatically download and
+install the tool from the Galaxy Tool Shed, and run the unit tests.
+
+
+Manual Installation
+===================
+
+There are just two files to install:
+
+* fastq_paired_unpaired.py (the Python script)
+* fastq_paired_unpaired.xml (the Galaxy tool definition)
+
+The suggested location is in the Galaxy folder tools/fastq next to other FASTQ
+tools provided with Galaxy.
+
+You will also need to modify the tool_conf.xml file to tell Galaxy to offer
+the tool. One suggested location is next to the fastq_filter.xml entry. Simply
+add the line::
+
+    <tool file="fastq/fastq_paired_unpaired.xml" />
+
+That's it.
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1  - Initial version, using Biopython
+v0.0.2  - Help text; cope with multiple pairs per template
+v0.0.3  - Galaxy XML wrappers added
+v0.0.4  - Use Galaxy library to handle FASTQ files (avoid Biopython dependency)
+v0.0.5  - Handle Illumina 1.8 style pair names
+v0.0.6  - Record script version when run from Galaxy
+        - Added unit test (FASTQ file using Sanger naming)
+v0.0.7  - Link to Tool Shed added to help text and this documentation.
+v0.0.8  - Use reStructuredText for this README file.
+        - Adopt standard MIT License.
+======= ======================================================================
+
+
+Developers
+==========
+
+This script and other tools for filtering FASTA, FASTQ and SFF files are
+currently being developed on the following hg branch:
+http://bitbucket.org/peterjc/galaxy-central/src/fasta_filter
+
+For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
+the following command from the Galaxy root folder::
+
+    $ tar -czf fastq_paired_unpaired.tar.gz tools/fastq/fastq_paired_unpaired.* test-data/sanger-pairs-*.fastq
+
+Check this worked::
+
+    $ tar -tzf fastq_paired_unpaired.tar.gz
+    tools/fastq/fastq_paired_unpaired.py
+    tools/fastq/fastq_paired_unpaired.rst
+    tools/fastq/fastq_paired_unpaired.xml
+    test-data/sanger-pairs-forward.fastq
+    test-data/sanger-pairs-interleaved.fastq
+    test-data/sanger-pairs-mixed.fastq
+    test-data/sanger-pairs-reverse.fastq
+    test-data/sanger-pairs-singles.fastq
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff -r 324775a016ce -r 6a14074bc810 tools/fastq/fastq_paired_unpaired.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/fastq/fastq_paired_unpaired.xml Mon Jul 29 09:28:55 2013 -0400
@@ -0,0 +1,105 @@
+<tool id="fastq_paired_unpaired" name="Divide FASTQ file into paired and unpaired reads" version="0.0.7">
+ <description>using the read name suffixes</description>
+ <version_command interpreter="python">fastq_paired_unpaired.py --version</version_command>
+ <command interpreter="python">
+fastq_paired_unpaired.py $input_fastq.extension $input_fastq
+#if $output_choice_cond.output_choice=="separate"
+ $output_forward $output_reverse
+#elif $output_choice_cond.output_choice=="interleaved"
+ $output_paired
+#end if
+$output_singles
+ </command>
+ <stdio>
+ <!-- Anything other than zero is an error -->
+ <exit_code range="1:" />
+ <exit_code range=":-1" />
+ </stdio>
+ <inputs>
+ <param name="input_fastq" type="data" format="fastq" label="FASTQ file to divide into paired and unpaired reads"/>
+ <conditional name="output_choice_cond">
+ <param name="output_choice" type="select" label="How to output paired reads?">
+ <option value="separate">Separate (two FASTQ files, for the forward and reverse reads, in matching order).</option>
+ <option value="interleaved">Interleaved (one FASTQ file, alternating forward read then partner reverse read).</option>
+ </param>
+ <!-- It seems we need these dummy entries here; compare this to indels/indel_sam2interval.xml -->
+ <when value="separate" />
+ <when value="interleaved" />
+ </conditional>
+ </inputs>
+ <outputs>
+ <data name="output_singles" format="input" label="Orphan or single reads"/>
+ <data name="output_forward" format="input" label="Forward paired reads">
+ <filter>output_choice_cond["output_choice"] == "separate"</filter>
+ </data>
+ <data name="output_reverse" format="input" label="Reverse paired reads">
+ <filter>output_choice_cond["output_choice"] == "separate"</filter>
+ </data>
+ <data name="output_paired" format="input" label="Interleaved paired reads">
+ <filter>output_choice_cond["output_choice"] == "interleaved"</filter>
+ </data>
+ </outputs>
+ <tests>
+ <test>
+ <param name="input_fastq" value="sanger-pairs-mixed.fastq" ftype="fastq"/>
+ <param name="output_choice" value="separate"/>
+ <output name="output_singles" file="sanger-pairs-singles.fastq" ftype="fastq"/>
+ <output name="output_forward" file="sanger-pairs-forward.fastq" ftype="fastq"/>
+ <output name="output_reverse" file="sanger-pairs-reverse.fastq" ftype="fastq"/>
+ </test>
+ <test>
+ <param name="input_fastq" value="sanger-pairs-mixed.fastq" ftype="fastq"/>
+ <param name="output_choice" value="interleaved"/>
+ <output name="output_singles" file="sanger-pairs-singles.fastq" ftype="fastq"/>
+ <output name="output_paired" file="sanger-pairs-interleaved.fastq" ftype="fastq"/>
+ </test>
+ </tests>
+ <help>
+
+**What it does**
+
+Using the common read name suffix conventions, it divides a FASTQ file into
+paired reads, and orphan or single reads.
+
+The input file should be a valid FASTQ file which has been sorted so that
+any partner forward+reverse reads are consecutive. The output files all
+preserve this sort order. Pairings are recognised based on standard name
+suffixes. See below or run the tool with no arguments for more details.
+
+Any reads where the forward/reverse naming suffix used is not recognised
+are treated as orphan reads. The tool supports the /1 and /2 convention
+originally used by Illumina, the .f and .r convention, the Sanger convention
+(see http://staden.sourceforge.net/manual/pregap4_unix_50.html for details),
+and the current Illumina convention where the reads get the same identifier
+with the fragment number in the description, for example:
+
+ * @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 1:N:0:TGNCCA
+ * @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 2:N:0:TGNCCA 
+
+Note that this does support multiple forward and reverse reads per template
+(which is quite common with Sanger sequencing), e.g. this which is sorted
+alphabetically:
+
+ * WTSI_1055_4p17.p1kapIBF
+ * WTSI_1055_4p17.p1kpIBF
+ * WTSI_1055_4p17.q1kapIBR
+ * WTSI_1055_4p17.q1kpIBR
+
+or this where the reads already come in pairs:
+
+ * WTSI_1055_4p17.p1kapIBF
+ * WTSI_1055_4p17.q1kapIBR
+ * WTSI_1055_4p17.p1kpIBF
+ * WTSI_1055_4p17.q1kpIBR
+
+both become:
+
+ * WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR
+ * WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR
+
+**Citation**
+
+This tool is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/fastq_paired_unpaired
+ </help>
+</tool>
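The pairing behaviour described in the help text above can be sketched on read names alone. This is a minimal illustration written for Python 3, not the script from this changeset: the two suffix patterns are copied from fastq_paired_unpaired.py, while the names `classify` and `pair_names` are hypothetical helpers introduced here. The newer Illumina convention, where the fragment number sits in the description, is deliberately left out.

```python
import re

# Suffix patterns as they appear in fastq_paired_unpaired.py.
RE_F = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
RE_R = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")


def classify(name):
    """Return (template, direction) where direction is 'F', 'R' or None."""
    match = RE_F.search(name)
    if match:
        return name[:match.start()], "F"
    match = RE_R.search(name)
    if match:
        return name[:match.start()], "R"
    return name, None


def pair_names(names):
    """Split read names (sorted so partners are adjacent) into pairs and singles."""
    pairs, singles = [], []
    last_template, buffered = None, []
    for name in names:
        template, direction = classify(name)
        if direction == "F":
            if template != last_template:
                # Forward reads buffered for an older template are orphans.
                singles.extend(buffered)
                buffered = []
                last_template = template
            buffered.append(name)
        elif direction == "R":
            if template == last_template and buffered:
                # Pair with the oldest buffered forward read for this template.
                pairs.append((buffered.pop(0), name))
            else:
                # Unmatched reverse read: it and any buffered reads are orphans.
                singles.extend(buffered)
                singles.append(name)
                buffered, last_template = [], None
        else:
            # Suffix not recognised: treated as an orphan read.
            singles.append(name)
            singles.extend(buffered)
            buffered, last_template = [], None
    # Anything still buffered at the end of the input has no partner.
    singles.extend(buffered)
    return pairs, singles
```

Fed the two WTSI_1055_4p17 orderings from the help text, this sketch yields the same two pairs in both cases, because the oldest buffered forward read is matched to each reverse read sharing its template.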
diff -r 324775a016ce -r 6a14074bc810 tools/filters/get_orfs_or_cdss.py
--- a/tools/filters/get_orfs_or_cdss.py Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,223 +0,0 @@
-#!/usr/bin/env python
-"""Find ORFs in a nucleotide sequence file.
-
-get_orfs_or_cdss.py $input_fasta $input_format $table $ftype $ends $mode $min_len $strand $out_nuc_file $out_prot_file
-
-Takes ten command line options, input sequence filename, format, genetic
-code, CDS vs ORF, end type (open, closed), selection mode (all, top, one),
-minimum length (in amino acids), strand (both, forward, reverse), output
-nucleotide filename, and output protein filename.
-
-This tool is a short Python script which requires Biopython. If you use
-this tool in scientific work leading to a publication, please cite the
-Biopython application note:
-
-Cock et al 2009. Biopython: freely available Python tools for computational
-molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
-http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
-
-This script is copyright 2011-2013 by Peter Cock, The James Hutton Institute
-(formerly SCRI), Dundee, UK. All rights reserved.
-
-See accompanying text file for licence details (MIT/BSD style).
-
-This is version 0.0.3 of the script.
-"""
-import sys
-import re
-
-if "-v" in sys.argv or "--version" in sys.argv:
-    print "v0.0.3"
-    sys.exit(0)
-
-def stop_err(msg, err=1):
-    sys.stderr.write(msg.rstrip() + "\n")
-    sys.exit(err)
-
-try:
-    from Bio.Seq import Seq, reverse_complement, translate
-    from Bio.SeqRecord import SeqRecord
-    from Bio import SeqIO
-    from Bio.Data import CodonTable
-except ImportError:
-    stop_err("Missing Biopython library")
-
-#Parse Command Line
-try:
-    input_file, seq_format, table, ftype, ends, mode, min_len, strand, out_nuc_file, out_prot_file = sys.argv[1:]
-except ValueError:
-    stop_err("Expected ten arguments, got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))
-
-try:
-    table = int(table)
-except ValueError:
-    stop_err("Expected integer for genetic code table, got %s" % table)
-
-try:
-    table_obj = CodonTable.ambiguous_generic_by_id[table]
-except KeyError:
-    stop_err("Unknown codon table %i" % table)
-
-if ftype not in ["CDS", "ORF"]:
-    stop_err("Expected CDS or ORF, got %s" % ftype)
-
-if ends not in ["open", "closed"]:
-    stop_err("Expected open or closed for end treatment, got %s" % ends)
-
-try:
-    min_len = int(min_len)
-except ValueError:
-    stop_err("Expected integer for min_len, got %s" % min_len)
-
-if seq_format.lower()=="sff":
-    seq_format = "sff-trim"
-elif seq_format.lower()=="fasta":
-    seq_format = "fasta"
-elif seq_format.lower().startswith("fastq"):
-    seq_format = "fastq"
-else:
-    stop_err("Unsupported file type %r" % seq_format)
-
-print "Genetic code table %i" % table
-print "Minimum length %i aa" % min_len
-#print "Taking %s ORF(s) from %s strand(s)" % (mode, strand)
-
-starts = sorted(table_obj.start_codons)
-assert "NNN" not in starts
-re_starts = re.compile("|".join(starts))
-
-stops = sorted(table_obj.stop_codons)
-assert "NNN" not in stops
-re_stops = re.compile("|".join(stops))
-
-def start_chop_and_trans(s, strict=True):
-    """Returns offset, trimmed nuc, protein."""
-    if strict:
-        assert s[-3:] in stops, s
-    assert len(s) % 3 == 0
-    for match in re_starts.finditer(s):
-        #Must check the start is in frame
-        start = match.start()
-        if start % 3 == 0:
-            n = s[start:]
-            assert len(n) % 3 == 0, "%s is len %i" % (n, len(n))
-            if strict:
-                t = translate(n, table, cds=True)
-            else:
-                #Used when the stop codon is missing; the start codon is forced to M
-                t = "M" + translate(n[3:], table, to_stop=True)
-            return start, n, t
-    return None, None, None
-
-def break_up_frame(s):
-    """Returns offset, nuc, protein."""
-    start = 0
-    for match in re_stops.finditer(s):
-        index = match.start() + 3
-        if index % 3 != 0:
-            continue
-        n = s[start:index]
-        if ftype=="CDS":
-            offset, n, t = start_chop_and_trans(n)
-        else:
-            offset = 0
-            t = translate(n, table, to_stop=True)
-        if n and len(t) >= min_len:
-            yield start + offset, n, t
-        start = index
-    if ends == "open":
-        #No stop codon, Biopython's strict CDS translate will fail
-        n = s[start:]
-        #Ensure we have whole codons
-        #TODO - Try appending N instead?
-        #TODO - Do the next four lines more elegantly
-        if len(n) % 3:
-            n = n[:-1]
-        if len(n) % 3:
-            n = n[:-1]
-        if ftype=="CDS":
-            offset, n, t = start_chop_and_trans(n, strict=False)
-        else:
-            offset = 0
-            t = translate(n, table, to_stop=True)
-        if n and len(t) >= min_len:
-            yield start + offset, n, t
-
-
-def get_all_peptides(nuc_seq):
-    """Returns start, end, strand, nucleotides, protein.
-
-    Co-ordinates are Python style zero-based.
-    """
-    #TODO - Refactor to use a generator function (in start order)
-    #rather than making a list and sorting?
-    answer = []
-    full_len = len(nuc_seq)
-    if strand != "reverse":
-        for frame in range(0,3):
-            for offset, n, t in break_up_frame(nuc_seq[frame:]):
-                start = frame + offset #zero based
-                answer.append((start, start + len(n), +1, n, t))
-    if strand != "forward":
-        rc = reverse_complement(nuc_seq)
-        for frame in range(0,3) :
-            for offset, n, t in break_up_frame(rc[frame:]):
-                start = full_len - frame - offset #zero based
-                answer.append((start - len(n), start, -1, n ,t))
-    answer.sort()
-    return answer
-
-def get_top_peptides(nuc_seq):
-    """Returns all peptides of max length."""
-    values = list(get_all_peptides(nuc_seq))
-    if not values:
-        raise StopIteration
-    max_len = max(len(x[-1]) for x in values)
-    for x in values:
-        if len(x[-1]) == max_len:
-            yield x
-
-def get_one_peptide(nuc_seq):
-    """Returns first (left most) peptide with max length."""
-    values = list(get_top_peptides(nuc_seq))
-    if not values:
-        raise StopIteration
-    yield values[0]
-
-if mode == "all":
-    get_peptides = get_all_peptides
-elif mode == "top":
-    get_peptides = get_top_peptides
-elif mode == "one":
-    get_peptides = get_one_peptide
-
-in_count = 0
-out_count = 0
-if out_nuc_file == "-":
-    out_nuc = sys.stdout
-else:
-    out_nuc = open(out_nuc_file, "w")
-if out_prot_file == "-":
-    out_prot = sys.stdout
-else:
-    out_prot = open(out_prot_file, "w")
-for record in SeqIO.parse(input_file, seq_format):
-    for i, (f_start, f_end, f_strand, n, t) in enumerate(get_peptides(str(record.seq).upper())):
-        out_count += 1
-        if f_strand == +1:
-            loc = "%i..%i" % (f_start+1, f_end)
-        else:
-            loc = "complement(%i..%i)" % (f_start+1, f_end)
-        descr = "length %i aa, %i bp, from %s of %s" \
-                % (len(t), len(n), loc, record.description)
-        r = SeqRecord(Seq(n), id = record.id + "|%s%i" % (ftype, i+1), name = "", description= descr)
-        t = SeqRecord(Seq(t), id = record.id + "|%s%i" % (ftype, i+1), name = "", description= descr)
-        SeqIO.write(r, out_nuc, "fasta")
-        SeqIO.write(t, out_prot, "fasta")
-    in_count += 1
-if out_nuc is not sys.stdout:
-    out_nuc.close()
-if out_prot is not sys.stdout:
-    out_prot.close()
-
-print "Found %i %ss in %i sequences" % (out_count, ftype, in_count)
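
The frame scan and the reverse-strand coordinate mapping in the removed script can be illustrated without Biopython. What follows is a minimal Python 3 sketch, not the script itself. It assumes the three stop codons of the standard genetic code, covers only the ORF (stop to stop) case with closed ends, and the names `split_frame`, `six_frame_orfs` and `location` are hypothetical.

```python
import re

# Assumption: stop codons of the standard genetic code are hard-coded here;
# the removed script looks them up via Biopython's CodonTable instead.
STOPS = ("TAA", "TAG", "TGA")
RE_STOPS = re.compile("|".join(STOPS))
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_frame(seq):
    """Yield (offset, nucleotides) for each stop-terminated ORF in frame 0."""
    start = 0
    for match in RE_STOPS.finditer(seq):
        end = match.start() + 3
        if end % 3 != 0:
            continue  # this stop codon is not in the reading frame
        yield start, seq[start:end]
        start = end


def six_frame_orfs(seq, min_aa=1):
    """Return sorted (start, end, strand, nucleotides) tuples.

    Co-ordinates are zero-based, half-open, and always given on the
    forward strand, mirroring get_all_peptides in the removed script.
    """
    found = []
    full_len = len(seq)
    for strand, template in ((+1, seq), (-1, reverse_complement(seq))):
        for frame in range(3):
            for offset, nuc in split_frame(template[frame:]):
                if len(nuc) // 3 - 1 < min_aa:  # codons excluding the stop
                    continue
                if strand == +1:
                    start = frame + offset
                    end = start + len(nuc)
                else:
                    # Map the reverse-strand offset back to forward co-ordinates
                    end = full_len - frame - offset
                    start = end - len(nuc)
                found.append((start, end, strand, nuc))
    return sorted(found)


def location(start, end, strand):
    """Format a one-based location string like the FASTA descriptions."""
    loc = "%i..%i" % (start + 1, end)
    return loc if strand == +1 else "complement(%s)" % loc
```

For example, `six_frame_orfs("CCTTATTTCATG")` returns `[(2, 11, -1, "ATGAAATAA")]`, which `location(2, 11, -1)` renders as `complement(3..11)`.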
diff -r 324775a016ce -r 6a14074bc810 tools/filters/get_orfs_or_cdss.txt
--- a/tools/filters/get_orfs_or_cdss.txt Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,93 +0,0 @@
-Galaxy tool to find ORFs or simple CDSs
-=======================================
-
-This tool is copyright 2011-2013 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below.
-
-This tool is a short Python script (using Biopython library functions)
-to search nucleotide sequences for open reading frames (ORFs) or coding
-sequences (CDSs) where the first potential start codon is used. See the
-help text in the XML file for more information.
-
-There are just two files to install:
-
-* get_orfs_or_cdss.py (the Python script)
-* get_orfs_or_cdss.xml (the Galaxy tool definition)
-
-If you are installing this manually (rather than via the Tool Shed), the
-suggested location is in the Galaxy folder tools/filters next to the tool
-for calling sff_extract.py for converting SFF to FASTQ or FASTA + QUAL.
-You will also need to modify the tool_conf.xml file to tell Galaxy to offer the
-tool. One suggested location is in the filters section. Simply add the line:
-
-<tool file="filters/get_orfs_or_cdss.xml" />
-
-You will also need to install Biopython 1.54 or later. If you want to run
-the unit tests, add this line to tool_conf.xml.sample as well and copy the
-sample FASTA files into the test-data directory. Then:
-
-./run_functional_tests.sh -id get_orfs_or_cdss
-
-That's it.
-
-
-History
-=======
-
-v0.0.1 - Initial version.
-v0.0.2 - Correct labelling issue on reverse strand.
-       - Use the new <stdio> settings in the XML wrappers to catch errors
-v0.0.3 - Include unit tests.
-       - Record Python script version when run from Galaxy.
-
-
-Developers
-==========
-
-This script and related tools are being developed on the following hg branch:
-http://bitbucket.org/peterjc/galaxy-central/src/tools
-
-For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
-the following command from the Galaxy root folder:
-
-$ tar -czf get_orfs_or_cdss.tar.gz tools/filters/get_orfs_or_cdss.* test-data/get_orf_input*.fasta test-data/Ssuis.fasta
-
-Check this worked:
-
-$ tar -tzf get_orfs_or_cdss.tar.gz
-tools/filters/get_orfs_or_cdss.py
-tools/filters/get_orfs_or_cdss.txt
-tools/filters/get_orfs_or_cdss.xml
-test-data/get_orf_input.fasta
-test-data/get_orf_input.Suis_ORF.nuc.fasta
-test-data/get_orf_input.Suis_ORF.prot.fasta
-test-data/get_orf_input.t11_nuc_out.fasta
-test-data/get_orf_input.t11_open_nuc_out.fasta
-test-data/get_orf_input.t11_open_prot_out.fasta
-test-data/get_orf_input.t11_prot_out.fasta
-test-data/get_orf_input.t1_nuc_out.fasta
-test-data/get_orf_input.t1_prot_out.fasta
-test-data/Ssuis.fasta
-
-
-Licence (MIT/BSD style)
-=======================
-
-Permission to use, copy, modify, and distribute this software and its
-documentation with or without modifications and for any purpose and
-without fee is hereby granted, provided that any copyright notices
-appear in all copies and that both those copyright notices and this
-permission notice appear in supporting documentation, and that the
-names of the contributors or copyright holders not be used in
-advertising or publicity pertaining to distribution of the software
-without specific prior permission.
-
-THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
-WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
-CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
-OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
-OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
-OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
-OR PERFORMANCE OF THIS SOFTWARE.
diff -r 324775a016ce -r 6a14074bc810 tools/filters/get_orfs_or_cdss.xml
--- a/tools/filters/get_orfs_or_cdss.xml Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,164 +0,0 @@
-<tool id="get_orfs_or_cdss" name="Get open reading frames (ORFs) or coding sequences (CDSs)" version="0.0.3">
-	<description>e.g. to get peptides from ESTs</description>
-	<version_command interpreter="python">get_orfs_or_cdss.py --version</version_command>
-	<command interpreter="python">
-get_orfs_or_cdss.py $input_file $input_file.ext $table $ftype $ends $mode $min_len $strand $out_nuc_file $out_prot_file
-	</command>
-	<stdio>
-		<!-- Anything other than zero is an error -->
-		<exit_code range="1:" />
-		<exit_code range=":-1" />
-	</stdio>
-	<inputs>
-		<param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file (nucleotides)" help="FASTA, FASTQ, or SFF format." />
-		<param name="table" type="select" label="Genetic code" help="Tables from the NCBI, these determine the start and stop codons">
-			<option value="1">1. Standard</option>
-			<option value="2">2. Vertebrate Mitochondrial</option>
-			<option value="3">3. Yeast Mitochondrial</option>
-			<option value="4">4. Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma</option>
-			<option value="5">5. Invertebrate Mitochondrial</option>
-			<option value="6">6. Ciliate Macronuclear and Dasycladacean</option>
-			<option value="9">9. Echinoderm Mitochondrial</option>
-			<option value="10">10. Euplotid Nuclear</option>
-			<option value="11">11. Bacterial</option>
-			<option value="12">12. Alternative Yeast Nuclear</option>
-			<option value="13">13. Ascidian Mitochondrial</option>
-			<option value="14">14. Flatworm Mitochondrial</option>
-			<option value="15">15. Blepharisma Macronuclear</option>
-			<option value="16">16. Chlorophycean Mitochondrial</option>
-			<option value="21">21. Trematode Mitochondrial</option>
-			<option value="22">22. Scenedesmus obliquus</option>
-			<option value="23">23. Thraustochytrium Mitochondrial</option>
-		</param>
-		<param name="ftype" type="select" value="True" label="Look for ORFs or CDSs">
-                        <option value="ORF">Look for ORFs (check for stop codons only, ignore start codons)</option>
-                        <option value="CDS">Look for CDSs (with start and stop codons)</option>
-		</param>
-                <param name="ends" type="select" value="open" label="Sequence end treatment">
-			<option value="open">Open ended (will allow missing start/stop codons at the ends)</option>
-                        <option value="closed">Complete (will check for start/stop codons at the ends)</option>
-                        <!-- TODO? Circular, for using this on finished bacteria etc -->
-                </param>
-
-		<param name="mode" type="select" label="Selection criteria" help="Suppose a sequence has ORFs/CDSs of lengths 100, 102 and 102 -- which should be taken? These options would return 3, 2 or 1 ORF.">
-                    <option value="all">All ORFs/CDSs from each sequence</option>
-                    <option value="top">All ORFs/CDSs from each sequence with the maximum length</option>
-                    <option value="one">First ORF/CDS from each sequence with the maximum length</option>
-		</param>
-                <param name="min_len" type="integer" size="5" value="30" label="Minimum length ORF/CDS (in amino acids, e.g. 30 aa = 90 bp plus any stop codon)">
-                </param>
-                <param name="strand" type="select" label="Strand to search" help="Use the forward only option if your sequence directionality is known (e.g. from poly-A tails, or strand specific RNA sequencing).">
-                    <option value="both">Search both the forward and reverse strand</option>
-                    <option value="forward">Only search the forward strand</option>
-                    <option value="reverse">Only search the reverse strand</option>
-                </param>
-	</inputs>
-	<outputs>
-		<data name="out_nuc_file" format="fasta" label="${ftype.value}s (nucleotides)" />
-		<data name="out_prot_file" format="fasta" label="'..b'me="strand" value="forward" />
-			<output name="out_nuc_file" file="get_orf_input.t11_nuc_out.fasta" />
-			<output	name="out_prot_file" file="get_orf_input.t11_prot_out.fasta" />
-		</test>
-		<test>
-                        <param name="input_file" value="get_orf_input.fasta" />
-                        <param name="table" value="11" />
-                        <param name="ftype" value="CDS" />
-                        <param name="ends" value="open" />
-                        <param name="mode" value="all" />
-                        <param name="min_len" value="10" />
-                        <param name="strand" value="forward" />
-                        <output name="out_nuc_file" file="get_orf_input.t11_open_nuc_out.fasta" />
-                        <output name="out_prot_file" file="get_orf_input.t11_open_prot_out.fasta" />
-		</test>
-                <test>
-			<param name="input_file" value="Ssuis.fasta" />
-			<param name="table" value="11" />
-			<param name="ftype" value="ORF" />
-			<param name="ends" value="open" />
-			<param name="mode" value="all" />
-			<param name="min_len" value="100" />
-			<param name="strand" value="both" />
-			<output name="out_nuc_file" file="get_orf_input.Suis_ORF.nuc.fasta" />
-			<output name="out_prot_file" file="get_orf_input.Suis_ORF.prot.fasta" />
-		</test>
-	</tests>
-	<requirements>
-		<requirement type="python-module">Bio</requirement>
-	</requirements>
-	<help>
-
-**What it does**
-
-Takes an input file of nucleotide sequences (typically FASTA, but also FASTQ
-and Standard Flowgram Format (SFF) are supported), and searches each sequence
-for open reading frames (ORFs) or potential coding sequences (CDSs) of the
-given minimum length. These are returned as FASTA files of nucleotides and
-protein sequences.
-
-You can choose to have all the ORFs/CDSs above the minimum length for each
-sequence (similar to the EMBOSS getorf tool), only those of the maximum
-length, or just the first ORF/CDS of the maximum length (in the special case
-where a sequence encodes two or more long ORFs/CDSs of the same length). The
-last option is a reasonable choice when the input sequences represent EST or
-mRNA sequences, where only one ORF/CDS is expected.
-
-Note that if no ORFs/CDSs in a sequence match the criteria, there will be no
-output for that sequence.
-
-Also note that the ORFs/CDSs are assigned modified identifiers to distinguish
-them from the original full length sequences, by appending a suffix.
-
-The start and stop codons are taken from the `NCBI Genetic Codes
-&lt;http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi&gt;`_.
-When searching for ORFs, the sequences will run from stop codon to stop
-codon, and any start codons are ignored. When searching for CDSs, the first
-potential start codon will be used, giving the longest possible CDS within
-each ORF, and thus the longest possible protein sequence. This is useful
-for things like BLAST or domain searching, but since this may not be the
-correct start codon, it may not be appropriate for signal peptide detection
-etc.
-
-**Example Usage**
-
-Given some EST sequences (Sanger capillary reads) assembled into unigenes,
-or a transcriptome assembly from some RNA-Seq, each of your nucleotide
-sequences should (barring sequencing, assembly errors, frame-shifts etc)
-encode one protein as a single ORF/CDS, which you wish to extract (and
-perhaps translate into amino acids).
-
-If your RNA-Seq data was strand specific, and assembled taking this into
-account, you should only search for ORFs/CDSs on the forward strand.
-
-**Citation**
-
-This tool uses Biopython. If you use this tool in scientific work leading
-to a publication, please cite the Biopython application note (and Galaxy
-too of course):
-
-Cock et al 2009. Biopython: freely available Python tools for computational
-molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
-http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
-
-	</help>
-</tool>
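
The help text in the removed XML describes two rules that are easy to check in isolation: a CDS begins at the first in-frame start codon of its ORF, and the selection criteria keep all, the joint longest, or only the first of the joint longest candidates. The snippet below is a rough Python 3 sketch of both rules under stated assumptions: only `ATG` is treated as a start codon (the NCBI tables list more), candidates are plain `(start, protein)` tuples already sorted by start, and the function names `trim_to_first_start` and `select` are hypothetical rather than taken from the tool.

```python
def trim_to_first_start(orf, starts=("ATG",)):
    """Return (offset, trimmed nucleotides) from the first in-frame start codon.

    Assumption: `starts` defaults to ATG only, for illustration.
    Returns (None, None) when the ORF has no in-frame start codon.
    """
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in starts:
            return i, orf[i:]
    return None, None


def select(candidates, mode):
    """Apply the 'all', 'top' or 'one' selection criteria.

    `candidates` is a list of (start, protein) tuples sorted by start.
    """
    if mode == "all":
        return list(candidates)
    if mode not in ("top", "one"):
        raise ValueError("Unknown mode %r" % mode)
    if not candidates:
        return []
    max_len = max(len(protein) for _, protein in candidates)
    top = [c for c in candidates if len(c[1]) == max_len]
    # 'one' keeps only the left-most of the joint longest candidates
    return top if mode == "top" else top[:1]


# The example from the help text: lengths 100, 102 and 102
candidates = [(0, "A" * 100), (400, "C" * 102), (900, "D" * 102)]
counts = [len(select(candidates, m)) for m in ("all", "top", "one")]
```

Here `counts` is `[3, 2, 1]`, matching the "3, 2 or 1 ORF" wording in the selection criteria help, and `trim_to_first_start("CCCATGAAAATGTAA")` gives `(3, "ATGAAAATGTAA")`, the longest CDS within that ORF.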