Repository 'get_orfs_or_cdss'
hg clone https://toolshed.g2.bx.psu.edu/repos/peterjc/get_orfs_or_cdss

Changeset 4:d51819d2d7e2 (2013-07-29)
Previous changeset 3:6a14074bc810 (2013-07-29)
Next changeset 5:5208c15805ec (2013-10-28)
Commit message:
Uploaded correct tar ball (v0.0.5)
added:
test-data/Ssuis.fasta
test-data/get_orf_input.Suis_ORF.nuc.fasta
test-data/get_orf_input.Suis_ORF.prot.fasta
test-data/get_orf_input.fasta
test-data/get_orf_input.t11_nuc_out.fasta
test-data/get_orf_input.t11_open_nuc_out.fasta
test-data/get_orf_input.t11_open_prot_out.fasta
test-data/get_orf_input.t11_prot_out.fasta
test-data/get_orf_input.t1_nuc_out.fasta
test-data/get_orf_input.t1_prot_out.fasta
tools/filters/get_orfs_or_cdss.py
tools/filters/get_orfs_or_cdss.rst
tools/filters/get_orfs_or_cdss.xml
tools/filters/repository_dependencies.xml
removed:
test-data/sanger-pairs-forward.fastq
test-data/sanger-pairs-interleaved.fastq
test-data/sanger-pairs-mixed.fastq
test-data/sanger-pairs-reverse.fastq
test-data/sanger-pairs-singles.fastq
tools/fastq/fastq_paired_unpaired.py
tools/fastq/fastq_paired_unpaired.rst
tools/fastq/fastq_paired_unpaired.xml
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/Ssuis.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/Ssuis.fasta Mon Jul 29 09:30:44 2013 -0400
b'@@ -0,0 +1,33460 @@\n+>Streptococcus_suis\n+ATGAACCAAGAACAACTTTTTTGGCAACGATTTATTGAATTGGCAAAGGTAAATTTTAAG\n+CCATCTATTTATGATTTTTATGTCGCTGATGCAAAATTACTCGGAATCAACCAGCAAGTT\n+GCCAATATTTTCTTAAATCGTCCATTTAAAAAAGATTTCTGGGAAAAAAACTTCGAAGAG\n+TTAATGATTGCCGCTAGTTTTGAAAGCTACGGAGAGCCTCTTACCATCCAATATCAATTT\n+ACAGAGGATGAACAGGAGATTAGGAATACTACAAACACAAGAAGTTCAATAGTTCACCAG\n+GTACAGACACTTGAGCCGGCTACTCCTCAAGAAACTTTTAAACCGGTTCATTCTGATATA\n+AAATCCCAGTACACCTTTGCTAATTTTGTACAAGGAGACAATAATCACTGGGCAAAGGCT\n+GCAGCTTTAGCTGTATCTGATAACCTAGGTGAGCTCTACAATCCATTATTCATTTTTGGT\n+GGTCCTGGTCTTGGAAAAACTCATATTTTAAATGCGATTGGAAATAAGGTTCTAGCCGAT\n+AATCCCCAGGCAAGGATAAAATATGTCTCATCGGAAACATTCATCAATGAATTTTTAGAA\n+CACCTCCGTCTCAATGATATGGAAAGTTTCAAAAAAACCTATCGCAATCTGGACTTACTT\n+CTAATTGATGACATTCAGTCTCTCCGTAATAAAGCAACAACACAGGAAGAATTTTTCCAT\n+ACTTTTAATGCGCTTCATGAAAAAAATAAGCAGATTGTACTCACAAGCGACCGTAATCCC\n+GATCACTTAGACAATTTGGAAGAAAGACTAGTAACACGTTTCAAATGGGGGTTAACCAGT\n+GAAATCACTCCACCTGATTTTGAAACACGTATCGCAATTTTACGTAACAAGTGCGAGAAC\n+CTGCCTTACAACTTTACAAATGAGACGCTATCCTATCTAGCTGGGCAATTTGATTCGAAC\n+GTACGTGACCTTGAAGGTGCCTTAAAAGATATCCATTTGATAGCCACTATGCGTCAACTG\n+TCTGAGATAAGTGTCGAGGTTGCTGCTGAGGCTATTCGATCAAGAAAACAAACAAATCCA\n+CAAAACATGGTTATTCCTATTGAGAAAATCCAAACCGAAGTGGGAAATTTCTACGGTGTC\n+AGCTTGAAAGAATTAAAAGGTTCTAAGCGTGTTCAACATATCGTTCACGCGCGACAAGTT\n+GCTATGTTTTTAGCACGTGAAATGACAGACAATTCCCTTCCAAAAATTGGGAAAGAATTT\n+GGTAATCGAGACCATACAACCGTTATGCATGCATACAATAAAATAAAAACTCTCCTCTTG\n+GATGATGAGAATTTAGAAATAGAGATTACCAGTATAAAAAATAAACTTCGTTAACCTGTG\n+TATAACTTTTTTAAAAAACTCTGTTTTTTCCACAAGTTGTGAACAAGTTAATTTCCGCAG\n+TTTTATTGGTCTTTCATCACTTTTCCACAGAATACACAGAGACTACTATTACTATTAACC\n+TTATAGATAATAAATAAAGGAGAATCCATGATTCAATTTTCTATTAATAAAAATATATTT\n+CTACAAGCACTTAGTATTACTAAACGGGCAATCAGTACAAAAAATGCTATTCCAATTCTT\n+TCAACAGTAAAAATTACAGTAACTAGTGAAGGAATCACTTTAACTGGTTCAAATGGACAA\n+ATCTCGATAGAACATTTTATTTCTATTCAAGATGAAAATGCAGGGCTTTTGATCAGTTCT\n+CCAGGTTCCATTCTCTTAGAAGCTGGTTTCTTTATTAATGTCGTATCCAGTATGCCGGAT\n+TTGGTCCTTGACTTCAATGAAATTGAACAAAAGCAAATCGTTTTGACAAGTGGTAAGTCT\n+G
AAATCACATTAAAGGGAAAAGAAGCAGAACAGTATCCTCGTTTACAGGAAGTTCCAACT\n+TCAAAACCATTGGTGTTAGAAACCAAAGTATTAAAACAAACAATTAATGAAACAGCATTT\n+GCAGCTTCTACACAAGAAAGTCGTCCTATTCTTACGGGTGTTCATTTTGTTTTAACAGAA\n+AATAAAAATCTAAAAACTGTTGCAACAGATTCACACCGTATGAGCCAACGGAAATTGGTC\n+CTTGATACCTCTGGTGATGATTTTAATGTTGTCATTCCAAGTCGTTCTCTCCGTGAATTT\n+ACTGCAGTTTTTACAGATGATATTGAAACAGTAGAAGTCTTCTTTTCAAATAATCAAATC\n+CTTTTTAGAAGCGAGCATATTAGCTTCTATACACGCTTATTAGAAGGTACCTACCCTGAT\n+ACCGACCGCTTAATTCCAACTGAGTTTAAAACAACTGCAATTTTTGATACTGCAAATCTT\n+CGTCACTCGATGGAGCGTGCTCGTCTTCTTTCAAATGCAACCCAAAATGGTACAGTAAAA\n+CTAGAAATTGCTAATAATGTTGTATCGGCTCATGTAAATTCTCCAGAAGTTGGACGTGTG\n+AATGAGGAATTAGATACTGTAGAAGTATCAGGTGAAGATTTAGTAATCAGCTTTAACCCA\n+ACTTACTTGATAGAAGCATTGAAAGCCACAACTAGTGAACAAGTGAAAATTAGCTTTATC\n+TCTTCTGTCCGTCCATTTACATTGATTCCAAATAATGAAGGGGAAGATTTTATTCAATTG\n+GTTACACCAGTTCGTACCAACTAAATAATATTAAGAACGGCTAAACTAGCCGTTTTTATG\n+TTATACTAAAAAATAGCACCTAGCTTATTTTTATATATTTAGTGATGGGGAATAAATGAC\n+GTTATATATATTAGCTAATCCTAATGCTGGTAGCCATACTGCTGAACATATCATATTCAA\n+AATAAAAGAAAGTTATCCACAGCTTGCAGTTAACATTTTTATGACAGTTGGTCCTGAGGA\n+TGAAAAAAGTCAAATAGAGGCTATTTTAAAGGAGTTTGTCAGTAGTGAAGATCAATTAAT\n+GATTTTAGGCGGAGACGGCACACTATCTAAAGCTTTGCGTTTTTGGCCAGCTAGTCTACC\n+GTTTGCTTATTATCCAACAGGATCTGGAAATGATTTTGCTAAGGCAATGAATATAACATC\n+GCTATATAGAAGTGTAGATGCCATTTTAGAGAGAAAAACAAGTCGGATATATGTTTTAAA\n+CAGTTCATACGGAACGGTTGTAAACAGTATGGATTTTGGCTTTGCAGCTCAAGTTATCAA\n+TGGTTCAACGAATTCAATTTTGAAAAAAATTCTGAACAAGGTAAAACTTGGGAAGTTAAC\n+TTATCTATTCTTTGGTATTAAAACATTATTTTCAAAACAAGCTATAAACTTAGAATTAAC\n+TCTTGATGAAAAATCTTATCAGTTAGATAATCTCTTTTTTATTTCTGTAGCAAATAGTCT\n+TTATTTTGGTGGAGGAATCATGATATGGCCAACAGCAAGTGCTAAAAAGAAGGAAGTAGA\n+TATTGTTTACTTCAAAAATGGAAATTTCTACCAACGTCTACAATCATTGTTAGCCTTATT\n+AACGAAGAGGCATGAATCTTCTCATACGATTCAGCATTTAACAGGGGTAGATGTAGTTTT\n+AAAATCAAAAGAAAAATTATTATTGCAAATAGATGGAGAGACATGCACTGCAAATGAGGT\n+AACGTTAACCTATCAGGAAAGAAGTATGTATCTTTAAGGAGGAAGTATGTACCAATTAGG\n+AACCTTTGTCGAAATGAAAAAGCCCCATGCCTGTGTCATCAAATCGACCGGAAAGAAGGC\n+TAATAAATGGGAGGTTATCCGTCTAGGAGCGGATATTAAAATCCGCTG
TACCAACTGTGA\n+CCATGTCGTTATGATGAGCCGGCATGATTTTGAACGAAAAATGAAACAAGT'..b'CTCTACCAACTGAGCTA\n+TGGCGGAAGAAATAGTCCGTACGGGATTCGAACCCGTGTTACCGCCGTGAAAAGGCGGTG\n+TCTTAACCCCTTGACCAACGGACCATTTTTAGAACAATAACTAGTATAATACATGTGACT\n+TTGTTTGTCAATACATTTTTTGATTTTTTATTGTATTGACAGAGTGCTTTGTTTAATGTA\n+AAATAAAATGGTTAAGGTTCCATAGCTCAGCTGGATAGAGCATTCGCCTTCTAAGCGAAC\n+GGTCGCAGGTTCGAATCCTGCTGGAATCATTTAGACCTACCTCGAGTAGGTCTTTTTTCT\n+TGCCATAATTCATAATTAATATATAACACTGGCAAAATCAGACCAATAAGGGCATATTCT\n+TCAAATTGGAAGGATAGGTGAGTAGATATGATGACACCTAGCATAAACCCTATAATGGTC\n+AATAAGATGTTTCTACCTGTTTTTCTAAGTTCTGAATCTTTTTCAATAACTCCTTTAAAC\n+CAGAGATAAGCAGCATTTTTGACATTCCCTGTCATCATCACATTGGCATACGGAGCACCT\n+CGTAACCTTCTAAATGTTTCTACTTGAATAGAGGCTACGAAGGCTAGACTAGCAATTGTA\n+AAAGACGCAGGCATTATAGGTGAGAGAATGATAGTTAGTAAAATAAGAACTAACATCATT\n+ACACTACTACCAAAGTGCCAAGACCATGTTTGTTTTTCAAAATACCTTCTTGCTAAGTAG\n+GTAAAAAATTGTCCGAATACAAAAAATAAAATGGGAATGGAAAAATTAACTACCTGCGCA\n+AAATCACCTTTAGCTAAAAAATAAGCTAGGGAAATAACATTTCCAGATTGTACGCCAGCA\n+AAGCGACCACCCTGAGTCACAAAAGTAAAGGCATTTAAATAACCACTGATAAACGTTAAT\n+GAACAAGCAATTCTCAATCCCTCAAAAACACGATACTCTTTTTGATTCATTTTCACTCCT\n+TGTTTCACGTGAAACTACTTATGATATGGGCTTCCCTGCTGAATCATAAATGCACGATAA\n+ATCTGCTCGATGAGAACTAATTTCATTAGTTGATGGGGAAGTGTCAACTGTCCAAAACTC\n+ATCAACAAATTAGCTCTTTTTTTAATACAAGAATCGAGACCCAAACTACCACCGATGATA\n+AAAGTTATATCTGAATACCCATTTACTGCAATGTCAGATATCCTTTGACTAAATTCTTCC\n+GATGGAAATTGTTTCCCTTCTATCGCTAAGGCAATGACAAAATCTCGCTCTCCAATTTTA\n+GACATAATTCTATCGGCTTCTTTTTTTAATATTTGTTCATTCTCTGCCTGACTGGCTTTA\n+TCTGGTGTTTTTTCATCAGGAAGCTCAATCATATCCAACTTAGTAAATCGTCCCAATCGT\n+TTACTATATTCTGCAATACCTTCTTTGAGGTACTTTTCTTTCAATTTTCCAACGGTAATC\n+AATTTTATTTTCATAAAATAATTGTAACATATCCACAAGCATACGACAGAAAATATTTTT\n+AGAAAATCAGGATATGGCTACAGTTTTTCACATAATTCACAGAGTTATCCACAGGTTGTG\n+GATTGATTTTTGAAAACTTTAAGTTATAATTAAGAAAGAAATAGTACTCTTAAGGAAAAT\n+TAAAGAAATGGAAAGGATTCCTTATATGAAAAAATATTTGAAATTTGCGATTTTATTTGT\n+AATTGGATTTTTTGGGGGTCTTATCGGGGCCTTGTCAGCCTCTTTCTTCCAGCCACAGGT\n+GCAACAAGCAAATTCTGCTATCACTAGTGTCAGCAATGTTCAATATAATAATGAAACTTC\n+CACCACAAAAGCTGTAGAG
AAAGTACAAAATGCTGTTGTGTCTGTTATTAATTACCAAAA\n+ATCAGCCAACAATAGTCTTGGTGTTATCTTTGGAAATATTGAATCATCTGACGAACTAGC\n+TGTTGCTGGAGAGGGGTCTGGGGTTATCTATAAAAAATATGGTCAATATGCCTATATTGT\n+GACAAATACGCATGTTATTAATAACGCAGAAAAGATTGATATCCTTTTAGCATCTGGAGA\n+AAAAATTAGCGGTGAACTTGTTGGTTCCGATACATATTCTGATATAGCTGTTATAAAAAT\n+ATCAGCAGATAAAGTCACTGCTGTTGCTGAATTTGCTGATTCCGATACAATTAAAGTTGG\n+AGAAACTGCTATCGCAATTGGTAGTCCTCTAGGTAGCGTCTACGCCAATACAGTTACCCA\n+GGGTATTATTTCTAGCTTAAGTCGGACAGTTACTTCACAATCAAAAGATGGACAAACAAT\n+CTCAACTAACGCTATTCAAACTGATACAGCTATCAACCCTGGAAACTCTGGCGGACCGTT\n+AATCAATACCCAAGGACAAGTGATAGGCATTACCTCTAGCAAAATTACCTCAAGTTCTGC\n+AAATAGCTCAGGCGTGGCTGTAGAAGGGTTGGGATTTGCTATTCCTGCAAATGATGCCGT\n+AGCTATTATCAATCAGCTTGAAAAAACTGGACAAGTTAGCCGACCTGCTCTTGGAGTTCA\n+TATGGTTAACTTGACGACCTTGTCAACTAGTCAATTAGAAAAAGCTGGATTATCAAATAC\n+GGAATTAACATCCGGTGTAGTAATTGTCTCTACACAAAGTGGGCTACCTGCAGATGGAAA\n+ATTAGAAACTTTTGATGTTATTACTGAGATTGACGGAGAAGCTATTCAAAATAAGAGTGA\n+CCTCCAGAGCGCTCTCTACAAACATCAAATTGGAGATACAATCACTGTAACTTATTACCG\n+CAATAATCAGAAACAAACTGTTGACATTAAGTTGACACATTCTACAGAAGAACTTAGCGA\n+ATAATTGACAAATGAGACTTTACACAATTGTAAAGTCTCATTTTTTTTGCTAGAATAAGG\n+ATATATGGAAGAATTACGTACACTAAATATTTCAGAAATCCATCCCAATCCCTATCAGCC\n+AAGAATTCATTTTGATGAAAAGGAGCTACTTGAGCTCGCTCAATCTATTAAGGAAAATGG\n+CTTAATTCAACCGATTATTGTAAGAAAATCTTCTATTATCGGATACGAATTATTAGCTGG\n+AGAAAGAAGGTTGCGAGCCAGTCAATTAGCTGGACTGACTACAATACCAGCAGTGGTAAA\n+AGAACTGACTGATGATGATTTACTCTATCAGGCTATCATAGAGAATCTGCAGCGTTCTAA\n+CTTAAATCCGATAGAAGAAGCAGCCTCTTATCAAAAATTGATTAGTAGAGGGTTAACACA\n+TGATGAAGTTGCTCAAATCATGGGAAAATCAAGACCATATATCAGTAATTTATTGCGCCT\n+ACTAAATCTATCATCTCAGACTAAACAAGCTGTAGAAGAAGGAAAAATTTCACAAGGGCA\n+CGCGCGACAATTGGTGTCATTTTCAGAAGAAAAGCAAGCCGAATGGGTTCAACTCATTTT\n+ATCAAAGGATTTAAGTGTGCGTACGCTTGAAAAATTAATAGCTGCAAATAAGAAAAAACA\n+CACTAAGCTTAAACAACGCGACCAATTTTTAAAAGAACAGGAAGATTCACTCAGTAAAAC\n+TCTTGGAACAGCTACAAAAATTATCAAGAAGAAAAACGGGAGCGGAGAAATTCGGATTAG\n+CTTTAATGACCTCGATGAATTCGAAAGAATTATCAACAATTTTAAATAGACTTGTTTACA\n+ATTTATTTTTATAAACACTCTTTTCCACACTAAAATCATTACAAAAAGTCAGGACCAGCA\n+AGG
GTTCTGACTTTTATTCACATCTTGTGGAAAACTTTTCTTAACAGTGTGGATTTTAAA\n+AATTATCTGTGGAAAACTTTTGTTTTTTATGGTACACTATTCTAACGAATATAATGTGAA\n+AGGGGGAAAAT\n'
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/get_orf_input.Suis_ORF.nuc.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.Suis_ORF.nuc.fasta Mon Jul 29 09:30:44 2013 -0400
b'@@ -0,0 +1,41831 @@\n+>Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n+ATGAACCAAGAACAACTTTTTTGGCAACGATTTATTGAATTGGCAAAGGTAAATTTTAAG\n+CCATCTATTTATGATTTTTATGTCGCTGATGCAAAATTACTCGGAATCAACCAGCAAGTT\n+GCCAATATTTTCTTAAATCGTCCATTTAAAAAAGATTTCTGGGAAAAAAACTTCGAAGAG\n+TTAATGATTGCCGCTAGTTTTGAAAGCTACGGAGAGCCTCTTACCATCCAATATCAATTT\n+ACAGAGGATGAACAGGAGATTAGGAATACTACAAACACAAGAAGTTCAATAGTTCACCAG\n+GTACAGACACTTGAGCCGGCTACTCCTCAAGAAACTTTTAAACCGGTTCATTCTGATATA\n+AAATCCCAGTACACCTTTGCTAATTTTGTACAAGGAGACAATAATCACTGGGCAAAGGCT\n+GCAGCTTTAGCTGTATCTGATAACCTAGGTGAGCTCTACAATCCATTATTCATTTTTGGT\n+GGTCCTGGTCTTGGAAAAACTCATATTTTAAATGCGATTGGAAATAAGGTTCTAGCCGAT\n+AATCCCCAGGCAAGGATAAAATATGTCTCATCGGAAACATTCATCAATGAATTTTTAGAA\n+CACCTCCGTCTCAATGATATGGAAAGTTTCAAAAAAACCTATCGCAATCTGGACTTACTT\n+CTAATTGATGACATTCAGTCTCTCCGTAATAAAGCAACAACACAGGAAGAATTTTTCCAT\n+ACTTTTAATGCGCTTCATGAAAAAAATAAGCAGATTGTACTCACAAGCGACCGTAATCCC\n+GATCACTTAGACAATTTGGAAGAAAGACTAGTAACACGTTTCAAATGGGGGTTAACCAGT\n+GAAATCACTCCACCTGATTTTGAAACACGTATCGCAATTTTACGTAACAAGTGCGAGAAC\n+CTGCCTTACAACTTTACAAATGAGACGCTATCCTATCTAGCTGGGCAATTTGATTCGAAC\n+GTACGTGACCTTGAAGGTGCCTTAAAAGATATCCATTTGATAGCCACTATGCGTCAACTG\n+TCTGAGATAAGTGTCGAGGTTGCTGCTGAGGCTATTCGATCAAGAAAACAAACAAATCCA\n+CAAAACATGGTTATTCCTATTGAGAAAATCCAAACCGAAGTGGGAAATTTCTACGGTGTC\n+AGCTTGAAAGAATTAAAAGGTTCTAAGCGTGTTCAACATATCGTTCACGCGCGACAAGTT\n+GCTATGTTTTTAGCACGTGAAATGACAGACAATTCCCTTCCAAAAATTGGGAAAGAATTT\n+GGTAATCGAGACCATACAACCGTTATGCATGCATACAATAAAATAAAAACTCTCCTCTTG\n+GATGATGAGAATTTAGAAATAGAGATTACCAGTATAAAAAATAAACTTCGTTAA\n+>Streptococcus_suis|ORF2 length 385 aa, 1158 bp, from 1507..2664 of 
Streptococcus_suis\n+ATAATAAATAAAGGAGAATCCATGATTCAATTTTCTATTAATAAAAATATATTTCTACAA\n+GCACTTAGTATTACTAAACGGGCAATCAGTACAAAAAATGCTATTCCAATTCTTTCAACA\n+GTAAAAATTACAGTAACTAGTGAAGGAATCACTTTAACTGGTTCAAATGGACAAATCTCG\n+ATAGAACATTTTATTTCTATTCAAGATGAAAATGCAGGGCTTTTGATCAGTTCTCCAGGT\n+TCCATTCTCTTAGAAGCTGGTTTCTTTATTAATGTCGTATCCAGTATGCCGGATTTGGTC\n+CTTGACTTCAATGAAATTGAACAAAAGCAAATCGTTTTGACAAGTGGTAAGTCTGAAATC\n+ACATTAAAGGGAAAAGAAGCAGAACAGTATCCTCGTTTACAGGAAGTTCCAACTTCAAAA\n+CCATTGGTGTTAGAAACCAAAGTATTAAAACAAACAATTAATGAAACAGCATTTGCAGCT\n+TCTACACAAGAAAGTCGTCCTATTCTTACGGGTGTTCATTTTGTTTTAACAGAAAATAAA\n+AATCTAAAAACTGTTGCAACAGATTCACACCGTATGAGCCAACGGAAATTGGTCCTTGAT\n+ACCTCTGGTGATGATTTTAATGTTGTCATTCCAAGTCGTTCTCTCCGTGAATTTACTGCA\n+GTTTTTACAGATGATATTGAAACAGTAGAAGTCTTCTTTTCAAATAATCAAATCCTTTTT\n+AGAAGCGAGCATATTAGCTTCTATACACGCTTATTAGAAGGTACCTACCCTGATACCGAC\n+CGCTTAATTCCAACTGAGTTTAAAACAACTGCAATTTTTGATACTGCAAATCTTCGTCAC\n+TCGATGGAGCGTGCTCGTCTTCTTTCAAATGCAACCCAAAATGGTACAGTAAAACTAGAA\n+ATTGCTAATAATGTTGTATCGGCTCATGTAAATTCTCCAGAAGTTGGACGTGTGAATGAG\n+GAATTAGATACTGTAGAAGTATCAGGTGAAGATTTAGTAATCAGCTTTAACCCAACTTAC\n+TTGATAGAAGCATTGAAAGCCACAACTAGTGAACAAGTGAAAATTAGCTTTATCTCTTCT\n+GTCCGTCCATTTACATTGATTCCAAATAATGAAGGGGAAGATTTTATTCAATTGGTTACA\n+CCAGTTCGTACCAACTAA\n+>Streptococcus_suis|ORF3 length 104 aa, 315 bp, from complement(1707..2021) of Streptococcus_suis\n+ACACCCGTAAGAATAGGACGACTTTCTTGTGTAGAAGCTGCAAATGCTGTTTCATTAATT\n+GTTTGTTTTAATACTTTGGTTTCTAACACCAATGGTTTTGAAGTTGGAACTTCCTGTAAA\n+CGAGGATACTGTTCTGCTTCTTTTCCCTTTAATGTGATTTCAGACTTACCACTTGTCAAA\n+ACGATTTGCTTTTGTTCAATTTCATTGAAGTCAAGGACCAAATCCGGCATACTGGATACG\n+ACATTAATAAAGAAACCAGCTTCTAAGAGAATGGAACCTGGAGAACTGATCAAAAGCCCT\n+GCATTTTCATCTTGA\n+>Streptococcus_suis|ORF4 length 293 aa, 882 bp, from 2756..3637 of 
Streptococcus_suis\n+ATGACGTTATATATATTAGCTAATCCTAATGCTGGTAGCCATACTGCTGAACATATCATA\n+TTCAAAATAAAAGAAAGTTATCCACAGCTTGCAGTTAACATTTTTATGACAGTTGGTCCT\n+GAGGATGAAAAAAGTCAAATAGAGGCTATTTTAAAGGAGTTTGTCAGTAGTGAAGATCAA\n+TTAATGATTTTAGGCGGAGACGGCACACTATCTAAAGCTTTGCGTTTTTGGCCAGCTAGT\n+CTACCGTTTGCTTATTATCCAACAGGATCTGGAAATGATTTTGCTAAGGCAATGAATATA\n+ACATCGCTATATAGAAGTGTAGATGCCATTTTAGAGAGAAAAACAAGTCGGATATATGTT\n+TTAAACAGTTCATACGGAACGGTTGTAAACAGTATGGATTTTGGCTTTGCAGCTCAAGTT\n+ATCAATGGTTCAACGAATTCAATTTTGAAAAAAATTCTGAACAAGGTAAAACTTGGGAAG\n+TTAACTTATCTATTCTTTGGTATTAAAACATTATTTTCAAAACAAGCTATAAACTTAGAA\n+TTAACTCTTGATGAAAAATCTTATCAGTTAGATAATCTCTTTTTTATTTCTGTAGCAAAT\n+AGTCTTTATTTTGGTGGAGGAATCATGATATGGCCAACAGCAAGTGCTAAAAAG'..b'GCAACCATTGATGGTAAACCTATCAAAATCCAAAAAGCGCAAGATGGT\n+TTTATGAAAGTGGATGTAAGTCCAGGTCAAACTAAACTAGTTTTAACCTTTGTACCAAAT\n+GGTTTCTATCTAGGTTTACTGATTTCTTTTGGTGCAGTTTTTGTATTTTTCTCCTATCAA\n+TTCATTGGATACTATTATTCTAAGAACCGAGAATACTAA\n+>Streptococcus_suis|ORF2907 length 235 aa, 708 bp, from complement(2003907..2004614) of Streptococcus_suis\n+TTTCACGTGAAACAAGGAGTGAAAATGAATCAAAAAGAGTATCGTGTTTTTGAGGGATTG\n+AGAATTGCTTGTTCATTAACGTTTATCAGTGGTTATTTAAATGCCTTTACTTTTGTGACT\n+CAGGGTGGTCGCTTTGCTGGCGTACAATCTGGAAATGTTATTTCCCTAGCTTATTTTTTA\n+GCTAAAGGTGATTTTGCGCAGGTAGTTAATTTTTCCATTCCCATTTTATTTTTTGTATTC\n+GGACAATTTTTTACCTACTTAGCAAGAAGGTATTTTGAAAAACAAACATGGTCTTGGCAC\n+TTTGGTAGTAGTGTAATGATGTTAGTTCTTATTTTACTAACTATCATTCTCTCACCTATA\n+ATGCCTGCGTCTTTTACAATTGCTAGTCTAGCCTTCGTAGCCTCTATTCAAGTAGAAACA\n+TTTAGAAGGTTACGAGGTGCTCCGTATGCCAATGTGATGATGACAGGGAATGTCAAAAAT\n+GCTGCTTATCTCTGGTTTAAAGGAGTTATTGAAAAAGATTCAGAACTTAGAAAAACAGGT\n+AGAAACATCTTATTGACCATTATAGGGTTTATGCTAGGTGTCATCATATCTACTCACCTA\n+TCCTTCCAATTTGAAGAATATGCCCTTATTGGTCTGATTTTGCCAGTGTTATATATTAAT\n+TATGAATTATGGCAAGAAAAAAGACCTACTCGAGGTAGGTCTAAATGA\n+>Streptococcus_suis|ORF2908 length 180 aa, 543 bp, from complement(2004615..2005157) of 
Streptococcus_suis\n+CCATATCCTGATTTTCTAAAAATATTTTCTGTCGTATGCTTGTGGATATGTTACAATTAT\n+TTTATGAAAATAAAATTGATTACCGTTGGAAAATTGAAAGAAAAGTACCTCAAAGAAGGT\n+ATTGCAGAATATAGTAAACGATTGGGACGATTTACTAAGTTGGATATGATTGAGCTTCCT\n+GATGAAAAAACACCAGATAAAGCCAGTCAGGCAGAGAATGAACAAATATTAAAAAAAGAA\n+GCCGATAGAATTATGTCTAAAATTGGAGAGCGAGATTTTGTCATTGCCTTAGCGATAGAA\n+GGGAAACAATTTCCATCGGAAGAATTTAGTCAAAGGATATCTGACATTGCAGTAAATGGG\n+TATTCAGATATAACTTTTATCATCGGTGGTAGTTTGGGTCTCGATTCTTGTATTAAAAAA\n+AGAGCTAATTTGTTGATGAGTTTTGGACAGTTGACACTTCCCCATCAACTAATGAAATTA\n+GTTCTCATCGAGCAGATTTATCGTGCATTTATGATTCAGCAGGGAAGCCCATATCATAAG\n+TAG\n+>Streptococcus_suis|ORF2909 length 413 aa, 1242 bp, from 2005223..2006464 of Streptococcus_suis\n+GTTATAATTAAGAAAGAAATAGTACTCTTAAGGAAAATTAAAGAAATGGAAAGGATTCCT\n+TATATGAAAAAATATTTGAAATTTGCGATTTTATTTGTAATTGGATTTTTTGGGGGTCTT\n+ATCGGGGCCTTGTCAGCCTCTTTCTTCCAGCCACAGGTGCAACAAGCAAATTCTGCTATC\n+ACTAGTGTCAGCAATGTTCAATATAATAATGAAACTTCCACCACAAAAGCTGTAGAGAAA\n+GTACAAAATGCTGTTGTGTCTGTTATTAATTACCAAAAATCAGCCAACAATAGTCTTGGT\n+GTTATCTTTGGAAATATTGAATCATCTGACGAACTAGCTGTTGCTGGAGAGGGGTCTGGG\n+GTTATCTATAAAAAATATGGTCAATATGCCTATATTGTGACAAATACGCATGTTATTAAT\n+AACGCAGAAAAGATTGATATCCTTTTAGCATCTGGAGAAAAAATTAGCGGTGAACTTGTT\n+GGTTCCGATACATATTCTGATATAGCTGTTATAAAAATATCAGCAGATAAAGTCACTGCT\n+GTTGCTGAATTTGCTGATTCCGATACAATTAAAGTTGGAGAAACTGCTATCGCAATTGGT\n+AGTCCTCTAGGTAGCGTCTACGCCAATACAGTTACCCAGGGTATTATTTCTAGCTTAAGT\n+CGGACAGTTACTTCACAATCAAAAGATGGACAAACAATCTCAACTAACGCTATTCAAACT\n+GATACAGCTATCAACCCTGGAAACTCTGGCGGACCGTTAATCAATACCCAAGGACAAGTG\n+ATAGGCATTACCTCTAGCAAAATTACCTCAAGTTCTGCAAATAGCTCAGGCGTGGCTGTA\n+GAAGGGTTGGGATTTGCTATTCCTGCAAATGATGCCGTAGCTATTATCAATCAGCTTGAA\n+AAAACTGGACAAGTTAGCCGACCTGCTCTTGGAGTTCATATGGTTAACTTGACGACCTTG\n+TCAACTAGTCAATTAGAAAAAGCTGGATTATCAAATACGGAATTAACATCCGGTGTAGTA\n+ATTGTCTCTACACAAAGTGGGCTACCTGCAGATGGAAAATTAGAAACTTTTGATGTTATT\n+ACTGAGATTGACGGAGAAGCTATTCAAAATAAGAGTGACCTCCAGAGCGCTCTCTACAAA\n+CATCAAATTGGAGATACAATCACTGTAACTTATTACCGCAATAATCAGAAACAAACTGTT\n+GACATTAAGTTGACACATTCTACAGAAGAACTTAGCGAATAA\n+>St
reptococcus_suis|ORF2910 length 256 aa, 771 bp, from 2006519..2007289 of Streptococcus_suis\n+GGATATATGGAAGAATTACGTACACTAAATATTTCAGAAATCCATCCCAATCCCTATCAG\n+CCAAGAATTCATTTTGATGAAAAGGAGCTACTTGAGCTCGCTCAATCTATTAAGGAAAAT\n+GGCTTAATTCAACCGATTATTGTAAGAAAATCTTCTATTATCGGATACGAATTATTAGCT\n+GGAGAAAGAAGGTTGCGAGCCAGTCAATTAGCTGGACTGACTACAATACCAGCAGTGGTA\n+AAAGAACTGACTGATGATGATTTACTCTATCAGGCTATCATAGAGAATCTGCAGCGTTCT\n+AACTTAAATCCGATAGAAGAAGCAGCCTCTTATCAAAAATTGATTAGTAGAGGGTTAACA\n+CATGATGAAGTTGCTCAAATCATGGGAAAATCAAGACCATATATCAGTAATTTATTGCGC\n+CTACTAAATCTATCATCTCAGACTAAACAAGCTGTAGAAGAAGGAAAAATTTCACAAGGG\n+CACGCGCGACAATTGGTGTCATTTTCAGAAGAAAAGCAAGCCGAATGGGTTCAACTCATT\n+TTATCAAAGGATTTAAGTGTGCGTACGCTTGAAAAATTAATAGCTGCAAATAAGAAAAAA\n+CACACTAAGCTTAAACAACGCGACCAATTTTTAAAAGAACAGGAAGATTCACTCAGTAAA\n+ACTCTTGGAACAGCTACAAAAATTATCAAGAAGAAAAACGGGAGCGGAGAAATTCGGATT\n+AGCTTTAATGACCTCGATGAATTCGAAAGAATTATCAACAATTTTAAATAG\n'
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/get_orf_input.Suis_ORF.prot.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.Suis_ORF.prot.fasta Mon Jul 29 09:30:44 2013 -0400
b'@@ -0,0 +1,16670 @@\n+>Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n+MNQEQLFWQRFIELAKVNFKPSIYDFYVADAKLLGINQQVANIFLNRPFKKDFWEKNFEE\n+LMIAASFESYGEPLTIQYQFTEDEQEIRNTTNTRSSIVHQVQTLEPATPQETFKPVHSDI\n+KSQYTFANFVQGDNNHWAKAAALAVSDNLGELYNPLFIFGGPGLGKTHILNAIGNKVLAD\n+NPQARIKYVSSETFINEFLEHLRLNDMESFKKTYRNLDLLLIDDIQSLRNKATTQEEFFH\n+TFNALHEKNKQIVLTSDRNPDHLDNLEERLVTRFKWGLTSEITPPDFETRIAILRNKCEN\n+LPYNFTNETLSYLAGQFDSNVRDLEGALKDIHLIATMRQLSEISVEVAAEAIRSRKQTNP\n+QNMVIPIEKIQTEVGNFYGVSLKELKGSKRVQHIVHARQVAMFLAREMTDNSLPKIGKEF\n+GNRDHTTVMHAYNKIKTLLLDDENLEIEITSIKNKLR\n+>Streptococcus_suis|ORF2 length 385 aa, 1158 bp, from 1507..2664 of Streptococcus_suis\n+IINKGESMIQFSINKNIFLQALSITKRAISTKNAIPILSTVKITVTSEGITLTGSNGQIS\n+IEHFISIQDENAGLLISSPGSILLEAGFFINVVSSMPDLVLDFNEIEQKQIVLTSGKSEI\n+TLKGKEAEQYPRLQEVPTSKPLVLETKVLKQTINETAFAASTQESRPILTGVHFVLTENK\n+NLKTVATDSHRMSQRKLVLDTSGDDFNVVIPSRSLREFTAVFTDDIETVEVFFSNNQILF\n+RSEHISFYTRLLEGTYPDTDRLIPTEFKTTAIFDTANLRHSMERARLLSNATQNGTVKLE\n+IANNVVSAHVNSPEVGRVNEELDTVEVSGEDLVISFNPTYLIEALKATTSEQVKISFISS\n+VRPFTLIPNNEGEDFIQLVTPVRTN\n+>Streptococcus_suis|ORF3 length 104 aa, 315 bp, from complement(1707..2021) of Streptococcus_suis\n+TPVRIGRLSCVEAANAVSLIVCFNTLVSNTNGFEVGTSCKRGYCSASFPFNVISDLPLVK\n+TICFCSISLKSRTKSGILDTTLIKKPASKRMEPGELIKSPAFSS\n+>Streptococcus_suis|ORF4 length 293 aa, 882 bp, from 2756..3637 of Streptococcus_suis\n+MTLYILANPNAGSHTAEHIIFKIKESYPQLAVNIFMTVGPEDEKSQIEAILKEFVSSEDQ\n+LMILGGDGTLSKALRFWPASLPFAYYPTGSGNDFAKAMNITSLYRSVDAILERKTSRIYV\n+LNSSYGTVVNSMDFGFAAQVINGSTNSILKKILNKVKLGKLTYLFFGIKTLFSKQAINLE\n+LTLDEKSYQLDNLFFISVANSLYFGGGIMIWPTASAKKKEVDIVYFKNGNFYQRLQSLLA\n+LLTKRHESSHTIQHLTGVDVVLKSKEKLLLQIDGETCTANEVTLTYQERSMYL\n+>Streptococcus_suis|ORF5 length 126 aa, 381 bp, from 3933..4313 of Streptococcus_suis\n+KKEEEMIMKQLAQQIRVLRTAKNLSQDELAEKLYISRQAVSKWENGEATPDIDKLVQLAE\n+IFGVSLDYLVLGKEPEKEIVVEQRGKMNGWEFLNEESKRPLTRGDVVLLIFLAVMLLGGL\n+FIKHYF\n+>Streptococcus_suis|ORF6 length 377 aa, 1134 bp, from 4381..5514 of 
Streptococcus_suis\n+LESKKNMSLTAGIVGLPNVGKSTLFNAITKAGAEAANYPFATIDPNVGMVEVPDERLQKL\n+TELIIPKKTVPTTFEFTDIAGIVKGASKGEGLGNKFLANIREVDAIVHVVRAFDDENVMR\n+EQGREDAFVDPIADIDTINLELILADLESINKRYARVEKMARTQKDKDSVAEFAVLEKIK\n+PVLEDGKSARTVEFTDEEQKIVKQLFLLTTKPVLYVANVDEDKVADPEAISYVQQIRDFA\n+ATENAEVVVISARAEEEISELDDEDKGEFLEALGLTESGVDKLTRAAYHLLGLGTYFTAG\n+EKEVRAWTFKRGMKAPQCAGIIHSDFEKGFIRAVTMSYDDLMTYGSEKAVKEAGRLREEG\n+KEYVVQDGDIMEFRFNV\n+>Streptococcus_suis|ORF7 length 115 aa, 348 bp, from complement(4450..4797) of Streptococcus_suis\n+VNGINISDWIHKGIFTALFTHDIFIVKGTHNVDNRINFADIGQEFISKSFTFRSTFYDTS\n+NISKFKSRWHCLFRDDEFGQLLQTLIGHFYHADVWINSCERVVCSFCSCLGNCVK\n+>Streptococcus_suis|ORF8 length 115 aa, 348 bp, from complement(4491..4838) of Streptococcus_suis\n+RLLMLSRSAKINSRLMVSISAIGSTKASSRPCSRMTFSSSKARTTWTIASTSRILAKNLF\n+PSPSPLEAPFTIPAISVNSKVVGTVFLGMMSSVNFCRRSSGTSTMPTFGSIVAKG\n+>Streptococcus_suis|ORF9 length 192 aa, 579 bp, from 5663..6241 of Streptococcus_suis\n+GEKMTRLIIGLGNPGDRYFETKHNVGFMLLDKIAKRENVTFNHDKIFQADIATTFIDGEK\n+IYLVKPTTFMNESGKAVHALMTYYGLDATDILVAYDDLDMAVGKIRFRQKGSAGGHNGIK\n+SIVKHIGTQEFDRIKIGIGRPKGKMSVVNHVLSGFDIEDRIEIDLALDKLDKAVNVYLEE\n+DDFDTVMRKFNG\n+>Streptococcus_suis|ORF10 length 1166 aa, 3501 bp, from 6235..9735 of 
Streptococcus_suis\n+RIMNILDLLHKNKQINQWQSGLNQSTRQLLLGLSGTSKSLIMATAYDCLAEKIMIVTATQ\n+NDAEKLVADLTAIIGSENVYNFFTDDSPIAEFVFASKERTQSRIDSLNFLTDSTSSGILV\n+ASIVACRVLLPSPETYKGSKIQLEVGQEIEVDKLVKNLVNIGYKKVSRVLTQGEFSQRGD\n+ILDIFDMQSETPYRIEFFGDEIDGIRIFDVDSQKSLENLDEISISPASDIILSSEDYSRA\n+SQYIQTAIEQSTLEEQQSYLREVLADMQTEYRHPDLRKFLSCIYEQSWTLLDYLPKSSPL\n+FLDDFHKIADKQAQFEKEIADLLTDDLQKGKTVSSLKYFASTYAELRKYKPATFFSSFQK\n+GLGNVKFDALYQFTQHPMQEFFHQIPLLKDELTRYAKSNNTVVIQASSDVSLQTLQKNLQ\n+EYDIHLPVHAADKLVEGQQQVTIGQLASGFHLMDEKLVFITEKEIFNKKMKRKTRRTNIS\n+NAERIKDYSELAVGDYVVHHVHGIGQYLGIETIEISGIHRDYLTVQYQNSDRISIPVEQI\n+DLLSKYLASDGKAPKVNKLNDGRFQRTKQKVQKQVEDIADDLIKLYAERSQLKGFAFSPD\n+DENQVEFDNYFTHVETDDQLRSIDEIKKDMEKDSPMDRLLVGDVGFGKTEVAMRAAFKAV\n+NDGKQVAILVPTTVLAQQHYANFQERFAEFPVNVDVMSRFKTKAEQEKTLEKLKKGQVDI\n+LIGTHRLLSKDVVFADLGLLVIDEEQRFGVKHKERLKELKKKIDVLTLTATPIPRTLQMS\n+MLGIRDLSVIETPPTNRYP'..b'\n+DTDTVMYSIIALMTITYIVNRMMSGTQSSRNVMIISQKSEEIKDYITKVADRGVTELPII\n+GGFTGVDKRMLMTTISIPEMQKLETAVLEIDETAFMVVMPASQVRGRGFSLQKDHKHYDE\n+DILIPM\n+>Streptococcus_suis|ORF2902 length 565 aa, 1698 bp, from 1998923..2000620 of Streptococcus_suis\n+FQCNSLKIQVLSSTIKLIDRNRGETMLTVSDVSLRFSDRKLFDDVNIKFTAGNTYGLIGA\n+NGAGKSTFLKILAGDIEPSTGHISLGPDERLSVLRQNHFDYEDERVIDVVIMGNEQLYSI\n+MKEKDAIYMKEDFSDEDGVRAAELEGEFAELGGWEAESEASQLLQNLNISEDLHYQNMSE\n+LTNGEKVKVLLAKALFGKPDVLLLDEPTNGLDIQSINWLEDFLIDFENTVIVVSHDRHFL\n+NKVCTHMADLDFGKIKIFVGNYDFWKQSSELAAKLQADRNAKAEEKIKELQEFVARFSAN\n+ASKSKQATSRKKMLDKIELEEIIPSSRKYPFINFKSEREIGNDLLTVENLKVVIDGETIL\n+DNISFILRPGDKTALIGQNDIQTTALIRALMGDIEYEGTVKWGVTTSQSYLPKDNTRDFD\n+TNESILDWLRQFASKEEDDNTFLRGFLGRMLFSGDEVNKPVNVLSGGEKVRVMLSKLMLL\n+KSNVLVLDDPTNHLDLESISSLNDGLKAFKESIIFASHDHEFIQTLANHIIVISKNGVID\n+RIDETYDEFLENAEVQAKVQELWKA\n+>Streptococcus_suis|ORF2903 length 115 aa, 348 bp, from complement(1999705..2000052) of Streptococcus_suis\n+PIRAVLSPGRRIKLILSRIVSPSITTFKFSTVKRSLPISRSDLKLINGYLRLEGMISSNS\n+ILSNIFLREVACLDLEALAEKRATNSCSSLIFSSAFALRSACSLAASSLDCFQKS\n+>Streptococcus_suis|ORF2904 length 110 aa, 333 bp, from 1999974..2000306 
of Streptococcus_suis\n+KLLLMVKRFLTISALSCAQVTRLLLLVKTTSKQLLSFVLLWAILNMKVLSSGVSLLVNPT\n+YQKTILVTLIQTNLSLIGSVNLPARKKMTIPSCAVSWDVCSSRVMRLTNL\n+>Streptococcus_suis|ORF2905 length 117 aa, 354 bp, from 2000502..2000855 of Streptococcus_suis\n+QTISSSFLKTVLSTESTKLMMNSWKMLKYKQKYKNFGKHNKKRLGLLPSLSSQSSCQHLS\n+AVVDCQICSCFTLQIWPLRLLRTKFALSPTSNCLPDSLSCAGVGVKQSGNRLFQLNN\n+>Streptococcus_suis|ORF2906 length 872 aa, 2619 bp, from 2000888..2003506 of Streptococcus_suis\n+PVKFFPTSFSFKSMKKIFTKTSIYYLLSFLIPLTIISIVLAFQGIWWGSDTTILASDGFH\n+QYVIFNQTLRNTLHGDGSLFYTFSSGLGLNFYALSSYYLGSFLSPIVFFFDLQSMPDAIY\n+LVTIVKFGLTGLSTYFSLKGIHKNLKEEWALLLATSFSLMSFSTSQLEINNWLDVFILLP\n+LVLLGLHRLLKKQGPILYYITLTCLFIQNYYFGYMVAIFLTLWTLVQLSWIDSQRIKRFI\n+NFTIVSILSALSSMFMLLPTYLDLKTHGETFTKIVNLKTEDSWYLDFFAKNLVGSFDTTK\n+FGSIPMISVGLVPLILALLFFTLKEIKPTVKLSYALFFTFIISSFYLQPLNLFWQGMHAP\n+NMFLYRYAWALSITVIYLAAETLVRLRQVSIKNFTLIVSFLLICFTSTFIFRDHYEFLTD\n+VNFLLTLEFLIAYFILFVAMIRYKSSLKWINIVLLFFTFLELGLHSHYQVQGISDEWHFP\n+SRSNYEEKLTDIDSIVKSTKTTTDSFYRIERLLPQTGNDSMKFNYNGISQFSSIRNRASS\n+SVLDKLGFRSDGTNLNLRYQNNTIIADSLFGVKYNLATTDPNKFGFTLNQSQSTINLYEN\n+SFNLGLALLTEGIYKDVNFTNLTLDNQTNFLNQLTGLSQKYYHTLSDVVSQNTVELSNRM\n+TVNKVDNEDAAKATFLVNIPANSQVYLNLPNLTFSNENQKKVVITVNNQSSEFTLDNAFS\n+FFNVGSFTTDVQVQVNVYFPENNQVSFDKPQFYRLDLLAFQQAISILQEKQVVTKTDGNK\n+VTVDFVTDKESSLLLTLPYDKGWNATIDGKPIKIQKAQDGFMKVDVSPGQTKLVLTFVPN\n+GFYLGLLISFGAVFVFFSYQFIGYYYSKNREY\n+>Streptococcus_suis|ORF2907 length 235 aa, 708 bp, from complement(2003907..2004614) of Streptococcus_suis\n+FHVKQGVKMNQKEYRVFEGLRIACSLTFISGYLNAFTFVTQGGRFAGVQSGNVISLAYFL\n+AKGDFAQVVNFSIPILFFVFGQFFTYLARRYFEKQTWSWHFGSSVMMLVLILLTIILSPI\n+MPASFTIASLAFVASIQVETFRRLRGAPYANVMMTGNVKNAAYLWFKGVIEKDSELRKTG\n+RNILLTIIGFMLGVIISTHLSFQFEEYALIGLILPVLYINYELWQEKRPTRGRSK\n+>Streptococcus_suis|ORF2908 length 180 aa, 543 bp, from complement(2004615..2005157) of 
Streptococcus_suis\n+PYPDFLKIFSVVCLWICYNYFMKIKLITVGKLKEKYLKEGIAEYSKRLGRFTKLDMIELP\n+DEKTPDKASQAENEQILKKEADRIMSKIGERDFVIALAIEGKQFPSEEFSQRISDIAVNG\n+YSDITFIIGGSLGLDSCIKKRANLLMSFGQLTLPHQLMKLVLIEQIYRAFMIQQGSPYHK\n+>Streptococcus_suis|ORF2909 length 413 aa, 1242 bp, from 2005223..2006464 of Streptococcus_suis\n+VIIKKEIVLLRKIKEMERIPYMKKYLKFAILFVIGFFGGLIGALSASFFQPQVQQANSAI\n+TSVSNVQYNNETSTTKAVEKVQNAVVSVINYQKSANNSLGVIFGNIESSDELAVAGEGSG\n+VIYKKYGQYAYIVTNTHVINNAEKIDILLASGEKISGELVGSDTYSDIAVIKISADKVTA\n+VAEFADSDTIKVGETAIAIGSPLGSVYANTVTQGIISSLSRTVTSQSKDGQTISTNAIQT\n+DTAINPGNSGGPLINTQGQVIGITSSKITSSSANSSGVAVEGLGFAIPANDAVAIINQLE\n+KTGQVSRPALGVHMVNLTTLSTSQLEKAGLSNTELTSGVVIVSTQSGLPADGKLETFDVI\n+TEIDGEAIQNKSDLQSALYKHQIGDTITVTYYRNNQKQTVDIKLTHSTEELSE\n+>Streptococcus_suis|ORF2910 length 256 aa, 771 bp, from 2006519..2007289 of Streptococcus_suis\n+GYMEELRTLNISEIHPNPYQPRIHFDEKELLELAQSIKENGLIQPIIVRKSSIIGYELLA\n+GERRLRASQLAGLTTIPAVVKELTDDDLLYQAIIENLQRSNLNPIEEAASYQKLISRGLT\n+HDEVAQIMGKSRPYISNLLRLLNLSSQTKQAVEEGKISQGHARQLVSFSEEKQAEWVQLI\n+LSKDLSVRTLEKLIAANKKKHTKLKQRDQFLKEQEDSLSKTLGTATKIIKKKNGSGEIRI\n+SFNDLDEFERIINNFK\n'
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/get_orf_input.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.fasta Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,17 @@
+>alpha three forward CDS using table 1
+AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+NNNNNNNNNNNNNNNNATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNN
+AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+NNNNNNNNNNNNNNNNNTAANNTAGMNTGANNNNNNNNNNNNNNNNNNNNN
+>beta three forward CDS using table 11
+AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+NNNNNNNNNNNNNNNNNGTGNATANATTNNNNNNNNNNNNNNNNNNNNNNN
+AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+NNNNNNNNNNNNNNNNNNTAANNTAGNNTGANNNNNNNNNNNNNNNNNNNN
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/get_orf_input.t11_nuc_out.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.t11_nuc_out.fasta Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,36 @@
+>alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
+ATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTN
+NNNNNNNNNNNNNNNNTAANNTAG
+>alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
+ATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNN
+NNNNNNNNNNNNTAA
+>alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
+ATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNN
+NNNNNNNNTAANNTAGMNTGA
+>beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
+GTGNATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNN
+NNNNNNNNNNNNNNNNTAANNTAG
+>beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
+ATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNN
+NNNNNNNNNNNNTAA
+>beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
+ATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNNN
+NNNNNNNNTAANNTAGNNTGA
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/get_orf_input.t11_open_nuc_out.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.t11_open_nuc_out.fasta Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,39 @@
+>alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
+ATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTN
+NNNNNNNNNNNNNNNNTAANNTAG
+>alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
+ATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNN
+NNNNNNNNNNNNTAA
+>alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
+ATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNN
+NNNNNNNNTAANNTAGMNTGA
+>beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
+GTGNATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNN
+NNNNNNNNNNNNNNNNTAANNTAG
+>beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
+ATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNN
+NNNNNNNNNNNNTAA
+>beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
+ATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNNN
+NNNNNNNNTAANNTAGNNTGA
+>beta|CDS4 length 25 aa, 75 bp, from 334..408 of beta three forward CDS using table 11
+NTGANNNNNNNNNNNNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+TTTTTTTTTTTTTTT
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/get_orf_input.t11_open_prot_out.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.t11_open_prot_out.fasta Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,20 @@
+>alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
+MXXXXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGG
+GGGFFFFFFFFFFFFFFFFXXXXXXXX
+>alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
+MXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGGG
+GVFFFFFFFFFFFFFFFFXXXXXX
+>alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
+MXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
+FFFFFFFFFFFFFFFFFXXXXXXXXX
+>beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
+MXXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGG
+GGVFFFFFFFFFFFFFFFFXXXXXXXX
+>beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
+MXXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGG
+GFFFFFFFFFFFFFFFFFXXXXXX
+>beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
+MXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
+FFFFFFFFFFFFFFFFXXXXXXXXXX
+>beta|CDS4 length 25 aa, 75 bp, from 334..408 of beta three forward CDS using table 11
+MXXXXXXXFFFFFFFFFFFFFFFFF
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/get_orf_input.t11_prot_out.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.t11_prot_out.fasta Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,18 @@
+>alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
+MXXXXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGG
+GGGFFFFFFFFFFFFFFFFXXXXXXXX
+>alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
+MXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGGG
+GVFFFFFFFFFFFFFFFFXXXXXX
+>alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
+MXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
+FFFFFFFFFFFFFFFFFXXXXXXXXX
+>beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
+MXXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGG
+GGVFFFFFFFFFFFFFFFFXXXXXXXX
+>beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
+MXXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGG
+GFFFFFFFFFFFFFFFFFXXXXXX
+>beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
+MXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
+FFFFFFFFFFFFFFFFXXXXXXXXXX
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/get_orf_input.t1_nuc_out.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.t1_nuc_out.fasta Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,18 @@
+>alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
+ATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTN
+NNNNNNNNNNNNNNNNTAANNTAG
+>alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
+ATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+GGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNN
+NNNNNNNNNNNNTAA
+>alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
+ATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
+CCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNN
+NNNNNNNNTAANNTAGMNTGA
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/get_orf_input.t1_prot_out.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.t1_prot_out.fasta Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,9 @@
+>alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
+MXXXXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGG
+GGGFFFFFFFFFFFFFFFFXXXXXXXX
+>alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
+MXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGGG
+GVFFFFFFFFFFFFFFFFXXXXXX
+>alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
+MXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
+FFFFFFFFFFFFFFFFFXXXXXXXXX
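
Note on the FASTA headers above: each record title carries a protein length, a nucleotide length, and 1-based inclusive coordinates, and these can be cross-checked against one another. The following is a minimal sketch of such a check, not part of the tool itself. The function name `check_header` and the regular expression are illustrative assumptions based only on the header layout visible in these test files. It assumes a closed CDS has one more codon than its protein length (the stop codon), while an open-ended ORF may have exactly as many.

```python
import re

# Hypothetical pattern derived from the record titles shown in the test data,
# e.g. ">alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha ..."
HEADER_RE = re.compile(r"length (\d+) aa, (\d+) bp, from (\d+)\.\.(\d+) of ")


def check_header(header):
    """Return True if the lengths and coordinates in a record title agree."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError("unrecognised header: %r" % header)
    aa, bp, start, end = (int(value) for value in match.groups())
    # Coordinates are treated as 1-based and inclusive.
    span_ok = (end - start + 1) == bp
    # A coding region should be a whole number of codons.
    frame_ok = (bp % 3) == 0
    # Either the final codon is a stop (codons == aa + 1),
    # or the ORF is open-ended with no stop (codons == aa).
    length_ok = (bp // 3 - aa) in (0, 1)
    return span_ok and frame_ok and length_ok


closed = ">alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1"
open_ended = ">beta|CDS4 length 25 aa, 75 bp, from 334..408 of beta three forward CDS using table 11"
print(check_header(closed), check_header(open_ended))
```

For example, `68..331` spans 264 bp, which is 88 codons: 87 amino acids plus a stop codon. The `beta|CDS4` record from the open-ended output spans 75 bp, which is exactly 25 codons with no stop.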
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/sanger-pairs-forward.fastq
--- a/test-data/sanger-pairs-forward.fastq Mon Jul 29 09:28:55 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b"@@ -1,288 +0,0 @@\n-@WTSI_1055_1a04.p1kpIBF bases 1 to 186\n-TTACCCGTCGGCGCCGAAAGAGCCGAAGGCTTTGTGACTGAGGCCGGACACTGTGCTGTTAAGCTGGACATTGCCCGACCTGTCGAGTGCGCCGCTCGCCGAAATTCGTTATCGCGTAAATTTATTTATTTATTTTTATTTTTTTAAATAAAAATGACGACTAATTTGTAAGGGCATAACAACAA\n-+WTSI_1055_1a04.p1kpIBF bases 1 to 186\n-!,,,./644,,,-0377<:Q777<BB<<60,+.,+,.4.,))))//15>>550007:66>>==7@71/--0:<CDBB;;49/***/***22,/+)))11===798:3.,,1488?133??BKKMODFB?BDB7447B?:8--.E:F?B77?BKKC<<322B:..<41,46>>B<<::::5116..\n-@WTSI_1055_1a05.p1kpIBF bases 1 to 642\n-CGTGCCAGTTCTAAACTGGTCGTTCAGCGCCAACCGAAGTGCATACCCTGACGAGCATACACGCAGCTGAAGCGCTCCACAAGCAGCTCTCACCACTAGTCCACGCACCACCCCGCAAGGAGACGGCACGCAGCCACGGGCAAAAGCCGCCTGTTTCACACAACAGCCCGGCTGACCCGACCTTTAGAGCCAATTCTTTTCCCGAAGTTACGAATCTAATTTGCCGACTTCCCTTACCTACATTATTCTATCGACTAGAGGCTGTTCACCTTGGAGACCTGCTGCGGATATCGGTACGATCAGGCAGGAGATTCATATCGCTTCCCTCGCATTTTCAAGGGCCGTGTGGAGCGCACGAGACACCACAGGAACCGCGGTGCTTTACGGGCGCAACATCCCTATCTCAGGCTGAGCCACTTCCAGGCACGCACGCCCTAAACCAGAAAAGAGAACTCTGGCTCGGACTCCACACGACGTCTGCGAGTTCATTTGCGTTACCGCGCGAAACAGTTCTTGCGAACCGTCATTTCCCTGGCCTGGCGTGGGAATGTTAACCCACTTCCCTTTCGGCAACCGGATGGACAAACTGCGCAAGCACAGCAAAGTCTTCATCCGTAGTGTGTGACGGCATTAGCCGGTGC\n-+WTSI_1055_1a05.p1kpIBF bases 1 to 642\n-!<>AIHHCCCCCCCCIIIINNNNNTTTYYYYYYYYYYTTTTIIIIHHNIIIFDKFDDINNNTTTNIIIIINTTTTTTTYYYYYYTNNNNNTTYNIIIIIINNYYYYYYYYYYYYYYYYYTNNNNNTTTTTTYYYYYYYYYYYYYYYYYTLLJJJNNTTTTYYYYYYYYYTNNJNJLLTYYYYTONJJJOOYYYYYYYYYYYYYTTTTLOJJJJOOYYYYYYYYYTTTTTTYYYTTTTTTYYYYYYYYYYYYYYYYLJJJJJTYYYTLLLTOTJJJJJKKOYYYYTJNJJJOOTOOIIIILKYYYYTINDDDEEOSYYYYYYYYYYYYYYYYYYYYYYTTLTTTTTTTINIIIOYTKB888>>KMYYIIFIIITKYYYYKKKTOTYYYYYYYYYYYYYYYYYYYKIDDDD>>444>BKLKIIGGDIOYYYYIYYYQIIII@@7507>43--/<<IAAIIII>559==A@IIB>>===KMQM??/33?BIIQQIIFCCFCCFIIICIHA?@F>:>:>>=3...08AIIIMIQQQQCCCCQC:>=:6:>:>>IICA>>>>IFCCC>:>AA>99>;>AACAA>>>::7;7AIII>>>:>>IAI>833688949>@C>:>A;98777=;>99::>4755057132+\n-@WTSI_1055_1a09.p1kpIBF bases 1 to 
497\n-CGAGCTCGGTACCCGGGGATCCCACCGTTTGGAGGGTGAATTCGCGCTGGAAAAAGGTTTTCCATGCAAAAAATGGAACTTCTTCAGCGTCCAAAGCTTTAGTCAGCCAGCAAAGTGTTGGCATTTCATCGAATGGAAATGGTTCAATAAGTAGCGGCAGCCCCAACGTTTTTGAGAAGTTTTGTGGCGTTTTCTCTGAAGGGGTAAAGTCAGGCGAATTGCTGGAAAAGGTGCCATTGGGTGATTTGGAAGTTGTTCTGTTGATGAACCTTTCATGTTCTAGGCGTTTGTGAAGGAATTTGCTGACAATTTGCTCCGAATCCAAAAGGACGTTGAGCGCTGTGATCGGACCATCAAATTCTATTCCAAACGGGACAATTTGGATGCTCTCCAACGGATAATTTGCACTTACATTTATCGTCGGCTGAAGTTGGACATTGAGGACGGTTACGTGCAGGGAATGTGCGATTTGGTCGCTCCTCTTTTGGTGGTGTTT\n-+WTSI_1055_1a09.p1kpIBF bases 1 to 497\n-!989>>CCCCCCIIYOICCCCOIYHHA8339>><@75.444N@IDHHHDDNTTYYYYYTTTIIIIINYYYYYTTTTTTTNNNHHHIHHIIIIOQIDKDDDFHIIITYYYYYYYTTTTTYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTTNNNNNNTTTYYYYYYYTTTTYTYYYSSSYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOOKJJNOTTYYYYYYYTTTTTTTTTTYYYYYYYYTTTTTTYYYYYYTTNNLLLLLLYYMOKKKOYYYYYYYYYYYYYYYYYYTTTTTIIIIIITYYLIIIIIFFDDDFYYYYYYYTTTTTTYYYYYYQQMMMYYTTOKKKIIIIIIIKKNNNDDDNNNNTYTTOOKKKINNIIKQONN?N2::NHTQOKKKKFFFFFFMMIIIICBAAIII>>>>>>AAAB=?FBO>88+,+//><IIII<33/++/0<<4\n-@WTSI_1055_1a10.p1kpIBF bases 1 to 512\n-AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACGGCAAGAGACCAATCTGGTTTTGCAATGTAACATGCCAATTAATCATCAGCATTTTTCACATAAGTGATGGGATGACGGTTGGGGGGGGGGGAAATAAATGCATGTCGATCAGTGCATAGAAGCGAAAGAAATCGTAGAAATTTGCAGATGAAAATTTTGCAGTGGTAATTTGACCGTACCGAAAAGGAATGAGAGCTATTTACCTGTGGGAATGGGTGTAAAATGGAAACTAAATTGCGCGAGGGACAGTTTTGATTGGACGATATCTCCAGCGCAAAGGTCACATGACCAGCCGCTTGGAGATTGTTCGGGTAAGCGAGACAAAATACGAACAATCGGAGTTATTTGTACAACAACAACACATTGATTAAGTGATGGGAGAAAAAAAAAAGAAGGAATAATATGGCTTTGTGCATTTTTCTAAAGGTCTTAAAAATCAA\n-+WTSI_1055_1a10.p1kpIBF bases 1 to 
512\n-!.6<:::60.1441+21441++AAAAEHHHHHHHHHHDBB4+,+<<IDCCCCCCCCCHITIIDDDCOOQH@@//)))059><10''*45EHMOFEDCCCCCDIIINTTIINNNNTTTTTTYYYYTIIIDDDDDHHHHHNOKKKKMOOTINNNNNYYYYQPPPPKKKLOKMMMKIINIIIKIIIIIFKIIIITOYSSYYYYYTTTLOKKKKKYYYYYYKLMMOOMSSYSLOKKFBBBFKKKSSYYYSSMMSSYYYSSSSMSSSSSMYYYYMOKKKKSSYYSKKKKKKSYYYPSSSSMMFIIOJJSYYYSSSMLOLIIIIIIYYYLLTLOIIIFFFKKMYYYYYYYYYTTTTTOOKKIINNNNTYYYYYYOFFFFFFIOYYYYYYYYYYQQKKKKKMMTTTTYYYIIIFFFFFFFDMMQQYYKKKKKKMKKKQQYQOKKKMOYYYA;777;>CIIIH@>>CA=94++,69ICCCC@>>743323::@@BIMII"..b"ATCTTTCGCCACTTCCCGCCTCCCCCCCCCCTTTTGACCACCTGCCATTGTTGTCGTTGAGCAACCGAATTTGACTCTTCACCCGTCGACTGCTGGGCGTTCGCTGTTCCGCCATGAATTGGCGCCATTCTCTTTGGCCCTAAAAGTGAACCGGTTACCAACTACTAAAGTGTCCGATTCGCTCCCGAACCTGCCGAGTCTGGACAGAGGCCGGAATTTTTGGGAATGCCATCAATCCCGGAGCATTTTTGAAGCTGCTCTCGACATGAGTACCGGCTCCATTAAAATTATCCCCCTCCAAACCGACCACAATCACACGCCCCCACTCGTCCCTGCGCAACGTCGTCTCTTCGTCGTCCACCTCCGCCTCGTCCGTTCTCGCCCATTCCCTTTTCTCGTC\n-+WTSI_1055_1f20.p1kpIBF bases 1 to 491\n-!89><<<536::6001:41--<A?>CCCFCDDDDIIIYQKKGGNNNDCCCCCDDDDDHTNIDDDDTTIDA>9449;>@DHHHHHINNNNHDEEFHHNNNIIIIIIYYTIIIIITYYYYTTTTNNTTTTTIIIIIIFF>>2...@NNNTTTTTYYYYYYYYYYTTTTTNNLTTNYYYTTNNNLLLTTTTTTTTTTTTTTTYYYYTTTTTTYTNNNNILLNNNNNNNTTTTTTTTTTTTTTTYYYTTLTTTTTTTTTTTTTYYYYYYYYYYYYYYYYYTTTTTTYYTTTTTTYYTTNNNNIIKYYKKTTTTYYYYTTTTTTYYYYYYYYYYKKKKKKYYYYYTTTTTTYYYYIIIIBB=>7<<>>>CII??36-1(((()*+48ACIAA?4/)))'/***,++,539<>>>>>BD777777>>>>>>>>91/))01<::8=891,*117444,+,12777.,+44>440/0977-//-10++048:30---+\n-@WTSI_1055_1f21.p1kpIBF bases 1 to 456\n-TAAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACCTGGAGCAAACTGGTTGTGTCGTGGTCAGGGTACCGCCATTCCGTGAGATATGGTAGGTAAATGCGACCGGGATTATCCACAACTTTGGACGGCCTAATTCGCATACATGGAGTCGGCTTCACATAGCAATAGGGGCCTACGTTGGGATGATTTTCCAGAAAGTAAATGGCTACGGGAATGTTGTACACAGCTCCCTTAAGCTTTATGTATTAAACAAACAAACAAAGACCATACAGCCCACCTTATACAAGATGGGAATGGTCCCCGAAAAGGAAAGGCAATATTTCGGCATTCCGTCAGGGAAAACAAAATTCACAACGTCGGGCTGAAGATCTATAAAATTGTTGAGCGCAGTGAGTAAATCATCCTTCGTACTATCCTC\n-+WTSI_1055_1f21.p1kpIBF bases 1 to 
456\n-!.348<<<<<4014:3.08::;<<ECCCIIIHCCBCCCDIYMMKKBNNNHDDDDDDDDDINYOIDDHHTTIDDAA<<<>BDDDDDDDDDIIHHHHIINNNIFDHHHIINIFFIINITTKFFIIIIIIIIIIIIIOOMMQQ8.))*25IHMQQQIIIIIIIIIITNNNNIIKYYYTTTTTTTTTTTTYNNIIIINNTTTTTNNIIITTTTTTTTNNNNTTYYYYYTTTTTTYYYTTTTTNTTTTTYYKKFFKKYYYYYYYYYTTTTTTTTTTTTYTTTTTTYTTTTTTTTLIFDDFJJJFFFIIJLOKFFMSSSYYYYYSSFB;??IIKKKKKKKKKLLKFFDDDDMDDDDB;789;AFNDBB;;BOMMMKKIDDDED@D@@8=@ENEBBBBBD;85//6?@@>77<@DFM?82228>D>>77273BB==97330/.--/8@75-,,/,,,0/53,\n-@WTSI_1055_1f22.p1kpIBF bases 1 to 370\n-CGACCAATGCTCGGTCCGTCACGTAGAGCAATCCGTTTGAGCGATCCACACGAAAATCTTAAGCGCAAAAAAGATTAATATTAATTATTTAACCATCTAATTATTTTAAAAATTTGCCGAAATAGTATCCGATCAAATCGGTTCTGACAATTTTACATTATCTGTTAGCCGTGCCAAAGTCTCTCTCTCACATTCGGTGGCAGCCGGTTGTCGTTGTCCAAGCACAAATTCTACGCTGCCATTATTGCCTTCGTCTCTGTCGCGTGCCAAAAAGCGTCCGATGGCGGTGCCAGCCGGCATATTGTCCAGTAGCCGAATGTGCGTGTCCTGGCGATCCCACAGGATCAGTGGCCGATTATCATTTTTGTC\n-+WTSI_1055_1f22.p1kpIBF bases 1 to 370\n-!89A>887>>:>68>AHHIIDCCCCCDNNYYTTTTTTTYTTTTTTYYYYYYYYYYNNHHHDF=@=>9BQQYYYIIIIIITTTTTTTTTTTTTNTTNNNNNTTTTTTTTTTYYYYYYYYYYTTTTTTYTTYTTTYTNNNNLNNNNNNNTLLYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTTTOOKKKOYYYYYYYYYYYYYYYYYYYYYYTTTTLKKTTTTYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYYYYYYKKMMTTTTTTTYOKIIIGKKYYYYYOIIIOAQ==<:77:<IIIABBBCDO>>988>?FKYYPFBB,,.8>FAA:6698<>>D>>::33:4>>66,,,<<Q93+-\n-@WTSI_1055_1g01.p1kpIBF bases 1 to 584\n-CAAATCCTACTGGCCGGACAAAAGAAGCGGCCAAACAACGTGCTCTTCACAAGACGATCACCACCAAAAACATTCACACATGCTCAACGAGACATTGCTTGCAGGATGGCAAGTGCAGGAAGCACTTTCCGGTGCATTAGTTTACACTGACTATGTAACCTATTGTTAATTCCCTGTAGAAACCGTTTGAGTACGACACTGTGTACTCTGAAAATGCCTACCCTCGCTACAAGCGCCGCCCACCTCCGCCTTCACTCCAAGAAGCCCAGCAGAGTCCGGAATTATACGGGCGCGAAATGCAATACAAGGACCAGCGTGGCAAACTAATTCGCAAGGACAACTCTCACGTCGTGGCTTTCAGTCCATTTCTGTCAAGCAAATATGTCGCTCAGTAAAATTAATACTTTTTGTGACAAAATTGCTAACTTTTTTGCAGCATTAACGTCGAGTTTGTCGCGGGAGAAGGATGTATAAAGTACTTATGCAAGTACATGATGAAAGGAGCGGACATGGCCTTTGTCCAAGTCACGGATGCCAACACGGGCCAAAGTGCGCTGAACTACGACGAACTGCAGCAAATTCG\n-+WTSI_1055_1g01.p1kpIBF bases 1 to 
584\n-!333;>HCDHHIIIYIIINTTYYYYTTTTTTYYYYYYNIIIIIININNTONB81+++04HQYTTTTTTTNIIINNTTNTTTTTTTTYYYTTTTTYTTTTTTYYYYYYYYYTTTTTTYYYYYTIIIIIITTTTTTTTTNNNNNNTNNTTTNNNNNNNNNNNNNNNNTTTTTYYTNNJJJJLYYYYYYYYYTTTTTTYTNNNNNNTYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTNNNNNNTTYYYTNNNNNTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYTKKKTNNIIINTYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYTTTTTTOIICBBOQQQQQQC;<88:>>>CIFOYYYYYYQQQQQQQQQCCQQQQHCBAA:AAAAIIA>;A>AAAIC>>AAAACA>>>>III>::>AAACCCIIIA:;==<IIIIIQQAA<:::IA==::8::CQIIIIAA>>CI92\n"
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/sanger-pairs-interleaved.fastq
--- a/test-data/sanger-pairs-interleaved.fastq Mon Jul 29 09:28:55 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b"@@ -1,576 +0,0 @@\n-@WTSI_1055_1a04.p1kpIBF bases 1 to 186\n-TTACCCGTCGGCGCCGAAAGAGCCGAAGGCTTTGTGACTGAGGCCGGACACTGTGCTGTTAAGCTGGACATTGCCCGACCTGTCGAGTGCGCCGCTCGCCGAAATTCGTTATCGCGTAAATTTATTTATTTATTTTTATTTTTTTAAATAAAAATGACGACTAATTTGTAAGGGCATAACAACAA\n-+WTSI_1055_1a04.p1kpIBF bases 1 to 186\n-!,,,./644,,,-0377<:Q777<BB<<60,+.,+,.4.,))))//15>>550007:66>>==7@71/--0:<CDBB;;49/***/***22,/+)))11===798:3.,,1488?133??BKKMODFB?BDB7447B?:8--.E:F?B77?BKKC<<322B:..<41,46>>B<<::::5116..\n-@WTSI_1055_1a04.q1kpIBR bases 1 to 359\n-TGATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGGTACCCGACGTCCGATATCGCGAAAAATGATGTATCTAGATTTGTCAGGAAACGTCCCCGAGTCTGTTCGACAAACAAACGTTATTCCGAACTCCCAACAACAGTATTTGATTGTGTAAAAATCTCTTGGCCTGATTACTATACTTTAGACATTTTTAGTGCCTGTATTGGAGGTATTTTAGGAACTTTTGGAACGAGCTTTTATCGATTTAGGGAACTAAAAAACCGTTCCATATTCATTAGATGCTATTATTTAAAATCCGAGTCTGATTTGCGAT\n-+WTSI_1055_1a04.q1kpIBR bases 1 to 359\n-!41>;D>AA>;;=;;>>AA@@CDDAA>>>ADINIIHHDD>::79:>>FIICCCHHHHCCCCCCCCCHHHHIEA>9..''))**,,++''+)**.,,,-,00..0B+..33010701+++-1B1.,??KMOYYQQQQ<<61,))01<:CAIIIIIYYYYTYTTTTYYYYYTTTTNNKKKKYYYYYYYYYYYYPMMOKTTTTYTTTTTYNINNINTNTIIIIIIIIINNYYYYYYYTTOLKKKIIIINNNOKKKKKFFKKYYYYYYYYYYSSMMMQMYYYYYTTTTLLPIDDDDDDFFFFFFMMKKLNIDFFKQQMMMMMMMMHHFF>A>>:779=5<488>>7745/00::300+++0-\n-@WTSI_1055_1a05.p1kpIBF bases 1 to 642\n-CGTGCCAGTTCTAAACTGGTCGTTCAGCGCCAACCGAAGTGCATACCCTGACGAGCATACACGCAGCTGAAGCGCTCCACAAGCAGCTCTCACCACTAGTCCACGCACCACCCCGCAAGGAGACGGCACGCAGCCACGGGCAAAAGCCGCCTGTTTCACACAACAGCCCGGCTGACCCGACCTTTAGAGCCAATTCTTTTCCCGAAGTTACGAATCTAATTTGCCGACTTCCCTTACCTACATTATTCTATCGACTAGAGGCTGTTCACCTTGGAGACCTGCTGCGGATATCGGTACGATCAGGCAGGAGATTCATATCGCTTCCCTCGCATTTTCAAGGGCCGTGTGGAGCGCACGAGACACCACAGGAACCGCGGTGCTTTACGGGCGCAACATCCCTATCTCAGGCTGAGCCACTTCCAGGCACGCACGCCCTAAACCAGAAAAGAGAACTCTGGCTCGGACTCCACACGACGTCTGCGAGTTCATTTGCGTTACCGCGCGAAACAGTTCTTGCGAACCGTCATTTCCCTGGCCTGGCGTGGGAATGTTAACCCACTTCCCTTTCGGCAACCGGATGGACAAACTGCGCAAGCACAGCAAAGTCTTCATCCGTAGTGTGTGACGGCATTAGCCGGTGC\n-+WTSI_1055_1a05.p1kpIBF bases 
1 to 642\n-!<>AIHHCCCCCCCCIIIINNNNNTTTYYYYYYYYYYTTTTIIIIHHNIIIFDKFDDINNNTTTNIIIIINTTTTTTTYYYYYYTNNNNNTTYNIIIIIINNYYYYYYYYYYYYYYYYYTNNNNNTTTTTTYYYYYYYYYYYYYYYYYTLLJJJNNTTTTYYYYYYYYYTNNJNJLLTYYYYTONJJJOOYYYYYYYYYYYYYTTTTLOJJJJOOYYYYYYYYYTTTTTTYYYTTTTTTYYYYYYYYYYYYYYYYLJJJJJTYYYTLLLTOTJJJJJKKOYYYYTJNJJJOOTOOIIIILKYYYYTINDDDEEOSYYYYYYYYYYYYYYYYYYYYYYTTLTTTTTTTINIIIOYTKB888>>KMYYIIFIIITKYYYYKKKTOTYYYYYYYYYYYYYYYYYYYKIDDDD>>444>BKLKIIGGDIOYYYYIYYYQIIII@@7507>43--/<<IAAIIII>559==A@IIB>>===KMQM??/33?BIIQQIIFCCFCCFIIICIHA?@F>:>:>>=3...08AIIIMIQQQQCCCCQC:>=:6:>:>>IICA>>>>IFCCC>:>AA>99>;>AACAA>>>::7;7AIII>>>:>>IAI>833688949>@C>:>A;98777=;>99::>4755057132+\n-@WTSI_1055_1a05.q1kpIBR bases 1 to 219\n-CTGTGTACAAAGGGCAGGGACGTATTCAGAGCGAGTTGATGACTCGCCCCTACAAGGAATTCCTCGTTCACGGACAATAATTGCAATGTCCGATCCCAATCACGGCAAATTTTCACCGGTTTACCAACCCCTTTCGGGGAAGGACAAGCACGCTGATTTTGCCAGTGTAGCGCGCGTGCAGCCCCGGACATCTAAGGGCATCACAGACCTGTTATTGC\n-+WTSI_1055_1a05.q1kpIBR bases 1 to 219\n-!>>>>>DDIFKOOTTTNDDDHHFTTOOKKKYYTTNNNIYYNNNNNNYTIIIIITIFNIDDKKKNNIIIFIITTTTNNNNNINIINGIKMYYYYYOTTTTTYKKLMMMYYYQOOAAAAIQ;7:<<<A>=AAQA>><<<>7::77::7>>IIIAAAA>:>A=>>5:88::=BIIIIIIIII>>7;9733999=8370---128999::14.,0,,0442+\n-@WTSI_1055_1a09.p1kpIBF bases 1 to 497\n-CGAGCTCGGTACCCGGGGATCCCACCGTTTGGAGGGTGAATTCGCGCTGGAAAAAGGTTTTCCATGCAAAAAATGGAACTTCTTCAGCGTCCAAAGCTTTAGTCAGCCAGCAAAGTGTTGGCATTTCATCGAATGGAAATGGTTCAATAAGTAGCGGCAGCCCCAACGTTTTTGAGAAGTTTTGTGGCGTTTTCTCTGAAGGGGTAAAGTCAGGCGAATTGCTGGAAAAGGTGCCATTGGGTGATTTGGAAGTTGTTCTGTTGATGAACCTTTCATGTTCTAGGCGTTTGTGAAGGAATTTGCTGACAATTTGCTCCGAATCCAAAAGGACGTTGAGCGCTGTGATCGGACCATCAAATTCTATTCCAAACGGGACAATTTGGATGCTCTCCAACGGATAATTTGCACTTACATTTATCGTCGGCTGAAGTTGGACATTGAGGACGGTTACGTGCAGGGAATGTGCGATTTGGTCGCTCCTCTTTTGGTGGTGTTT\n-+WTSI_1055_1a09.p1kpIBF bases 1 to 
497\n-!989>>CCCCCCIIYOICCCCOIYHHA8339>><@75.444N@IDHHHDDNTTYYYYYTTTIIIIINYYYYYTTTTTTTNNNHHHIHHIIIIOQIDKDDDFHIIITYYYYYYYTTTTTYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTTNNNNNNTTTYYYYYYYTTTTYTYYYSSSYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOOKJJNOTTYYYYYYYT"..b'IINNHDFKOOOKKMQMMPPYYYTTTTTTYTTTTNNNNNNKFCCQQYYMMFF<<79?A8335:<:6-2+++\n-@WTSI_1055_1f22.p1kpIBF bases 1 to 370\n-CGACCAATGCTCGGTCCGTCACGTAGAGCAATCCGTTTGAGCGATCCACACGAAAATCTTAAGCGCAAAAAAGATTAATATTAATTATTTAACCATCTAATTATTTTAAAAATTTGCCGAAATAGTATCCGATCAAATCGGTTCTGACAATTTTACATTATCTGTTAGCCGTGCCAAAGTCTCTCTCTCACATTCGGTGGCAGCCGGTTGTCGTTGTCCAAGCACAAATTCTACGCTGCCATTATTGCCTTCGTCTCTGTCGCGTGCCAAAAAGCGTCCGATGGCGGTGCCAGCCGGCATATTGTCCAGTAGCCGAATGTGCGTGTCCTGGCGATCCCACAGGATCAGTGGCCGATTATCATTTTTGTC\n-+WTSI_1055_1f22.p1kpIBF bases 1 to 370\n-!89A>887>>:>68>AHHIIDCCCCCDNNYYTTTTTTTYTTTTTTYYYYYYYYYYNNHHHDF=@=>9BQQYYYIIIIIITTTTTTTTTTTTTNTTNNNNNTTTTTTTTTTYYYYYYYYYYTTTTTTYTTYTTTYTNNNNLNNNNNNNTLLYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTTTOOKKKOYYYYYYYYYYYYYYYYYYYYYYTTTTLKKTTTTYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYYYYYYKKMMTTTTTTTYOKIIIGKKYYYYYOIIIOAQ==<:77:<IIIABBBCDO>>988>?FKYYPFBB,,.8>FAA:6698<>>D>>::33:4>>66,,,<<Q93+-\n-@WTSI_1055_1f22.q1kpIBR bases 1 to 496\n-CTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCGCATGAGGAATCGGAAGAGAATAATAACAAGAAAATGACAGATAAAAAGAGTGGAATTGAAGTAGAAGAGAAAAAGGGTAGAGTTGTAACAGAAGAGAAGAAAGTTTTAAATGAAGCGGAAGAAAAGAAGGACGAAGATCAGACGGAAGAGAAGAAAGAAAATGAAAAAGAAGTTAAAAGAAATAATGCGGAAGAGAAGAAGAAATTGGATGAAACTGAAGAGAAGCCGGATGAGGAAAGGGGAGAAAAGAAGAGCAGAGCTGAAGTGGAATTGGAAGAAACAACGAAGAAGAATAATGGACTTAAATATGTTTGGAAGCATCAAAATGAATCGGATGTAAAGAAGTACGAAAACATAATGGAAAGTATGGACGAAAAGAAAATGGAAGAGAAGGAGCTCGTGGACAATTACAGTAATATTTTGTTTGGAA\n-+WTSI_1055_1f22.q1kpIBR bases 1 to 
496\n-!399>>>>CHHHHBDDDEIIINNTIIFDA>AAAADDDDDDDDDHHHDDHDIIIIIINNNOOBB+++89DFIKKFFINNTTYYYTTTLLLKKKOOTTOLYLLOLTTTTTTTYYYYYYYYYYYYYTIIIDDDFFKOTYYYYYYYYYYYYYTTTLLJTTTYYYYYYYYYYYYTTTNJJLTTLLTTTTYYYYYYYYYTNNNNNTLLMKNNNNNNTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTLLKKKYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTNNNNJJLNNNNNNNNNTTTTTTNNNNNTYTNNNLNNNTTTTTNNLLTTTTTTTTYYYYYYYYYTTNLLLLLLNNNTLYYYYYYYYYYYYYYYTTTTTTYYYYYYYTNNNNNTTTNNNILOOTINNNNNTTTTMYMMMYIIINFFIIIGINIIIIKLLTOKKKMGGDFFFGFFFFFFFFFNNNIN?CCMQ<<3<<D<<+,.66>>F=;>:5.\n-@WTSI_1055_1g01.p1kpIBF bases 1 to 584\n-CAAATCCTACTGGCCGGACAAAAGAAGCGGCCAAACAACGTGCTCTTCACAAGACGATCACCACCAAAAACATTCACACATGCTCAACGAGACATTGCTTGCAGGATGGCAAGTGCAGGAAGCACTTTCCGGTGCATTAGTTTACACTGACTATGTAACCTATTGTTAATTCCCTGTAGAAACCGTTTGAGTACGACACTGTGTACTCTGAAAATGCCTACCCTCGCTACAAGCGCCGCCCACCTCCGCCTTCACTCCAAGAAGCCCAGCAGAGTCCGGAATTATACGGGCGCGAAATGCAATACAAGGACCAGCGTGGCAAACTAATTCGCAAGGACAACTCTCACGTCGTGGCTTTCAGTCCATTTCTGTCAAGCAAATATGTCGCTCAGTAAAATTAATACTTTTTGTGACAAAATTGCTAACTTTTTTGCAGCATTAACGTCGAGTTTGTCGCGGGAGAAGGATGTATAAAGTACTTATGCAAGTACATGATGAAAGGAGCGGACATGGCCTTTGTCCAAGTCACGGATGCCAACACGGGCCAAAGTGCGCTGAACTACGACGAACTGCAGCAAATTCG\n-+WTSI_1055_1g01.p1kpIBF bases 1 to 584\n-!333;>HCDHHIIIYIIINTTYYYYTTTTTTYYYYYYNIIIIIININNTONB81+++04HQYTTTTTTTNIIINNTTNTTTTTTTTYYYTTTTTYTTTTTTYYYYYYYYYTTTTTTYYYYYTIIIIIITTTTTTTTTNNNNNNTNNTTTNNNNNNNNNNNNNNNNTTTTTYYTNNJJJJLYYYYYYYYYTTTTTTYTNNNNNNTYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTNNNNNNTTYYYTNNNNNTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYTKKKTNNIIINTYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYTTTTTTOIICBBOQQQQQQC;<88:>>>CIFOYYYYYYQQQQQQQQQCCQQQQHCBAA:AAAAIIA>;A>AAAIC>>AAAACA>>>>III>::>AAACCCIIIA:;==<IIIIIQQAA<:::IA==::8::CQIIIIAA>>CI92\n-@WTSI_1055_1g01.q1kpIBR bases 1 to 
350\n-TATGACTGATTACGCCAGCTATTTAGGTGAGACTATAGAATACTCACGCTAGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGATTGCTTTTTGGCTCGCATACTGCAGCCTGGGGAAGTAGTTGACGTTTTGAAGAATTGAGGGAAGTTGACGTGAAACGGCAACGCGGAGCAGGTCGGAAATCGCTTCGCTATCAGAGCCAAGCAACGAAATGGCGATTGCGCTTAAAAAACATTGGTTTGCTTAAAACATCAATGGTCTTCACCGGTAGAAGCAGTCGCCTAGACCAACGTTGTTGACGCAACGAATGGTGTTTTGCTGCTGGGCAGACGTGGGCGGAGTGCTA\n-+WTSI_1055_1g01.q1kpIBR bases 1 to 350\n-!..+---77CBI>7---77>>>DACCCHHHIDDDDCCIHHAA84)))%%%))+,32>>HHHHCCCCCCCCCHIIIIINN<B.,,,+++2.22OBNDHHHHHIIDDDDIIYTNNNNNTTTIIIIIITTTTKKYYYYYYYYYYQOB84-,,.<>FIIIIINNNIIIKKMSSSIIIIIIIIIIIILTOOIIIIIFLLLLLLYYSKKLKKKPMSSYSYSSMSS?KKKKFFFIIFKKKKKKKKSMMMSKKIDDDKKKFDDFFFBBDD=DDMMMKDDDDDDKKFFCCKKKKKFFFKKKKFMMMMMKKKKKKKK734:4B<??B@DC=<871<1314/--,,+++++.-5:97--,\n'
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/sanger-pairs-mixed.fastq
--- a/test-data/sanger-pairs-mixed.fastq Mon Jul 29 09:28:55 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b"@@ -1,800 +0,0 @@\n-@WTSI_1055_1a03.p1kpIBF bases 1 to 312\n-TTGTTGAACAGCAAAAAGGTCAAGAATATGGATGTTCTCGCCATGATTTTTGTGCCATAGGCGCGCATTCACAAGGTCCATCAGTCGNTCAGCCTGCCGCAACACCACCACCAGCCGCAGCAACAACAACAGCACCAGCAGCAGCTGATCCAATCGCATGTGCCACAGAATAACACCCAAAATCAATTAGCGACGGCCGCCCTCCAGCCGGTTCAGCAGCAGAAACAGCACGAAAAATGGGATCCGATCAAAGAATTTGGGCTGCAAAAGGACGAAATGGCGTTGAAGTCACCGCCCAGCAATGTTTGTGT\n-+WTSI_1055_1a03.p1kpIBF bases 1 to 312\n-!96CBHOOTTTYYYQMK???OOTYTTTNNNYYYYNIIIFFIIIIIIIYOOOMAA62.((((*,9@MIIIIO?A3007OOOMMII::%%%::AEHIIIQYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTOOKKKKKYMMYYYKIINNNTYYNIIIINYYYYTOLKKKOOKKKKOLTTYYYYSSSSYYYYSSSSSSMMSOOTLLLONIDDDNOTTYQQMMMMPBB9>BDOOTTQMMMMQMMMQQE:666QQYYPMMDDDADDM@B<FDBBDKKKKKKKKIGKINIFFFKDGGIDB?2/\n-@WTSI_1055_1a04.p1kpIBF bases 1 to 186\n-TTACCCGTCGGCGCCGAAAGAGCCGAAGGCTTTGTGACTGAGGCCGGACACTGTGCTGTTAAGCTGGACATTGCCCGACCTGTCGAGTGCGCCGCTCGCCGAAATTCGTTATCGCGTAAATTTATTTATTTATTTTTATTTTTTTAAATAAAAATGACGACTAATTTGTAAGGGCATAACAACAA\n-+WTSI_1055_1a04.p1kpIBF bases 1 to 186\n-!,,,./644,,,-0377<:Q777<BB<<60,+.,+,.4.,))))//15>>550007:66>>==7@71/--0:<CDBB;;49/***/***22,/+)))11===798:3.,,1488?133??BKKMODFB?BDB7447B?:8--.E:F?B77?BKKC<<322B:..<41,46>>B<<::::5116..\n-@WTSI_1055_1a04.q1kpIBR bases 1 to 359\n-TGATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGGTACCCGACGTCCGATATCGCGAAAAATGATGTATCTAGATTTGTCAGGAAACGTCCCCGAGTCTGTTCGACAAACAAACGTTATTCCGAACTCCCAACAACAGTATTTGATTGTGTAAAAATCTCTTGGCCTGATTACTATACTTTAGACATTTTTAGTGCCTGTATTGGAGGTATTTTAGGAACTTTTGGAACGAGCTTTTATCGATTTAGGGAACTAAAAAACCGTTCCATATTCATTAGATGCTATTATTTAAAATCCGAGTCTGATTTGCGAT\n-+WTSI_1055_1a04.q1kpIBR bases 1 to 
359\n-!41>;D>AA>;;=;;>>AA@@CDDAA>>>ADINIIHHDD>::79:>>FIICCCHHHHCCCCCCCCCHHHHIEA>9..''))**,,++''+)**.,,,-,00..0B+..33010701+++-1B1.,??KMOYYQQQQ<<61,))01<:CAIIIIIYYYYTYTTTTYYYYYTTTTNNKKKKYYYYYYYYYYYYPMMOKTTTTYTTTTTYNINNINTNTIIIIIIIIINNYYYYYYYTTOLKKKIIIINNNOKKKKKFFKKYYYYYYYYYYSSMMMQMYYYYYTTTTLLPIDDDDDDFFFFFFMMKKLNIDFFKQQMMMMMMMMHHFF>A>>:779=5<488>>7745/00::300+++0-\n-@WTSI_1055_1a05.p1kpIBF bases 1 to 642\n-CGTGCCAGTTCTAAACTGGTCGTTCAGCGCCAACCGAAGTGCATACCCTGACGAGCATACACGCAGCTGAAGCGCTCCACAAGCAGCTCTCACCACTAGTCCACGCACCACCCCGCAAGGAGACGGCACGCAGCCACGGGCAAAAGCCGCCTGTTTCACACAACAGCCCGGCTGACCCGACCTTTAGAGCCAATTCTTTTCCCGAAGTTACGAATCTAATTTGCCGACTTCCCTTACCTACATTATTCTATCGACTAGAGGCTGTTCACCTTGGAGACCTGCTGCGGATATCGGTACGATCAGGCAGGAGATTCATATCGCTTCCCTCGCATTTTCAAGGGCCGTGTGGAGCGCACGAGACACCACAGGAACCGCGGTGCTTTACGGGCGCAACATCCCTATCTCAGGCTGAGCCACTTCCAGGCACGCACGCCCTAAACCAGAAAAGAGAACTCTGGCTCGGACTCCACACGACGTCTGCGAGTTCATTTGCGTTACCGCGCGAAACAGTTCTTGCGAACCGTCATTTCCCTGGCCTGGCGTGGGAATGTTAACCCACTTCCCTTTCGGCAACCGGATGGACAAACTGCGCAAGCACAGCAAAGTCTTCATCCGTAGTGTGTGACGGCATTAGCCGGTGC\n-+WTSI_1055_1a05.p1kpIBF bases 1 to 642\n-!<>AIHHCCCCCCCCIIIINNNNNTTTYYYYYYYYYYTTTTIIIIHHNIIIFDKFDDINNNTTTNIIIIINTTTTTTTYYYYYYTNNNNNTTYNIIIIIINNYYYYYYYYYYYYYYYYYTNNNNNTTTTTTYYYYYYYYYYYYYYYYYTLLJJJNNTTTTYYYYYYYYYTNNJNJLLTYYYYTONJJJOOYYYYYYYYYYYYYTTTTLOJJJJOOYYYYYYYYYTTTTTTYYYTTTTTTYYYYYYYYYYYYYYYYLJJJJJTYYYTLLLTOTJJJJJKKOYYYYTJNJJJOOTOOIIIILKYYYYTINDDDEEOSYYYYYYYYYYYYYYYYYYYYYYTTLTTTTTTTINIIIOYTKB888>>KMYYIIFIIITKYYYYKKKTOTYYYYYYYYYYYYYYYYYYYKIDDDD>>444>BKLKIIGGDIOYYYYIYYYQIIII@@7507>43--/<<IAAIIII>559==A@IIB>>===KMQM??/33?BIIQQIIFCCFCCFIIICIHA?@F>:>:>>=3...08AIIIMIQQQQCCCCQC:>=:6:>:>>IICA>>>>IFCCC>:>AA>99>;>AACAA>>>::7;7AIII>>>:>>IAI>833688949>@C>:>A;98777=;>99::>4755057132+\n-@WTSI_1055_1a05.q1kpIBR bases 1 to 
219\n-CTGTGTACAAAGGGCAGGGACGTATTCAGAGCGAGTTGATGACTCGCCCCTACAAGGAATTCCTCGTTCACGGACAATAATTGCAATGTCCGATCCCAATCACGGCAAATTTTCACCGGTTTACCAACCCCTTTCGGGGAAGGACAAGCACGCTGATTTTGCCAGTGTAGCGCGCGTGCAGCCCCGGACATCTAAGGGCATCACAGACCTGTTATTGC\n-+WTSI_1055_1a05.q1kpIBR bases 1 to 219\n-!>>>>>DDIFKOOTTTNDDDHHFTTOOKKKYYTTNNNIYYNNNNNNYTIIIIITIFNIDDKKKNNIIIFIITTTTNNNNNINIINGIKMYYYYYOTTTTTYKKLMMMYYYQOOAAAAIQ;7:<<<A>=AAQA>><<<>7::77::7>>IIIAAAA>:>A=>>5:88::=BIIIIIIIII>>7;9733999=8370---128999::14.,0,,0442+\n-@WTSI_1055_1a07.p1kpIBF bases 1 to 574\n-AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACCGGTACGGAGGGAAATTTGATCAT"..b'GATGCTTCAACGAAAACTGATCAGGCGAACTGAAAGGGTGTAAAAAAGATAAAAGAAATTGTAAACGCAGCACATTGTCAAGCAAAGCAACCCAAAAAAATCGATTTTGAGTATAGTCAAAAAGGGTTACCCGTCAATGATGATCTGTTGCTGTTTGTTTGATACTCCTCCTTTCAATTTGCGATTGTTGTTGTTGCAATTGGCACGCGAA\n-+WTSI_1055_1f24.q1kpIBR bases 86 to 670\n-!88BHIQQQYYYITTTTIIINNIIIIKKKYYYYIIIIFFYOMTTTYYIIIIAA99//.1<BKKOOTYYYYTTTTNNTTINNNTTYTTNNNIIITTYTTTTTTTTYYYYYIIIIIOYYYYYYYYYYYTTTTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTOTLLYYYYYYYYYTTTTTTTTTTTTTTTTYYYYYYYYYYTTTTTTYYTNNNNNTYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOKKKOOYYYYKK???KQMMMPPPPQMMKKKMPYYYKKKKKKKKKKMMYYYYYYYYYYYYYYYYYYYYYYYYYYYYYQQQQQI51)%%)4<QQQQQQYYYYTTKTTTTTTTYYYYYYYNNNNNNYYYKKKKGGNNNNYYYYYYYYYYYQMMMMQOKKGIIKKKKYQYYYYYYYYTOOLKKIIIIIOYQQQQQQBA>:;AABAACCCIIIOIIBBIIIII:77<><AAIIIOQQIE=>>>CA>AAABBIIIIIII:00882389667>BAAA?A>77:<844>A?;4++0966.+4492000--4922./..++\n-@WTSI_1055_1g01.p1kpIBF bases 1 to 
584\n-CAAATCCTACTGGCCGGACAAAAGAAGCGGCCAAACAACGTGCTCTTCACAAGACGATCACCACCAAAAACATTCACACATGCTCAACGAGACATTGCTTGCAGGATGGCAAGTGCAGGAAGCACTTTCCGGTGCATTAGTTTACACTGACTATGTAACCTATTGTTAATTCCCTGTAGAAACCGTTTGAGTACGACACTGTGTACTCTGAAAATGCCTACCCTCGCTACAAGCGCCGCCCACCTCCGCCTTCACTCCAAGAAGCCCAGCAGAGTCCGGAATTATACGGGCGCGAAATGCAATACAAGGACCAGCGTGGCAAACTAATTCGCAAGGACAACTCTCACGTCGTGGCTTTCAGTCCATTTCTGTCAAGCAAATATGTCGCTCAGTAAAATTAATACTTTTTGTGACAAAATTGCTAACTTTTTTGCAGCATTAACGTCGAGTTTGTCGCGGGAGAAGGATGTATAAAGTACTTATGCAAGTACATGATGAAAGGAGCGGACATGGCCTTTGTCCAAGTCACGGATGCCAACACGGGCCAAAGTGCGCTGAACTACGACGAACTGCAGCAAATTCG\n-+WTSI_1055_1g01.p1kpIBF bases 1 to 584\n-!333;>HCDHHIIIYIIINTTYYYYTTTTTTYYYYYYNIIIIIININNTONB81+++04HQYTTTTTTTNIIINNTTNTTTTTTTTYYYTTTTTYTTTTTTYYYYYYYYYTTTTTTYYYYYTIIIIIITTTTTTTTTNNNNNNTNNTTTNNNNNNNNNNNNNNNNTTTTTYYTNNJJJJLYYYYYYYYYTTTTTTYTNNNNNNTYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTNNNNNNTTYYYTNNNNNTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYTKKKTNNIIINTYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYTTTTTTOIICBBOQQQQQQC;<88:>>>CIFOYYYYYYQQQQQQQQQCCQQQQHCBAA:AAAAIIA>;A>AAAIC>>AAAACA>>>>III>::>AAACCCIIIA:;==<IIIIIQQAA<:::IA==::8::CQIIIIAA>>CI92\n-@WTSI_1055_1g01.q1kpIBR bases 1 to 350\n-TATGACTGATTACGCCAGCTATTTAGGTGAGACTATAGAATACTCACGCTAGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGATTGCTTTTTGGCTCGCATACTGCAGCCTGGGGAAGTAGTTGACGTTTTGAAGAATTGAGGGAAGTTGACGTGAAACGGCAACGCGGAGCAGGTCGGAAATCGCTTCGCTATCAGAGCCAAGCAACGAAATGGCGATTGCGCTTAAAAAACATTGGTTTGCTTAAAACATCAATGGTCTTCACCGGTAGAAGCAGTCGCCTAGACCAACGTTGTTGACGCAACGAATGGTGTTTTGCTGCTGGGCAGACGTGGGCGGAGTGCTA\n-+WTSI_1055_1g01.q1kpIBR bases 1 to 
350\n-!..+---77CBI>7---77>>>DACCCHHHIDDDDCCIHHAA84)))%%%))+,32>>HHHHCCCCCCCCCHIIIIINN<B.,,,+++2.22OBNDHHHHHIIDDDDIIYTNNNNNTTTIIIIIITTTTKKYYYYYYYYYYQOB84-,,.<>FIIIIINNNIIIKKMSSSIIIIIIIIIIIILTOOIIIIIFLLLLLLYYSKKLKKKPMSSYSYSSMSS?KKKKFFFIIFKKKKKKKKSMMMSKKIDDDKKKFDDFFFBBDD=DDMMMKDDDDDDKKFFCCKKKKKFFFKKKKFMMMMMKKKKKKKK734:4B<??B@DC=<871<1314/--,,+++++.-5:97--,\n-@WTSI_1055_1g02.p1kpIBF bases 1 to 523\n-AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACGACAAATTCACGGAAGCGTCTCGCACTTTGTGCCGAGGACTGCTGCACAAGGAGCCCACTCTGAGGTTGGGCTGTCGCCGGGTCGGCCGGCCTGAGGACGGCGCGGAAGAGCTGAAGGCACACGCGTTCTTCACACAACCGGACCAGAAGACAGGCAGGGAGCCAATTCCGTGGAGGAAGATGGAGGCCGGCAAGGTGGACGACATTCCCTTCTGAACTGCTAGAGAGGACTTGTAGGAATTCCGTCCTTCAGCTGACACCTCCATTTTGTCCGGACCCCCATTCGGTGTATGCCAAAGATGTGCTGGACATCGAGCAGTTCAGCACTGTCAAGGGAGTTCGTCCGCTTCCACCAAACTTTTCCTACCTGCTGAACCATTAGGTTCGACTTGACGCGACTGACAACTCCTTCTACGACAAGTTCAACAGCGGGTCCGTGTCCATACCTTGGC\n-+WTSI_1055_1g02.p1kpIBF bases 1 to 523\n-!08<=AAA:28::87;<::>ACECEIIIIIIIIIIINIKBB>C>QQYNHHHHDDHDHIITIDCCCCOONNNNGDFDDINMINNNNNIHHHHHIINNIIINNNNTYTIIIIDDIIIIYYYTTTTTTYIIIDDDGGITYYSKKKIDNNNNTTNNNNNTYYYTLLLLLLLLLLLYYTYJJJJJNTTTTTTTTTTYYOLLLTTOOOTTTTTTTYNNNNNJJJLLLLLLYYYYYYYYYYSSYYONNNNNNLLTTTTTTTYYYYYYYYYYYYYYYYTMMKKKYYYYYYYYYYYYYTTTTTOOLIILLLLTTLNLLLLLLYYYYYYTTTLLLTTTTTTTYYYYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYNIIIIITYYTTTLTTNIIFFFMYYYYYYYOOLKKOOTIFIFIINTTTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNNNNTYYYYYYYYYYTTTNNNNNNNNTNIIFFFKYYOOOOOIIIA<:77:<<>>>>IOOIHHHDDEIQMMII<924595/4\n'
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/sanger-pairs-reverse.fastq
--- a/test-data/sanger-pairs-reverse.fastq Mon Jul 29 09:28:55 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b"@@ -1,288 +0,0 @@\n-@WTSI_1055_1a04.q1kpIBR bases 1 to 359\n-TGATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGGTACCCGACGTCCGATATCGCGAAAAATGATGTATCTAGATTTGTCAGGAAACGTCCCCGAGTCTGTTCGACAAACAAACGTTATTCCGAACTCCCAACAACAGTATTTGATTGTGTAAAAATCTCTTGGCCTGATTACTATACTTTAGACATTTTTAGTGCCTGTATTGGAGGTATTTTAGGAACTTTTGGAACGAGCTTTTATCGATTTAGGGAACTAAAAAACCGTTCCATATTCATTAGATGCTATTATTTAAAATCCGAGTCTGATTTGCGAT\n-+WTSI_1055_1a04.q1kpIBR bases 1 to 359\n-!41>;D>AA>;;=;;>>AA@@CDDAA>>>ADINIIHHDD>::79:>>FIICCCHHHHCCCCCCCCCHHHHIEA>9..''))**,,++''+)**.,,,-,00..0B+..33010701+++-1B1.,??KMOYYQQQQ<<61,))01<:CAIIIIIYYYYTYTTTTYYYYYTTTTNNKKKKYYYYYYYYYYYYPMMOKTTTTYTTTTTYNINNINTNTIIIIIIIIINNYYYYYYYTTOLKKKIIIINNNOKKKKKFFKKYYYYYYYYYYSSMMMQMYYYYYTTTTLLPIDDDDDDFFFFFFMMKKLNIDFFKQQMMMMMMMMHHFF>A>>:779=5<488>>7745/00::300+++0-\n-@WTSI_1055_1a05.q1kpIBR bases 1 to 219\n-CTGTGTACAAAGGGCAGGGACGTATTCAGAGCGAGTTGATGACTCGCCCCTACAAGGAATTCCTCGTTCACGGACAATAATTGCAATGTCCGATCCCAATCACGGCAAATTTTCACCGGTTTACCAACCCCTTTCGGGGAAGGACAAGCACGCTGATTTTGCCAGTGTAGCGCGCGTGCAGCCCCGGACATCTAAGGGCATCACAGACCTGTTATTGC\n-+WTSI_1055_1a05.q1kpIBR bases 1 to 219\n-!>>>>>DDIFKOOTTTNDDDHHFTTOOKKKYYTTNNNIYYNNNNNNYTIIIIITIFNIDDKKKNNIIIFIITTTTNNNNNINIINGIKMYYYYYOTTTTTYKKLMMMYYYQOOAAAAIQ;7:<<<A>=AAQA>><<<>7::77::7>>IIIAAAA>:>A=>>5:88::=BIIIIIIIII>>7;9733999=8370---128999::14.,0,,0442+\n-@WTSI_1055_1a09.q1kpIBR bases 1 to 558\n-TGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCACCCAAAAAAAGTTTAAAAATTCGGAATGCGCTGTTTTCTTGGGTAAATATAAAGTAGGGTCCGGATTTATATTGTCTAAAACGCGAATTGACTTAAAAGATTGACCAAAAAAAGCCTAAAGTCCAAACTCTAATCAATAGAATAAAATGTTGGCAGAAATTTACGTCATGCAAAGGGTGTGCCAAATGGTTGATTTTGTGATTTTGATTTAATACAGAGGGTGCGAGATCAACTGAAATTTTGAGTAAATGCCGAGAGACTTTTTGTTTTTCAATTGTAATTTGAAGTTGGCCCTCTCTCCCCCCGACCGACAGTGGTACTCGGATAATCAGCCGAACAAACAAATATTCGTAGTGTTAAACAGAAGGGAAAGATGTAAGGTAACATTGGATTAGTTTGATGATGAGGCACTGAATTAAGGACAACTTGGTTATTATTATACATCCATGTGATTGTGAAGATTAAAGATGTTCTGGGACCAGGATGCCTTTGGAGAGGTTT\n-+WTSI_1055_1a09.q1kpIBR bases 1 to 
558\n-!=>>>>>>>DIIIHHDHB99-//66@DIHHHHHHHHHDDCCCCCDHHIIDID@D>C=@KKYYYYKKTIIIIIIYNNIFFFIIMTIDDDDDHHHHDDKFFFIIDDHHHDDDHHHINNINIYYIIONNNINLNNNNNTYYYYYYYTNLLLLLLOOYYYYYYYYYYYYYYYYTTTTTTTTTTTTNNLLLLLLTTNNNJJJNNTTTYYYMMLOOKYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYTTTTTTYYYYLTMTTTTNNNNLLTTTTTTTTTTLLTTTNNNJLLTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTYYTTTTTTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTNNNNNNTONIIINNNNNNNKYYYIOINIIQMOOTNNNNNNNNNTTYYITIIIINNNNNIKKTTTTKTYYYYYYYYYYYYYLF@@@FBC>>=697038<<IIM88+++89I@QAI>::44--344;<><0056699:<9\n-@WTSI_1055_1a10.q1kpIBR bases 1 to 431\n-AAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCACGTAAAAATCGAAAACATAGAAAAAGAAGCAAAGACCGACCGAACCGGTGGGAGAAAGGCTGAATGGGGCATGATTGGGGGAGGGGGGGAAGGTGACGAACCGAACGAATAAATGACAGGACGAGTTTCTCTTTTCTTCTTGGGTTTACATGTGTTGCGTGACCTTCTGAAATGGGCATTCAATGGATGATTGGGAGGGGGGGGGGGGAAGAAGGCCGACCACAGGTTGAATTTCGACTTTCTTCTAATTTTGCCCAACTTTCCCGATGGGGAAGGGCCTATGACCATTCGGTTTTGCAATGAAATCTGCCAATTAAACATCGTCCTTTTTCGTATCTGTGATGGTATGTCGATGGGGTGCG\n-+WTSI_1055_1a10.q1kpIBR bases 1 to 431\n-!9;75;;>;>>ACCCC@CCAADNNNNNIIF>>4::>>FFFDDDHHHHHHHDHDDDHHHHINHIIDD>42-55DFIIIIILYYKIIFIIINNYYYYYYYYYLTINNITTYYYYLONIIIILYYYTIIFIFFIMMSSSYSKKLKKOOTTTYYYYYYYSSSYYYLJJJJJTYYYYYYYLTTTTLYYYTTTMOLYYYYYYLLLNLIJIIIILLLYYTTOLKJJKKKTYYYSGGLLLLNLLKKFMJSSSMPMSSMMMSSYYYYSSMKKKKJJMMPSSMB>,,+++>9DDKKKF@@888F=?DFSK==19/99OFB11,,.,,/,.<E99,,,/9:?FB:0//002613../--,,,,.,,,,,-/0910/+-,0..,++..4+;+++4-,,,4./,//66B?54-,,.,,,,48+++2++,,+,,:6=1859/.,\n-@WTSI_1055_1a11.q1kpIBR bases 1 to 301\n-CGAAGGAAAGGCGGCGGAGAAAGTTTCGTCGTTGGCGGAAAAGCCGATGAAACGCGGGGGACGAACGAAGTTTGTGTTTTTTTTAAAAATCTTTTCTCGACGGTTTCCAGGGAATTGGCCAAGTCCATGGACAAAACCAATGCCAACGGTCCTTCGTCCGCTGATTCATCGACTTCGTGTCCCGGGAGCGCGGAGTGCCGCGGCATCCGCCTCAACAGAAAGGGCGTCAGCCGTCGTCCAACCATCACAGCGCCGTTCCCAAAAGCCGTGCCCCCTCGCGCAGTCGTTCGCCTCCACGGT\n-+WTSI_1055_1a11.q1kpIBR bases 1 to 
301\n-!DDDDEIOTNNNNNTFDDHHHITINNNNNNNNNNNNIITYTTNNIILOYYTTTTTYQKDFFFFKOLLFIIIINTTTTYYYYHHAADHSYYYYYYYYYYYYOTTTTNTTTNITTTTTTNNNLJLLNJJKJNLJTTYY"..b'DDDDDDDDDITIIHDAA==8??FFDHHIIFYYYYYYYYYYYYYNNIIIINNTTTYTTTYTOOLYYYIIIILLYYYYKKYYYYYYYONNNNNTYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYNTTTTLYYYYYYJLLJLLYYYYYYYTTTTNNTJNNTLLTTTTTJTTTTTTTTYYYYTTTTTTYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTOOYKKLIIIIIIYLMKKKOOTTTTTYYYTNIIIIITTTYYYYYYYYYYYYYTTTTTTYYYYYYYYYTTTKKKNIIITTYYYYYIIGIGB@=@@FFNNIIKKKMHFFQIIFDDDDDKMKTIIIIOOKIIIIIOOMCBAAAAAQABEHIEIAA::0++1569>>>6///-\n-@WTSI_1055_1f20.q1kpIBR bases 1 to 451\n-AGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCACGAAATGTGTGTGATATTTTAATGAATAAACTTCTTTTTTAATATCATATTAATAATTATTGTATCGTTTTACAACTTTCTATTCATATACTTTTCATCATCATCCCATCCGGTATCACTGCTCCTCCTCCTGCGCCCACCGGCCATCAGTCACTTTCGTGTCATTCCGTCGACAGTGTGGTGGTGGTAGTCAAAATTTGTTGACGGAAAGCCTCCAAAAATTGTTGAAATTGGCCAGCCGTGAGGCCCATTGCCATCCGCGGGTGGCATTTGAACTGTCCGCCCCAGTTTGGTGCCATGGCGGACGCCGCATTCGTCGCGTTGCCAGCCGATCCTCAGCAAAGCCGCTTGGCCCACCGCCGGTGGGCATGTGCCGTTGTCGA\n-+WTSI_1055_1f20.q1kpIBR bases 1 to 451\n-!;>>>>>>>>>DDCC@CCDDDFFIINNNGEA=>@FFFFFHHHHHHHHHDDHDDDDDHHIIDFDDFDEDDKFIIIIINNIFFNNNIIIIIIYNTTTIINIIIIIOHHDDDDNIIIIIIIHHHHHHHHGDFIIFFINHLLLLLNNNNLNNNNNJNNNLLLLNLNNNNNTTTTTNNNNNNYLNNNNLNJJLNTTTYYYTTTMLOYYTTTTTLYYTTTNJNTTTYYYYTTTTTTYYYYYYYSSSONNNNNTYYYYYTTTTTTTTTTYYYYYYYYYYYYLTTOOLFFIOOOOOOOOYTTTTTTTKTTTTYYTNIIIIIITTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYPPPPPQOIGGGNNIIIIT?<5..8A82,+-..140011199>AAAA;;:<<A>>>@BAADDFDIKIIOIBBIIEII>:338:<II@B77/6-20;;IOA@;;91,\n-@WTSI_1055_1f21.q1kpIBR bases 1 to 336\n-GATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAACAAGGATGCGTCTGCTTGTATAACCGGTAATCAAAAATGTGCAAATAATAAAATTGAGTGCATTTACAGGGAAACCGATCGTTGCTGGCGGTATTCATGGACGTGTTTCGGCCACGGGCCGTGGAATTTGGAAAGGGTTGGCGGTCTTCGTCAACGACAAGAACTACATGAGCAAATTGGGACTGACGACTGGATTTAAGGGGAAAACGTTCATCGTCCAAGGATTCGGTTTGTTTAGGGGAAAGGCATTGAAGGGG\n-+WTSI_1055_1f21.q1kpIBR bases 1 to 
336\n-!1>;CCCIFCCA>>>>A;>>ADDDDDDDDDDFIIINNNNNDDDDDDFFKIHHHHHHHHDDDDDDDDDDHFFINNKKPPPPOTNNNNIHHHDDDDDDHHHIIINIIIIITYYYYYYYYYYYTNNNNNTYYTTTTTTTTOLLIJJLLNTTYJNNNNNTTTTTNNNNNNYTTTTTNTLLKKYYOTTNNNNNNNTTYYSSPSSSSSSYYYOTOOOYYYYYYYYYTIIIIIITMOKIKNNNNIITNNOLKKMQKKOOTTYQQKKKKLKKKIINNHDFKOOOKKMQMMPPYYYTTTTTTYTTTTNNNNNNKFCCQQYYMMFF<<79?A8335:<:6-2+++\n-@WTSI_1055_1f22.q1kpIBR bases 1 to 496\n-CTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCGCATGAGGAATCGGAAGAGAATAATAACAAGAAAATGACAGATAAAAAGAGTGGAATTGAAGTAGAAGAGAAAAAGGGTAGAGTTGTAACAGAAGAGAAGAAAGTTTTAAATGAAGCGGAAGAAAAGAAGGACGAAGATCAGACGGAAGAGAAGAAAGAAAATGAAAAAGAAGTTAAAAGAAATAATGCGGAAGAGAAGAAGAAATTGGATGAAACTGAAGAGAAGCCGGATGAGGAAAGGGGAGAAAAGAAGAGCAGAGCTGAAGTGGAATTGGAAGAAACAACGAAGAAGAATAATGGACTTAAATATGTTTGGAAGCATCAAAATGAATCGGATGTAAAGAAGTACGAAAACATAATGGAAAGTATGGACGAAAAGAAAATGGAAGAGAAGGAGCTCGTGGACAATTACAGTAATATTTTGTTTGGAA\n-+WTSI_1055_1f22.q1kpIBR bases 1 to 496\n-!399>>>>CHHHHBDDDEIIINNTIIFDA>AAAADDDDDDDDDHHHDDHDIIIIIINNNOOBB+++89DFIKKFFINNTTYYYTTTLLLKKKOOTTOLYLLOLTTTTTTTYYYYYYYYYYYYYTIIIDDDFFKOTYYYYYYYYYYYYYTTTLLJTTTYYYYYYYYYYYYTTTNJJLTTLLTTTTYYYYYYYYYTNNNNNTLLMKNNNNNNTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTLLKKKYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTNNNNJJLNNNNNNNNNTTTTTTNNNNNTYTNNNLNNNTTTTTNNLLTTTTTTTTYYYYYYYYYTTNLLLLLLNNNTLYYYYYYYYYYYYYYYTTTTTTYYYYYYYTNNNNNTTTNNNILOOTINNNNNTTTTMYMMMYIIINFFIIIGINIIIIKLLTOKKKMGGDFFFGFFFFFFFFFNNNIN?CCMQ<<3<<D<<+,.66>>F=;>:5.\n-@WTSI_1055_1g01.q1kpIBR bases 1 to 350\n-TATGACTGATTACGCCAGCTATTTAGGTGAGACTATAGAATACTCACGCTAGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGATTGCTTTTTGGCTCGCATACTGCAGCCTGGGGAAGTAGTTGACGTTTTGAAGAATTGAGGGAAGTTGACGTGAAACGGCAACGCGGAGCAGGTCGGAAATCGCTTCGCTATCAGAGCCAAGCAACGAAATGGCGATTGCGCTTAAAAAACATTGGTTTGCTTAAAACATCAATGGTCTTCACCGGTAGAAGCAGTCGCCTAGACCAACGTTGTTGACGCAACGAATGGTGTTTTGCTGCTGGGCAGACGTGGGCGGAGTGCTA\n-+WTSI_1055_1g01.q1kpIBR bases 1 to 
350\n-!..+---77CBI>7---77>>>DACCCHHHIDDDDCCIHHAA84)))%%%))+,32>>HHHHCCCCCCCCCHIIIIINN<B.,,,+++2.22OBNDHHHHHIIDDDDIIYTNNNNNTTTIIIIIITTTTKKYYYYYYYYYYQOB84-,,.<>FIIIIINNNIIIKKMSSSIIIIIIIIIIIILTOOIIIIIFLLLLLLYYSKKLKKKPMSSYSYSSMSS?KKKKFFFIIFKKKKKKKKSMMMSKKIDDDKKKFDDFFFBBDD=DDMMMKDDDDDDKKFFCCKKKKKFFFKKKKFMMMMMKKKKKKKK734:4B<??B@DC=<871<1314/--,,+++++.-5:97--,\n'
diff -r 6a14074bc810 -r d51819d2d7e2 test-data/sanger-pairs-singles.fastq
--- a/test-data/sanger-pairs-singles.fastq Mon Jul 29 09:28:55 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b"@@ -1,224 +0,0 @@\n-@WTSI_1055_1a03.p1kpIBF bases 1 to 312\n-TTGTTGAACAGCAAAAAGGTCAAGAATATGGATGTTCTCGCCATGATTTTTGTGCCATAGGCGCGCATTCACAAGGTCCATCAGTCGNTCAGCCTGCCGCAACACCACCACCAGCCGCAGCAACAACAACAGCACCAGCAGCAGCTGATCCAATCGCATGTGCCACAGAATAACACCCAAAATCAATTAGCGACGGCCGCCCTCCAGCCGGTTCAGCAGCAGAAACAGCACGAAAAATGGGATCCGATCAAAGAATTTGGGCTGCAAAAGGACGAAATGGCGTTGAAGTCACCGCCCAGCAATGTTTGTGT\n-+WTSI_1055_1a03.p1kpIBF bases 1 to 312\n-!96CBHOOTTTYYYQMK???OOTYTTTNNNYYYYNIIIFFIIIIIIIYOOOMAA62.((((*,9@MIIIIO?A3007OOOMMII::%%%::AEHIIIQYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTOOKKKKKYMMYYYKIINNNTYYNIIIINYYYYTOLKKKOOKKKKOLTTYYYYSSSSYYYYSSSSSSMMSOOTLLLONIDDDNOTTYQQMMMMPBB9>BDOOTTQMMMMQMMMQQE:666QQYYPMMDDDADDM@B<FDBBDKKKKKKKKIGKINIFFFKDGGIDB?2/\n-@WTSI_1055_1a07.p1kpIBF bases 1 to 574\n-AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACCGGTACGGAGGGAAATTTGATCATCGCGGAAGTGCTCGTTTTGATTATCTTGGTGTATGGCGTCTGTGACCTTCTTTTTCGCTGGATGGGCATCGGGGCGTACGCCTGGGGTTCGCGCTCGAGCCCCAAAATCGCCCTCACTTTCGATGACGGGCCCAGCGAACACACCCGGTCCTTGCTCGAGCTGCTGCACCGCCATGGGGTAAAAGCTACCTTTTTTGTCACCGGCGTTCAGGCCGAGCGGCACCCCGACTTGCTAGAAGCCCTGCGGGCCGATGGCCATCAGATCGAATCGCACGGCTACTGGCACCGCCAAGCGTTCTTCCTGTGGCCTTGGCAAGAAGCGCGGCACATCCAACGGGTTCCGGGCAAACTATACCGCCCCCCCTATGGAGCCCACTCCCCCTTCACCCGGCTTCTTGCCCGGCTCCACGGCAAAGTGGTGGCGCTATGTGACCTCGAGTCCAAGGACTGGACCGACCGACCTGCCGAAGAACTGGCCG\n-+WTSI_1055_1a07.p1kpIBF bases 1 to 574\n-!>>>AAA:9.4441+35:88;;CHIIIIIIDDDCCCH>Q35-+*46?>CHHHHHHHHIIYOHHHHHTTYTHA72-35>:>DAKHHHQQTTNIIFIGNYNNNNIIIIIINTTYYFFFDDHIINTIIIIIITIIIIIIDDDDDDIIOTNTLIIIKLOYYYYYYYYTTTYYNNNNIIINNNNTIIINNLNIIINYYYYYYSSMMSYYTTMMKLLLNNTTTTTTTTLLKLLYTTJNLJLTYYYLLLLKLLTLLKKKLLTTTTTYYYYYYYTTTTTTTTOOKYLOTTTTYYYYYYTTTTTTYYYYYYYYYYTNNNNNTIFFFIFIIIIIOKKKYKKTOOKYYYYYYYTOOTIIOLKIINNNNNTYYYYYYYYYYYYYYYYYMTTTTTYTTTTIIOIIIQIIMIII:99>AAAIIBBIOOYYYYOKCCDAAFFFIOD@@>>>A<<926<QIQQQQMIIIIFDFFFDDFDDDAA===BGKKKKO943>>@B;BB?:?IMYYQB..+2,448:?88888<877:<>A810))*.12889600<<9411799>83,,,84337:<7227470..---.//+,\n-@WTSI_1055_1a08.p1kpIBF bases 1 
to 397\n-TAAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACCTCCGAGAGCACTCGTGACGAATTGATTCCCCTGCTAAGCATCGAATGCGTAAAGTTAGGGCGTGCTCGTCGGCTTTATGAGAAGGGATTTCGCACCGTCGGATCGATAGCGAAAGCGGAGCCTCGCCAACTCATCGAAGCGTTAGGGGGCAAATTGAGCTGTTGCCAGTGCAGGAGGATGATCAGCAGTGCAAAGGTCCCCGGGGGTTGAGTAAATGGTGCTTAAAGGCCCCGTTCCGGAGACGATAAATTTATTCACTTTCAATTAGAGGCTTCAGAAGCTCAAAATTGTTCGAGTTTTTGTTCAGGCGATTATCCGCGATC\n-+WTSI_1055_1a08.p1kpIBF bases 1 to 397\n-!.006<=AA83059:85;<::>CCECIIIIIIIIFIIINIBB1160BBKFDHHHHIIIIIIYOHHHHHTOID?:.-+,*,+.,/5.,*+06:IAA99,,,66??:,++002:0--,,170/442//.44<?33/74323/+****+28;=BBDDB<9...9<:32231644460.1.9/5055@@OB@9552B0492//../1@;99///BBFF11.9444///<BF@=666;@<@66140,,.03;;>>???M::2448HHKKMMMMMPYYOLKKKKYYYYYYYQQMHFKHMKLLKOOYYQMMKFKOOTYTDDDDDDQKKKKKKP?B<FFOIIDIOO?633:?AHII=:77:>IQQ?C?BOOO>=695BBNN1-,88553</..8888,,,425.\n-@WTSI_1055_1a15.p1kpIBF bases 1 to 312\n-GACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACGAAAATGTAATTTTTTTTCTTTTAATTTTGTCAACTTTTTTAGCAAAAGCATTGTATTTTAACTGTATATTGCGTTTTGGAGGCAGTCACTGGATTCAAGGGAGCAGACCAAGAAAAATTTTTACAAAGTTTCTAACCCTTTCAAGGTTTTGGACCAATTTCGTAACAAATTTCGCCAAAAAATGTGCATAATTTCTTTTACCACGCCTATCGGCATCAGTAAGTCGTCCCAGTAAAGCTAATA\n-+WTSI_1055_1a15.p1kpIBF bases 1 to 312\n-!:AA<4+1441+38::4..A<<BHHHCIIIICHI></4++*=:I>AHHFHDHDHIITIDDDDDOOOOMM@=30++,89QQQQOIIIIDDHHHHYTNNNNHHHIIOIIIIFFYYYYYYYNIIIIIIIIIIHHCC>81**'''(*6:IMMOQOIIIFFFIIILNNTTTTYYYYYYTNNNNIIKKKYTTTIIIMKTKTYTIDDDDDDTTNNIKKIIIIOOYFFFFFDIINNADDIIIKKKOOTLIOONHDKDDKKFFAD>AADMMMYOOOOLKKDIIIMKE966<<KB?>B70////2:B1../004.,,,..,\n-@WTSI_1055_1a17.p1kpIBF bases 1 to 201\n-AGAATGCGGAACAGCTGACGCAAATACATGTAGTCAGGCGCCTCGTCAAAGCGCGACCCGCGGCAGTAGTTCAAGTACATGNAGAACTCAAAGGGGAAGCCCTTGCACAGCATTTCCACCGGTGTCGACATTTTCTTAAAATGTTCACAAAAATGAACTTTTAATCGTAAAAAGGAGACCAATTTCGGGAACTTGTATGT\n-+WTSI_1055_1a17.p1kpIBF bases 1 to 
201\n-!<CIIIIIIIITHHIHHHNNNTNNNNIIIIIIIIIIITYOIIIIIOMMQ=6+(%(((,.<<QQIFIIFFIIIIIIHEB::%%%45BB64****4IQQQQOOOOOOYYYYYYYYYYYYYYTTTNNNTMOYYYYOTNNNNTTYYTTTTTTYYYMYYYMMYYOOOKKKKKCC???<::B9=BB"..b'TTTGACCCGGGGTCACAAGTCACCATGGTGAAAAGTTCCGTGGTACAAGCTCTAAGCCCGCAGAAAATAAAGGAAGGGCTATTGGAAGTGAGCGGCTTCCACAGCGAAATACCGGTGAAGCTGGACGCCCCACAATATAAACTACAAGTTGCCTTGGCAGACGGCCGTTCGGAGAACCTTGTAGCATATCGGGCCGATTGGATAGTCAGATCCATCCGCAGAGCTGAATGGAGCACCGGGAAAGTGGAAGCAGTTGACGATGAGCCGGATTTGCTCATTGGAATGCCAGAG\n-+WTSI_1055_1f17.p1kpIBF bases 1 to 436\n-!08<;=<:404::4.25:9<>>ECIIIIIIDDDDDHIINIMKKKNNNHDDDDDDDDDIIYNIDDDDTDCCCAA;97699;IIITTTTTTNNNNNIIIIIIIIIIITYYYYTTTNNNTTTYYYYYYYYYYYYTTNNNNOO@8@BEIIITTTYYYYNNNNNTNNNNNNNLLLLLLLLNTYTTTTTTTNNNNNTTTYNNNNNTYYYYTTTTTTYYYYYYYYYTNNNJNOYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTTTTTTTTTTTYTTOMKLYYYYTTTTTIINNNNTYYYYYYTNIFFIIIIYYYYYYYYYYYYYYOOOIINKKKQQYYTTTTTTYYYYYOIDDDDDFTKMKGGINKKINNYKKKKMMMKKKKKKMIHH>>==:?;BAAQ=963;<<<<::;33,4./,591,,\n-@WTSI_1055_1f23.p1kpIBF bases 1 to 383\n-AAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACCCCCTATCCCCGCAGAGGTCCATCCAGGAGTCCCAAGAGCACATGGAGAGCACTTTCAAGGCGTTGCGTCGTCAGCTGCCGGTGACGCGCTCCAAGCTGAACTGGCTGAACTTCCATTCCTTCCGCATCACTCAGCAGGAGATGAAGCAGCCGCCCTCGGCCGGCCAGCAACAACAGTCCCAGTGATGGAGCAGTCCAAGAAGAGGAAGCGAGCGAATTTGGAGCATCGCCCATTCATTTCAATTAATACCTTTCCGATTTGTGTACTTTCCCCGACATTTTCGCCATCCAATTATGGCAAGTGAAAGTTT\n-+WTSI_1055_1f23.p1kpIBF bases 1 to 383\n-!34:<<<<<;289:87;<::>AACCEDIIIFDDHHHINNTYYYKKNNNIIIIHDDDDDIIYNDDDDDTTYYYFDDAAADFKYMIFFDDDDHDDFIFFIIDDHHHITTTYINNIIIKKKOMIHHDHHIYYYYLYINNNNNOKFFFDENNNNNHGGLLLNLNNNNNYNNNLNJTTTNNNJLINNNNTYYYYYYYYYYYYYYNNNNNNYYYTTTTTYTTTTTTYYYYNLIIIIIIIIIYYYYYTTTTTTYYYYYLIIIIIIILTYYYYTTOOKLYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOOIFFIYYYYYYYYYYYYYYYYYYYYYYYYYYYYKKIIIITOYYYQQOIEEAACC>>@=;>5>AAAAAAIB94\n-@WTSI_1055_1f24.q1kpIBR bases 86 to 
670\n-TTGGCACGCAAAAGACGCAATTCTTCAGACGGATTTAAATTGGCAAGAATATCGAGCTAAATGGCAAATGTTTAAAATGGTAATCCCGGAGGAAGAAGACCACGGATTTTTTAACAAAAATGTAAATTTATTTCATGAATTTGTTGCAAAAACCAAAAGGTGCCAAAATATTGATTTACGAAAAGCGCTAACTTCTTCAGCCAAATGCCCTCTTCAAACCCACTTGATCAATCGTTGCACTCAGTGCTTTTTGATCGCCATTTTCTCCACGTCAGATTTAACCAGTCAATTTTGTCATTGGCTTCCTTTCAATGCGGTTGCTGCTTCAAAATCATCTCTTCCATTAAATTCGGGTAACGAGCCCAATGTTCTTGATGCTTCAACGAAAACTGATCAGGCGAACTGAAAGGGTGTAAAAAAGATAAAAGAAATTGTAAACGCAGCACATTGTCAAGCAAAGCAACCCAAAAAAATCGATTTTGAGTATAGTCAAAAAGGGTTACCCGTCAATGATGATCTGTTGCTGTTTGTTTGATACTCCTCCTTTCAATTTGCGATTGTTGTTGTTGCAATTGGCACGCGAA\n-+WTSI_1055_1f24.q1kpIBR bases 86 to 670\n-!88BHIQQQYYYITTTTIIINNIIIIKKKYYYYIIIIFFYOMTTTYYIIIIAA99//.1<BKKOOTYYYYTTTTNNTTINNNTTYTTNNNIIITTYTTTTTTTTYYYYYIIIIIOYYYYYYYYYYYTTTTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTOTLLYYYYYYYYYTTTTTTTTTTTTTTTTYYYYYYYYYYTTTTTTYYTNNNNNTYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOKKKOOYYYYKK???KQMMMPPPPQMMKKKMPYYYKKKKKKKKKKMMYYYYYYYYYYYYYYYYYYYYYYYYYYYYYQQQQQI51)%%)4<QQQQQQYYYYTTKTTTTTTTYYYYYYYNNNNNNYYYKKKKGGNNNNYYYYYYYYYYYQMMMMQOKKGIIKKKKYQYYYYYYYYTOOLKKIIIIIOYQQQQQQBA>:;AABAACCCIIIOIIBBIIIII:77<><AAIIIOQQIE=>>>CA>AAABBIIIIIII:00882389667>BAAA?A>77:<844>A?;4++0966.+4492000--4922./..++\n-@WTSI_1055_1g02.p1kpIBF bases 1 to 523\n-AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACGACAAATTCACGGAAGCGTCTCGCACTTTGTGCCGAGGACTGCTGCACAAGGAGCCCACTCTGAGGTTGGGCTGTCGCCGGGTCGGCCGGCCTGAGGACGGCGCGGAAGAGCTGAAGGCACACGCGTTCTTCACACAACCGGACCAGAAGACAGGCAGGGAGCCAATTCCGTGGAGGAAGATGGAGGCCGGCAAGGTGGACGACATTCCCTTCTGAACTGCTAGAGAGGACTTGTAGGAATTCCGTCCTTCAGCTGACACCTCCATTTTGTCCGGACCCCCATTCGGTGTATGCCAAAGATGTGCTGGACATCGAGCAGTTCAGCACTGTCAAGGGAGTTCGTCCGCTTCCACCAAACTTTTCCTACCTGCTGAACCATTAGGTTCGACTTGACGCGACTGACAACTCCTTCTACGACAAGTTCAACAGCGGGTCCGTGTCCATACCTTGGC\n-+WTSI_1055_1g02.p1kpIBF bases 1 to 
523\n-!08<=AAA:28::87;<::>ACECEIIIIIIIIIIINIKBB>C>QQYNHHHHDDHDHIITIDCCCCOONNNNGDFDDINMINNNNNIHHHHHIINNIIINNNNTYTIIIIDDIIIIYYYTTTTTTYIIIDDDGGITYYSKKKIDNNNNTTNNNNNTYYYTLLLLLLLLLLLYYTYJJJJJNTTTTTTTTTTYYOLLLTTOOOTTTTTTTYNNNNNJJJLLLLLLYYYYYYYYYYSSYYONNNNNNLLTTTTTTTYYYYYYYYYYYYYYYYTMMKKKYYYYYYYYYYYYYTTTTTOOLIILLLLTTLNLLLLLLYYYYYYTTTLLLTTTTTTTYYYYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYNIIIIITYYTTTLTTNIIFFFMYYYYYYYOOLKKOOTIFIFIINTTTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNNNNTYYYYYYYYYYTTTNNNNNNNNTNIIFFFKYYOOOOOIIIA<:77:<<>>>>IOOIHHHDDEIQMMII<924595/4\n'
diff -r 6a14074bc810 -r d51819d2d7e2 tools/fastq/fastq_paired_unpaired.py
--- a/tools/fastq/fastq_paired_unpaired.py Mon Jul 29 09:28:55 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,241 +0,0 @@\n-#!/usr/bin/env python\n-"""Divides a FASTQ into paired and single (orphan reads) as separate files.\n-\n-The input file should be a valid FASTQ file which has been sorted so that\n-any partner forward+reverse reads are consecutive. The output files all\n-preserve this sort order. Pairing are recognised based on standard name\n-suffices. See below or run the tool with no arguments for more details.\n-\n-Note that the FASTQ variant is unimportant (Sanger, Solexa, Illumina, or even\n-Color Space should all work equally well).\n-\n-This script is copyright 2010-2013 by Peter Cock, The James Hutton Institute\n-(formerly SCRI), Scotland, UK. All rights reserved.\n-\n-See accompanying text file for licence details (MIT license).\n-"""\n-import os\n-import sys\n-import re\n-from galaxy_utils.sequence.fastq import fastqReader, fastqWriter\n-\n-if "-v" in sys.argv or "--version" in sys.argv:\n-    print "Version 0.0.8"\n-    sys.exit(0)\n-\n-def stop_err(msg, err=1):\n-   sys.stderr.write(msg.rstrip() + "\\n")\n-   sys.exit(err)\n-\n-msg = """Expect either 3 or 4 arguments, all FASTQ filenames.\n-\n-If you want two output files, use four arguments:\n- - FASTQ variant (e.g. sanger, solexa, illumina or cssanger)\n- - Sorted input FASTQ filename,\n- - Output paired FASTQ filename (forward then reverse interleaved),\n- - Output singles FASTQ filename (orphan reads)\n-\n-If you want three output files, use five arguments:\n- - FASTQ variant (e.g. sanger, solexa, illumina or cssanger)\n- - Sorted input FASTQ filename,\n- - Output forward paired FASTQ filename,\n- - Output reverse paired FASTQ filename,\n- - Output singles FASTQ filename (orphan reads)\n-\n-The input file should be a valid FASTQ file which has been sorted so that\n-any partner forward+reverse reads are consecutive. The output files all\n-preserve this sort order.\n-\n-Any reads where the forward/reverse naming suffix used is not recognised\n-are treated as orphan reads. 
The tool supports the /1 and /2 convention\n-originally used by Illumina, the .f and .r convention, and the Sanger\n-convention (see http://staden.sourceforge.net/manual/pregap4_unix_50.html\n-for details), and the new Illumina convention where the reads have the\n-same identifier with the fragment at the start of the description, e.g.\n-\n-@HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 1:N:0:TGNCCA\n-@HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 2:N:0:TGNCCA \n-\n-Note that this does support multiple forward and reverse reads per template\n-(which is quite common with Sanger sequencing), e.g. this which is sorted\n-alphabetically:\n-\n-WTSI_1055_4p17.p1kapIBF\n-WTSI_1055_4p17.p1kpIBF\n-WTSI_1055_4p17.q1kapIBR\n-WTSI_1055_4p17.q1kpIBR\n-\n-or this where the reads already come in pairs:\n-\n-WTSI_1055_4p17.p1kapIBF\n-WTSI_1055_4p17.q1kapIBR\n-WTSI_1055_4p17.p1kpIBF\n-WTSI_1055_4p17.q1kpIBR\n-\n-both become:\n-\n-WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR\n-WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR\n-"""\n-\n-if len(sys.argv) == 5:\n-    format, input_fastq, pairs_fastq, singles_fastq = sys.argv[1:]\n-elif len(sys.argv) == 6:\n-    pairs_fastq = None\n-    format, input_fastq, pairs_f_fastq, pairs_r_fastq, singles_fastq = sys.argv[1:]\n-else:\n-    stop_err(msg)\n-\n-format = format.replace("fastq", "").lower()\n-if not format:\n-    format="sanger" #safe default\n-elif format not in ["sanger","solexa","illumina","cssanger"]:\n-    stop_err("Unrecognised format %s" % format)\n-\n-def f_match(name):\n-   if name.endswith("/1") or name.endswith(".f"):\n-      return True\n-\n-#Cope with three widely used suffix naming convensions,\n-#Illumina: /1 or /2\n-#Forward/revered: .f or .r\n-#Sanger, e.g. 
.p1k and .q1k\n-#See http://staden.sourceforge.net/manual/pregap4_unix_50.html\n-re_f = re.compile(r"(/1|\\.f|\\.[sfp]\\d\\w*)$")\n-re_r = re.compile(r"(/2|\\.r|\\.[rq]\\d\\w*)$")\n-\n-#assert re_f.match("demo/1")\n-assert re_f.search("demo.f")\n-assert re_f.search("demo.s1")\n-assert re_f.search("demo.f1k")\n-assert re_f.search("demo.p1")\n-assert re_f.search("demo.p1k")\n-assert re_f.search("'..b'les = 0, 0, 0, 0, 0, 0\n-in_handle = open(input_fastq)\n-if pairs_fastq:\n-    pairs_f_writer = fastqWriter(open(pairs_fastq, "w"), format)\n-    pairs_r_writer = pairs_f_writer\n-else:\n-    pairs_f_writer = fastqWriter(open(pairs_f_fastq, "w"), format)\n-    pairs_r_writer = fastqWriter(open(pairs_r_fastq, "w"), format)\n-singles_writer = fastqWriter(open(singles_fastq, "w"), format)\n-last_template, buffered_reads = None, []\n-\n-for record in fastqReader(in_handle, format):\n-    count += 1\n-    name = record.identifier.split(None,1)[0]\n-    assert name[0]=="@", record.identifier #Quirk of the Galaxy parser\n-    is_forward = False\n-    suffix = re_f.search(name)\n-    if suffix:\n-        #============\n-        #Forward read\n-        #============\n-        template = name[:suffix.start()]\n-        is_forward = True\n-    elif re_illumina_f.match(record.identifier):\n-        template = name #No suffix\n-        is_forward = True\n-    if is_forward:\n-        #print name, "forward", template\n-        forward += 1\n-        if last_template == template:\n-            buffered_reads.append(record)\n-        else:\n-            #Any old buffered reads are orphans\n-            for old in buffered_reads:\n-                singles_writer.write(old)\n-                singles += 1\n-            #Save this read in buffer\n-            buffered_reads = [record]\n-            last_template = template\n-    else:\n-        is_reverse = False\n-        suffix = re_r.search(name)\n-        if suffix:\n-            #============\n-            #Reverse read\n-            
#============\n-            template = name[:suffix.start()]\n-            is_reverse = True\n-        elif re_illumina_r.match(record.identifier):\n-            template = name #No suffix\n-            is_reverse = True\n-        if is_reverse:\n-            #print name, "reverse", template\n-            reverse += 1\n-            if last_template == template and buffered_reads:\n-                #We have a pair!\n-                #If there are multiple buffered forward reads, want to pick\n-                #the first one (although we could try and do something more\n-                #clever looking at the suffix to match them up...)\n-                old = buffered_reads.pop(0)\n-                pairs_f_writer.write(old)\n-                pairs_r_writer.write(record)\n-                pairs += 2\n-            else:\n-                #As this is a reverse read, this and any buffered read(s) are\n-                #all orphans\n-                for old in buffered_reads:\n-                    singles_writer.write(old)\n-                    singles += 1\n-                buffered_reads = []\n-                singles_writer.write(record)\n-                singles += 1\n-                last_template = None\n-        else:\n-            #===========================\n-            #Neither forward nor reverse\n-            #===========================\n-            singles_writer.write(record)\n-            singles += 1\n-            neither += 1\n-            for old in buffered_reads:\n-                singles_writer.write(old)\n-                singles += 1\n-            buffered_reads = []\n-            last_template = None\n-if last_template:\n-    #Left over singles...\n-    for old in buffered_reads:\n-        singles_writer.write(old)\n-        singles += 1\n-in_handle.close\n-singles_writer.close()\n-if pairs_fastq:\n-    pairs_f_writer.close()\n-    assert pairs_r_writer.file.closed\n-else:\n-    pairs_f_writer.close()\n-    pairs_r_writer.close()\n-\n-if 
neither:\n-    print "%i reads (%i forward, %i reverse, %i neither), %i in pairs, %i as singles" \\\n-           % (count, forward, reverse, neither, pairs, singles)\n-else:\n-    print "%i reads (%i forward, %i reverse), %i in pairs, %i as singles" \\\n-           % (count, forward, reverse, pairs, singles)\n-\n-assert count == pairs + singles == forward + reverse + neither, \\\n-       "%i vs %i+%i=%i vs %i+%i+%i=%i" \\\n-       % (count,pairs,singles,pairs+singles,forward,reverse,neither,forward+reverse+neither)\n'
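The buffering loop in the removed script can be sketched without the Galaxy FASTQ reader. This is a simplified illustration in Python 3 (the removed script is Python 2), assuming each read has already been reduced to a (name, template, direction) tuple. The function name split_pairs and that tuple layout are hypothetical and not part of the tool.

```python
def split_pairs(reads):
    """Split name-sorted reads into pairs and orphans.

    reads: iterable of (name, template, direction) tuples, where
    direction is "F", "R" or None (hypothetical layout for this sketch).
    Returns (pairs, singles).
    """
    pairs, singles = [], []
    last_template, buffered = None, []
    for name, template, direction in reads:
        if direction == "F":
            if template != last_template:
                # Forward reads still buffered from an older template are orphans
                singles.extend(buffered)
                buffered = []
                last_template = template
            buffered.append(name)
        elif direction == "R":
            if template == last_template and buffered:
                # Pair with the oldest buffered forward read of this template
                pairs.append((buffered.pop(0), name))
            else:
                # No partner: this read and any buffered reads are orphans
                singles.extend(buffered)
                singles.append(name)
                buffered = []
                last_template = None
        else:
            # Neither forward nor reverse: always an orphan
            singles.append(name)
            singles.extend(buffered)
            buffered = []
            last_template = None
    # Anything left in the buffer never found a partner
    singles.extend(buffered)
    return pairs, singles


if __name__ == "__main__":
    t = "WTSI_1055_4p17"
    reads = [
        (t + ".p1kapIBF", t, "F"),
        (t + ".p1kpIBF", t, "F"),
        (t + ".q1kapIBR", t, "R"),
        (t + ".q1kpIBR", t, "R"),
    ]
    print(split_pairs(reads))
```

As in the script, a reverse read is matched with the first buffered forward read of the same template, so the alphabetically sorted and the already-interleaved orderings described in the help text give the same pairs.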
diff -r 6a14074bc810 -r d51819d2d7e2 tools/fastq/fastq_paired_unpaired.rst
--- a/tools/fastq/fastq_paired_unpaired.rst Mon Jul 29 09:28:55 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,109 +0,0 @@
-Galaxy tool to divide FASTQ files into paired and unpaired reads
-================================================================
-
-This tool is copyright 2010-2013 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below.
-
-This tool is a short Python script which divides a FASTQ file into paired
-reads, and single or orphan reads. You can have separate files for the
-forward/reverse reads, or have them interleaved in a single file.
-
-Note that the FASTQ variant is unimportant (Sanger, Solexa, Illumina, or even
-Color Space should all work equally well).
-
-This tool is available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/peterjc/fastq_paired_unpaired
-
-
-Automated Installation
-======================
-
-This should be straightforward: Galaxy should automatically download and install
-the tool from the Galaxy Tool Shed, and run the unit tests.
-
-
-Manual Installation
-===================
-
-There are just two files to install:
-
-* fastq_paired_unpaired.py (the Python script)
-* fastq_paired_unpaired.xml (the Galaxy tool definition)
-
-The suggested location is in the Galaxy folder tools/fastq next to other FASTQ
-tools provided with Galaxy.
-
-You will also need to modify the tool_conf.xml file to tell Galaxy to offer
-the tool. One suggested location is next to the fastq_filter.xml entry. Simply
-add the line::
-
-    <tool file="fastq/fastq_paired_unpaired.xml" />
-
-That's it.
-
-
-History
-=======
-
-======= ======================================================================
-Version Changes
-------- ----------------------------------------------------------------------
-v0.0.1  - Initial version, using Biopython
-v0.0.2  - Help text; cope with multiple pairs per template
-v0.0.3  - Galaxy XML wrappers added
-v0.0.4  - Use Galaxy library to handle FASTQ files (avoid Biopython dependency)
-v0.0.5  - Handle Illumina 1.8 style pair names
-v0.0.6  - Record script version when run from Galaxy
-        - Added unit test (FASTQ file using Sanger naming)
-v0.0.7  - Link to Tool Shed added to help text and this documentation.
-v0.0.8  - Use reStructuredText for this README file.
-        - Adopt standard MIT License.
-======= ======================================================================
-
-
-Developers
-==========
-
-This script and other tools for filtering FASTA, FASTQ and SFF files are
-currently being developed on the following hg branch:
-http://bitbucket.org/peterjc/galaxy-central/src/fasta_filter
-
-For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
-the following command from the Galaxy root folder::
-
-    $ tar -czf fastq_paired_unpaired.tar.gz tools/fastq/fastq_paired_unpaired.* test-data/sanger-pairs-*.fastq
-
-Check this worked::
-
-    $ tar -tzf fastq_paired_unpaired.tar.gz
-    tools/fastq/fastq_paired_unpaired.py
-    tools/fastq/fastq_paired_unpaired.rst
-    tools/fastq/fastq_paired_unpaired.xml
-    test-data/sanger-pairs-forward.fastq
-    test-data/sanger-pairs-interleaved.fastq
-    test-data/sanger-pairs-mixed.fastq
-    test-data/sanger-pairs-reverse.fastq
-    test-data/sanger-pairs-singles.fastq
-
-
-Licence (MIT)
-=============
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
diff -r 6a14074bc810 -r d51819d2d7e2 tools/fastq/fastq_paired_unpaired.xml
--- a/tools/fastq/fastq_paired_unpaired.xml Mon Jul 29 09:28:55 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,105 +0,0 @@
-<tool id="fastq_paired_unpaired" name="Divide FASTQ file into paired and unpaired reads" version="0.0.7">
- <description>using the read name suffixes</description>
- <version_command interpreter="python">fastq_paired_unpaired.py --version</version_command>
- <command interpreter="python">
-fastq_paired_unpaired.py $input_fastq.extension $input_fastq
-#if $output_choice_cond.output_choice=="separate"
- $output_forward $output_reverse
-#elif $output_choice_cond.output_choice=="interleaved"
- $output_paired
-#end if
-$output_singles
- </command>
- <stdio>
- <!-- Anything other than zero is an error -->
- <exit_code range="1:" />
- <exit_code range=":-1" />
- </stdio>
- <inputs>
- <param name="input_fastq" type="data" format="fastq" label="FASTQ file to divide into paired and unpaired reads"/>
- <conditional name="output_choice_cond">
- <param name="output_choice" type="select" label="How to output paired reads?">
- <option value="separate">Separate (two FASTQ files, for the forward and reverse reads, in matching order).</option>
- <option value="interleaved">Interleaved (one FASTQ file, alternating forward read then partner reverse read).</option>
- </param>
- <!-- It seems we need these dummy entries here, compare this to indels/indel_sam2interval.xml -->
- <when value="separate" />
- <when value="interleaved" />
- </conditional>
- </inputs>
- <outputs>
- <data name="output_singles" format="input" label="Orphan or single reads"/>
- <data name="output_forward" format="input" label="Forward paired reads">
- <filter>output_choice_cond["output_choice"] == "separate"</filter>
- </data>
- <data name="output_reverse" format="input" label="Reverse paired reads">
- <filter>output_choice_cond["output_choice"] == "separate"</filter>
- </data>
- <data name="output_paired" format="input" label="Interleaved paired reads">
- <filter>output_choice_cond["output_choice"] == "interleaved"</filter>
- </data>
- </outputs>
- <tests>
- <test>
- <param name="input_fastq" value="sanger-pairs-mixed.fastq" ftype="fastq"/>
- <param name="output_choice" value="separate"/>
- <output name="output_singles" file="sanger-pairs-singles.fastq" ftype="fastq"/>
- <output name="output_forward" file="sanger-pairs-forward.fastq" ftype="fastq"/>
- <output name="output_reverse" file="sanger-pairs-reverse.fastq" ftype="fastq"/>
- </test>
- <test>
- <param name="input_fastq" value="sanger-pairs-mixed.fastq" ftype="fastq"/>
- <param name="output_choice" value="interleaved"/>
- <output name="output_singles" file="sanger-pairs-singles.fastq" ftype="fastq"/>
- <output name="output_paired" file="sanger-pairs-interleaved.fastq" ftype="fastq"/>
- </test>
- </tests>
- <help>
-
-**What it does**
-
-Using the common read name suffix conventions, it divides a FASTQ file into
-paired reads, and orphan or single reads.
-
-The input file should be a valid FASTQ file which has been sorted so that
-any partner forward+reverse reads are consecutive. The output files all
-preserve this sort order. Pairs are recognised based on standard name
-suffixes. See below or run the tool with no arguments for more details.
-
-Any reads where the forward/reverse naming suffix used is not recognised
-are treated as orphan reads. The tool supports the /1 and /2 convention
-originally used by Illumina, the .f and .r convention, the Sanger convention
-(see http://staden.sourceforge.net/manual/pregap4_unix_50.html for details),
-and the current Illumina convention where the reads get the same identifier
-with the fragment number in the description, for example:
-
- * @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 1:N:0:TGNCCA
- * @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 2:N:0:TGNCCA 
-
-Note that this does support multiple forward and reverse reads per template
-(which is quite common with Sanger sequencing), e.g. this which is sorted
-alphabetically:
-
- * WTSI_1055_4p17.p1kapIBF
- * WTSI_1055_4p17.p1kpIBF
- * WTSI_1055_4p17.q1kapIBR
- * WTSI_1055_4p17.q1kpIBR
-
-or this where the reads already come in pairs:
-
- * WTSI_1055_4p17.p1kapIBF
- * WTSI_1055_4p17.q1kapIBR
- * WTSI_1055_4p17.p1kpIBF
- * WTSI_1055_4p17.q1kpIBR
-
-both become:
-
- * WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR
- * WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR
-
-**Citation**
-
-This tool is available to install into other Galaxy Instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/fastq_paired_unpaired
- </help>
-</tool>
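A minimal sketch of how the suffix conventions described in the help text above can be recognised, written in Python 3 rather than the Python 2 of the removed script. The two regular expressions are the ones visible in the removed fastq_paired_unpaired.py. The function name classify and its return shape are hypothetical. The newer Illumina convention (fragment number in the description) is left out, because its patterns are not visible in this diff view.

```python
import re

# Suffix patterns as they appear in the removed fastq_paired_unpaired.py:
# Illumina /1 and /2, forward/reverse .f and .r, and Sanger names such as
# .p1k (forward) and .q1k (reverse).
RE_F = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
RE_R = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")


def classify(name):
    """Return (template, direction) for a read name.

    direction is "F" for forward, "R" for reverse, or None when the
    suffix is not recognised (such reads are treated as orphans).
    This helper is illustrative only; it is not part of the tool.
    """
    match = RE_F.search(name)
    if match:
        return name[:match.start()], "F"
    match = RE_R.search(name)
    if match:
        return name[:match.start()], "R"
    return name, None


if __name__ == "__main__":
    for read_name in ("WTSI_1055_4p17.p1kapIBF",
                      "WTSI_1055_4p17.q1kapIBR",
                      "demo/1",
                      "unsuffixed_read"):
        print(read_name, classify(read_name))
```

Stripping the matched suffix gives the template name, which is what lets a forward read such as WTSI_1055_4p17.p1kapIBF be matched to its partner WTSI_1055_4p17.q1kapIBR.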
diff -r 6a14074bc810 -r d51819d2d7e2 tools/filters/get_orfs_or_cdss.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/filters/get_orfs_or_cdss.py Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,223 @@
+#!/usr/bin/env python
+"""Find ORFs in a nucleotide sequence file.
+
+get_orfs_or_cdss.py $input_fasta $input_format $table $ftype $ends $mode $min_len $strand $out_nuc_file $out_prot_file
+
+Takes ten command line options, input sequence filename, format, genetic
+code, CDS vs ORF, end type (open, closed), selection mode (all, top, one),
+minimum length (in amino acids), strand (both, forward, reverse), output
+nucleotide filename, and output protein filename.
+
+This tool is a short Python script which requires Biopython. If you use
+this tool in scientific work leading to a publication, please cite the
+Biopython application note:
+
+Cock et al 2009. Biopython: freely available Python tools for computational
+molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
+http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
+
+This script is copyright 2011-2013 by Peter Cock, The James Hutton Institute
+(formerly SCRI), Dundee, UK. All rights reserved.
+
+See accompanying text file for licence details (MIT/BSD style).
+
+This is version 0.0.3 of the script.
+"""
+import sys
+import re
+
+if "-v" in sys.argv or "--version" in sys.argv:
+    print "v0.0.3"
+    sys.exit(0)
+
+def stop_err(msg, err=1):
+    sys.stderr.write(msg.rstrip() + "\n")
+    sys.exit(err)
+
+try:
+    from Bio.Seq import Seq, reverse_complement, translate
+    from Bio.SeqRecord import SeqRecord
+    from Bio import SeqIO
+    from Bio.Data import CodonTable
+except ImportError:
+    stop_err("Missing Biopython library")
+
+#Parse Command Line
+try:
+    input_file, seq_format, table, ftype, ends, mode, min_len, strand, out_nuc_file, out_prot_file = sys.argv[1:]
+except ValueError:
+    stop_err("Expected ten arguments, got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))
+
+try:
+    table = int(table)
+except ValueError:
+    stop_err("Expected integer for genetic code table, got %s" % table)
+
+try:
+    table_obj = CodonTable.ambiguous_generic_by_id[table]
+except KeyError:
+    stop_err("Unknown codon table %i" % table)
+
+if ftype not in ["CDS", "ORF"]:
+    stop_err("Expected CDS or ORF, got %s" % ftype)
+
+if ends not in ["open", "closed"]:
+    stop_err("Expected open or closed for end treatment, got %s" % ends)
+
+try:
+    min_len = int(min_len)
+except ValueError:
+    stop_err("Expected integer for min_len, got %s" % min_len)
+
+if seq_format.lower()=="sff":
+    seq_format = "sff-trim"
+elif seq_format.lower()=="fasta":
+    seq_format = "fasta"
+elif seq_format.lower().startswith("fastq"):
+    seq_format = "fastq"
+else:
+    stop_err("Unsupported file type %r" % seq_format)
+
+print("Genetic code table %i" % table)
+print("Minimum length %i aa" % min_len)
+#print("Taking %s ORF(s) from %s strand(s)" % (mode, strand))
+
+starts = sorted(table_obj.start_codons)
+assert "NNN" not in starts
+re_starts = re.compile("|".join(starts))
+
+stops = sorted(table_obj.stop_codons)
+assert "NNN" not in stops
+re_stops = re.compile("|".join(stops))
+
+def start_chop_and_trans(s, strict=True):
+    """Returns offset, trimmed nuc, protein."""
+    if strict:
+        assert s[-3:] in stops, s
+    assert len(s) % 3 == 0
+    for match in re_starts.finditer(s):
+        #Must check the start is in frame
+        start = match.start()
+        if start % 3 == 0:
+            n = s[start:]
+            assert len(n) % 3 == 0, "%s is len %i" % (n, len(n))
+            if strict:
+                t = translate(n, table, cds=True)
+            else:
+                #Use when missing stop codon,
+                t = "M" + translate(n[3:], table, to_stop=True)
+            return start, n, t
+    return None, None, None
+
+def break_up_frame(s):
+    """Returns offset, nuc, protein."""
+    start = 0
+    for match in re_stops.finditer(s):
+        index = match.start() + 3
+        if index % 3 != 0:
+            continue
+        n = s[start:index]
+        if ftype=="CDS":
+            offset, n, t = start_chop_and_trans(n)
+        else:
+            offset = 0
+            t = translate(n, table, to_stop=True)
+        if n and len(t) >= min_len:
+            yield start + offset, n, t
+        start = index
+    if ends == "open":
+        #No stop codon, Biopython's strict CDS translate will fail
+        n = s[start:]
+        #Ensure we have whole codons
+        #TODO - Try appending N instead?
+        #TODO - Do the next four lines more elegantly
+        if len(n) % 3:
+            n = n[:-1]
+        if len(n) % 3:
+            n = n[:-1]
+        if ftype=="CDS":
+            offset, n, t = start_chop_and_trans(n, strict=False)
+        else:
+            offset = 0
+            t = translate(n, table, to_stop=True)
+        if n and len(t) >= min_len:
+            yield start + offset, n, t
+
+
+def get_all_peptides(nuc_seq):
+    """Returns start, end, strand, nucleotides, protein.
+
+    Co-ordinates are Python style zero-based.
+    """
+    #TODO - Refactor to use a generator function (in start order)
+    #rather than making a list and sorting?
+    answer = []
+    full_len = len(nuc_seq)
+    if strand != "reverse":
+        for frame in range(0,3):
+            for offset, n, t in break_up_frame(nuc_seq[frame:]):
+                start = frame + offset #zero based
+                answer.append((start, start + len(n), +1, n, t))
+    if strand != "forward":
+        rc = reverse_complement(nuc_seq)
+        for frame in range(0,3) :
+            for offset, n, t in break_up_frame(rc[frame:]):
+                start = full_len - frame - offset #zero based
+                answer.append((start - len(n), start, -1, n ,t))
+    answer.sort()
+    return answer
+
+def get_top_peptides(nuc_seq):
+    """Returns all peptides of max length."""
+    values = list(get_all_peptides(nuc_seq))
+    if not values:
+        return  # nothing found; a bare return ends the generator cleanly
+    max_len = max(len(x[-1]) for x in values)
+    for x in values:
+        if len(x[-1]) == max_len:
+            yield x
+
+def get_one_peptide(nuc_seq):
+    """Returns first (left most) peptide with max length."""
+    values = list(get_top_peptides(nuc_seq))
+    if not values:
+        return  # nothing found; a bare return ends the generator cleanly
+    yield values[0]
+
+if mode == "all":
+    get_peptides = get_all_peptides
+elif mode == "top":
+    get_peptides = get_top_peptides
+elif mode == "one":
+    get_peptides = get_one_peptide
+
+in_count = 0
+out_count = 0
+if out_nuc_file == "-":
+    out_nuc = sys.stdout
+else:
+    out_nuc = open(out_nuc_file, "w")
+if out_prot_file == "-":
+    out_prot = sys.stdout
+else:
+    out_prot = open(out_prot_file, "w")
+for record in SeqIO.parse(input_file, seq_format):
+    for i, (f_start, f_end, f_strand, n, t) in enumerate(get_peptides(str(record.seq).upper())):
+        out_count += 1
+        if f_strand == +1:
+            loc = "%i..%i" % (f_start+1, f_end)
+        else:
+            loc = "complement(%i..%i)" % (f_start+1, f_end)
+        descr = "length %i aa, %i bp, from %s of %s" \
+                % (len(t), len(n), loc, record.description)
+        r = SeqRecord(Seq(n), id = record.id + "|%s%i" % (ftype, i+1), name = "", description= descr)
+        t = SeqRecord(Seq(t), id = record.id + "|%s%i" % (ftype, i+1), name = "", description= descr)
+        SeqIO.write(r, out_nuc, "fasta")
+        SeqIO.write(t, out_prot, "fasta")
+    in_count += 1
+if out_nuc is not sys.stdout:
+    out_nuc.close()
+if out_prot is not sys.stdout:
+    out_prot.close()
+
+print("Found %i %ss in %i sequences" % (out_count, ftype, in_count))
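The scanning logic in `break_up_frame` and the coordinate handling in `get_all_peptides` can be illustrated with a small standalone sketch. It is not the script's code: it steps through codons instead of using a regular expression, covers only the ORF mode (no start codon search, no translation), and hard-codes the three stop codons of the standard genetic code rather than reading them from Biopython. The names `split_frame`, `protein_length` and `find_orfs` are hypothetical.

```python
# Assumption: stop codons of NCBI table 1 only; the real script takes
# them from Biopython's CodonTable for the chosen table.
STOP_CODONS = ("TAA", "TAG", "TGA")
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def split_frame(frame_seq, open_ends=True):
    """Yield (offset, nucleotides) for each stop-delimited ORF in one frame.

    A closed ORF includes its terminating stop codon. With open_ends,
    the trailing stretch without a stop codon is also yielded.
    """
    start = 0
    usable = len(frame_seq) - len(frame_seq) % 3  # whole codons only
    for pos in range(0, usable, 3):
        if frame_seq[pos:pos + 3] in STOP_CODONS:
            yield start, frame_seq[start:pos + 3]
            start = pos + 3
    if open_ends and start < usable:
        yield start, frame_seq[start:usable]


def protein_length(nuc):
    """Number of codons, not counting a terminating stop codon."""
    codons = len(nuc) // 3
    return codons - 1 if nuc[-3:] in STOP_CODONS else codons


def find_orfs(seq, min_len_aa=30, strand="both", open_ends=True):
    """Return sorted (start, end, strand, nucleotides) tuples.

    Co-ordinates are zero-based, half-open, and always refer to the
    forward strand, as in the script's get_all_peptides.
    """
    seq = seq.upper()
    full_len = len(seq)
    found = []
    if strand != "reverse":
        for frame in range(3):
            for offset, nuc in split_frame(seq[frame:], open_ends):
                if protein_length(nuc) >= min_len_aa:
                    start = frame + offset
                    found.append((start, start + len(nuc), +1, nuc))
    if strand != "forward":
        rc = reverse_complement(seq)
        for frame in range(3):
            for offset, nuc in split_frame(rc[frame:], open_ends):
                if protein_length(nuc) >= min_len_aa:
                    # Map the position on the reverse complement back
                    # to forward-strand co-ordinates.
                    end = full_len - frame - offset
                    found.append((end - len(nuc), end, -1, nuc))
    found.sort()
    return found
```

For example, `find_orfs("ATGAAATAGCC", min_len_aa=2, strand="forward", open_ends=False)` returns `[(0, 9, 1, "ATGAAATAG")]`. For reverse-strand hits, reverse complementing `seq[start:end]` gives back the reported nucleotides.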
diff -r 6a14074bc810 -r d51819d2d7e2 tools/filters/get_orfs_or_cdss.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/filters/get_orfs_or_cdss.rst Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,121 @@
+Galaxy tool to find ORFs or simple CDSs
+=======================================
+
+This tool is copyright 2011-2013 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below.
+
+This tool is a short Python script (using Biopython library functions)
+to search nucleotide sequences for open reading frames (ORFs) or coding
+sequences (CDSs) where the first potential start codon is used. See the
+help text in the XML file for more information.
+
+This tool is available from the Galaxy Tool Shed at:
+
+* http://toolshed.g2.bx.psu.edu/view/peterjc/get_orfs_or_cdss
+
+See also the EMBOSS tool ``getorf`` which offers similar functionality and
+has also been wrapped for use within Galaxy.
+
+
+Automated Installation
+======================
+
+This should be straightforward using the Galaxy Tool Shed, which should be
+able to automatically install the dependency on Biopython, and then install
+this tool and run its unit tests.
+
+
+Manual Installation
+===================
+
+There are just two files to install to use this tool from within Galaxy:
+
+* get_orfs_or_cdss.py (the Python script)
+* get_orfs_or_cdss.xml (the Galaxy tool definition)
+
+If you are installing this manually (rather than via the Tool Shed), the
+suggested location is in the Galaxy folder tools/filters next to the tool
+for calling sff_extract.py for converting SFF to FASTQ or FASTA + QUAL.
+You will also need to modify the tool_conf.xml file to tell Galaxy to offer the
+tool. One suggested location is in the filters section. Simply add the line::
+
+    <tool file="filters/get_orfs_or_cdss.xml" />
+
+You will also need to install Biopython 1.54 or later. If you want to run
+the unit tests, also add this line to tool_conf.xml.sample and copy the
+sample FASTA files into the test-data directory. Then::
+
+    ./run_functional_tests.sh -id get_orfs_or_cdss
+
+That's it.
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1   - Initial version.
+v0.0.2   - Correct labelling issue on reverse strand.
+         - Use the new <stdio> settings in the XML wrappers to catch errors
+v0.0.3   - Include unit tests.
+         - Record Python script version when run from Galaxy.
+v0.0.4   - Link to Tool Shed added to help text and this documentation.
+v0.0.5   - Automated installation of the Biopython dependency.
+         - Use reStructuredText for this README file.
+         - Adopt standard MIT License.
+======= ======================================================================
+
+
+Developers
+==========
+
+This script and related tools are being developed on the following hg branch:
+http://bitbucket.org/peterjc/galaxy-central/src/tools
+
+For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
+the following command from the Galaxy root folder::
+
+    $ tar -czf get_orfs_or_cdss.tar.gz tools/filters/get_orfs_or_cdss.* tools/filters/repository_dependencies.xml test-data/get_orf_input*.fasta test-data/Ssuis.fasta
+
+Check this worked::
+
+    $ tar -tzf get_orfs_or_cdss.tar.gz
+    tools/filters/get_orfs_or_cdss.py
+    tools/filters/get_orfs_or_cdss.rst
+    tools/filters/get_orfs_or_cdss.xml
+    tools/filters/repository_dependencies.xml
+    test-data/get_orf_input.fasta
+    test-data/get_orf_input.Suis_ORF.nuc.fasta
+    test-data/get_orf_input.Suis_ORF.prot.fasta
+    test-data/get_orf_input.t11_nuc_out.fasta
+    test-data/get_orf_input.t11_open_nuc_out.fasta
+    test-data/get_orf_input.t11_open_prot_out.fasta
+    test-data/get_orf_input.t11_prot_out.fasta
+    test-data/get_orf_input.t1_nuc_out.fasta
+    test-data/get_orf_input.t1_prot_out.fasta
+    test-data/Ssuis.fasta
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
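The tarball check described in the README's Developers section could also be scripted. The following is a minimal sketch using only the Python standard library; the helper name `missing_members` is hypothetical and not part of the tool.

```python
import tarfile


def missing_members(tar_path, expected):
    """Return the expected member names that are absent from a .tar.gz file."""
    with tarfile.open(tar_path, "r:gz") as archive:
        present = set(archive.getnames())
    return sorted(set(expected) - present)
```

Calling `missing_members("get_orfs_or_cdss.tar.gz", [...])` with the file names from the README listing should return an empty list if the tarball is complete.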
diff -r 6a14074bc810 -r d51819d2d7e2 tools/filters/get_orfs_or_cdss.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/filters/get_orfs_or_cdss.xml Mon Jul 29 09:30:44 2013 -0400
b'@@ -0,0 +1,166 @@\n+<tool id="get_orfs_or_cdss" name="Get open reading frames (ORFs) or coding sequences (CDSs)" version="0.0.5">\n+\t<description>e.g. to get peptides from ESTs</description>\n+\t<version_command interpreter="python">get_orfs_or_cdss.py --version</version_command>\n+\t<command interpreter="python">\n+get_orfs_or_cdss.py $input_file $input_file.ext $table $ftype $ends $mode $min_len $strand $out_nuc_file $out_prot_file\n+\t</command>\n+\t<stdio>\n+\t\t<!-- Anything other than zero is an error -->\n+\t\t<exit_code range="1:" />\n+\t\t<exit_code range=":-1" />\n+\t</stdio>\n+\t<inputs>\n+\t\t<param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file (nucleotides)" help="FASTA, FASTQ, or SFF format." />\n+\t\t<param name="table" type="select" label="Genetic code" help="Tables from the NCBI, these determine the start and stop codons">\n+\t\t\t<option value="1">1. Standard</option>\n+\t\t\t<option value="2">2. Vertebrate Mitochondrial</option>\n+\t\t\t<option value="3">3. Yeast Mitochondrial</option>\n+\t\t\t<option value="4">4. Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma</option>\n+\t\t\t<option value="5">5. Invertebrate Mitochondrial</option>\n+\t\t\t<option value="6">6. Ciliate Macronuclear and Dasycladacean</option>\n+\t\t\t<option value="9">9. Echinoderm Mitochondrial</option>\n+\t\t\t<option value="10">10. Euplotid Nuclear</option>\n+\t\t\t<option value="11">11. Bacterial</option>\n+\t\t\t<option value="12">12. Alternative Yeast Nuclear</option>\n+\t\t\t<option value="13">13. Ascidian Mitochondrial</option>\n+\t\t\t<option value="14">14. Flatworm Mitochondrial</option>\n+\t\t\t<option value="15">15. Blepharisma Macronuclear</option>\n+\t\t\t<option value="16">16. Chlorophycean Mitochondrial</option>\n+\t\t\t<option value="21">21. Trematode Mitochondrial</option>\n+\t\t\t<option value="22">22. Scenedesmus obliquus</option>\n+\t\t\t<option value="23">23. 
Thraustochytrium Mitochondrial</option>\n+\t\t</param>\n+\t\t<param name="ftype" type="select" value="True" label="Look for ORFs or CDSs">\n+                        <option value="ORF">Look for ORFs (check for stop codons only, ignore start codons)</option>\n+                        <option value="CDS">Look for CDSs (with start and stop codons)</option>\n+\t\t</param>\n+                <param name="ends" type="select" value="open" label="Sequence end treatment">\n+\t\t\t<option value="open">Open ended (will allow missing start/stop codons at the ends)</option>\n+                        <option value="closed">Complete (will check for start/stop codons at the ends)</option>\n+                        <!-- TODO? Circular, for using this on finished bacteria etc -->\n+                </param>\n+\n+\t\t<param name="mode" type="select" label="Selection criteria" help="Suppose a sequence has ORFs/CDSs of lengths 100, 102 and 102 -- which should be taken? These options would return 3, 2 or 1 ORF.">\n+                    <option value="all">All ORFs/CDSs from each sequence</option>\n+                    <option value="top">All ORFs/CDSs from each sequence with the maximum length</option>\n+                    <option value="one">First ORF/CDS from each sequence with the maximum length</option>\n+\t\t</param>\n+                <param name="min_len" type="integer" size="5" value="30" label="Minimum length ORF/CDS (in amino acids, e.g. 30 aa = 90 bp plus any stop codon)">\n+                </param>\n+                <param name="strand" type="select" label="Strand to search" help="Use the forward only option if your sequence directionality is known (e.g. 
from poly-A tails, or strand specific RNA sequencing).">\n+                    <option value="both">Search both the forward and reverse strand</option>\n+                    <option value="forward">Only search the forward strand</option>\n+                    <option value="reverse">Only search the reverse strand</option>\n+                </param>\n+\t</inputs>\n+\t<outputs>\n+\t\t<data name="out_nuc_file" format="fasta" label="${ftype.value}s (nucleotides)" />\n+\t\t<data name="out_prot_file" format="fasta" label="'..b'input.t11_prot_out.fasta" />\n+\t\t</test>\n+\t\t<test>\n+                        <param name="input_file" value="get_orf_input.fasta" />\n+                        <param name="table" value="11" />\n+                        <param name="ftype" value="CDS" />\n+                        <param name="ends" value="open" />\n+                        <param name="mode" value="all" />\n+                        <param name="min_len" value="10" />\n+                        <param name="strand" value="forward" />\n+                        <output name="out_nuc_file" file="get_orf_input.t11_open_nuc_out.fasta" />\n+                        <output name="out_prot_file" file="get_orf_input.t11_open_prot_out.fasta" />\n+\t\t</test>\n+                <test>\n+\t\t\t<param name="input_file" value="Ssuis.fasta" />\n+\t\t\t<param name="table" value="11" />\n+\t\t\t<param name="ftype" value="ORF" />\n+\t\t\t<param name="ends" value="open" />\n+\t\t\t<param name="mode" value="all" />\n+\t\t\t<param name="min_len" value="100" />\n+\t\t\t<param name="strand" value="both" />\n+\t\t\t<output name="out_nuc_file" file="get_orf_input.Suis_ORF.nuc.fasta" />\n+\t\t\t<output name="out_prot_file" file="get_orf_input.Suis_ORF.prot.fasta" />\n+\t\t</test>\n+\t</tests>\n+\t<requirements>\n+\t\t<requirement type="python-module">Bio</requirement>\n+\t</requirements>\n+\t<help>\n+\n+**What it does**\n+\n+Takes an input file of nucleotide sequences (typically FASTA, but also FASTQ\n+and 
Standard Flowgram Format (SFF) are supported), and searches each sequence\n+for open reading frames (ORFs) or potential coding sequences (CDSs) of the\n+given minimum length. These are returned as FASTA files of nucleotides and\n+protein sequences.\n+\n+You can choose to have all the ORFs/CDSs above the minimum length for each\n+sequence (similar to the EMBOSS getorf tool), those with the longest length\n+equal, or the first ORF/CDS with the longest length (in the special case\n+where a sequence encodes two or more long ORFs/CDSs of the same length). The\n+last option is a reasonable choice when the input sequences represent EST or\n+mRNA sequences, where only one ORF/CDS is expected.\n+\n+Note that if no ORFs/CDSs in a sequence match the criteria, there will be no\n+output for that sequence.\n+\n+Also note that the ORFs/CDSs are assigned modified identifiers to distinguish\n+them from the original full length sequences, by appending a suffix.\n+\n+The start and stop codons are taken from the `NCBI Genetic Codes\n+&lt;http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi&gt;`_.\n+When searching for ORFs, the sequences will run from stop codon to stop\n+codon, and any start codons are ignored. When searching for CDSs, the first\n+potential start codon will be used, giving the longest possible CDS within\n+each ORF, and thus the longest possible protein sequence. 
This is useful\n+for things like BLAST or domain searching, but since this may not be the\n+correct start codon, it may not be appropriate for signal peptide detection\n+etc.\n+\n+**Example Usage**\n+\n+Given some EST sequences (Sanger capillary reads) assembled into unigenes,\n+or a transcriptome assembly from some RNA-Seq, each of your nucleotide\n+sequences should (barring sequencing, assembly errors, frame-shifts etc)\n+encode one protein as a single ORF/CDS, which you wish to extract (and\n+perhaps translate into amino acids).\n+\n+If your RNA-Seq data was strand specific, and assembled taking this into\n+account, you should only search for ORFs/CDSs on the forward strand.\n+\n+**Citation**\n+\n+This tool uses Biopython. If you use this tool in scientific work leading\n+to a publication, please cite the Biopython application note (and Galaxy\n+too of course):\n+\n+Cock et al 2009. Biopython: freely available Python tools for computational\n+molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.\n+http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.\n+\n+This tool is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/get_orfs_or_cdss\n+\t</help>\n+</tool>\n'
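The "Selection criteria" help text in the XML above (ORFs/CDSs of lengths 100, 102 and 102 giving 3, 2 or 1 results) can be made concrete with a short sketch. It assumes each hit is a tuple whose last element is the protein string and that the list is already sorted by start position, as in the script's `get_all_peptides`; the function name `select_orfs` is hypothetical.

```python
def select_orfs(orfs, mode="all"):
    """Apply the all/top/one selection to a start-sorted list of hits.

    Each hit is a tuple whose last element is the protein sequence.
    """
    if mode not in ("all", "top", "one"):
        raise ValueError("Unknown mode %r" % mode)
    if mode == "all" or not orfs:
        return list(orfs)
    max_len = max(len(orf[-1]) for orf in orfs)
    top = [orf for orf in orfs if len(orf[-1]) == max_len]
    # "one" keeps only the first (left-most) of the equally long hits.
    return top if mode == "top" else top[:1]


# Three hypothetical hits with protein lengths 100, 102 and 102.
hits = [
    (0, 303, +1, "", "A" * 100),
    (400, 709, +1, "", "C" * 102),
    (800, 1109, -1, "", "D" * 102),
]
counts = [len(select_orfs(hits, mode)) for mode in ("all", "top", "one")]
```

Here `counts` is `[3, 2, 1]`, matching the wording of the help text.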
diff -r 6a14074bc810 -r d51819d2d7e2 tools/filters/repository_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/filters/repository_dependencies.xml Mon Jul 29 09:30:44 2013 -0400
@@ -0,0 +1,6 @@
+<?xml version="1.0"?>
+<repositories description="This requires Biopython as a dependency.">
+<!-- Leave out the tool shed and revision to get the current
+     tool shed and latest revision at the time of upload -->
+<repository changeset_revision="5d0c54f7fea2" name="package_biopython_1_61" owner="biopython" toolshed="http://toolshed.g2.bx.psu.edu" />
+</repositories>