Previous changeset 2:324775a016ce (2013-04-23)
Next changeset 4:d51819d2d7e2 (2013-07-29)
Commit message:
Uploaded v0.0.8, automated Biopython dependency handling via ToolShed; MIT license; reST markup for README file.
added:
test-data/sanger-pairs-forward.fastq
test-data/sanger-pairs-interleaved.fastq
test-data/sanger-pairs-mixed.fastq
test-data/sanger-pairs-reverse.fastq
test-data/sanger-pairs-singles.fastq
tools/fastq/fastq_paired_unpaired.py
tools/fastq/fastq_paired_unpaired.rst
tools/fastq/fastq_paired_unpaired.xml
removed:
test-data/Ssuis.fasta
test-data/get_orf_input.Suis_ORF.nuc.fasta
test-data/get_orf_input.Suis_ORF.prot.fasta
test-data/get_orf_input.fasta
test-data/get_orf_input.t11_nuc_out.fasta
test-data/get_orf_input.t11_open_nuc_out.fasta
test-data/get_orf_input.t11_open_prot_out.fasta
test-data/get_orf_input.t11_prot_out.fasta
test-data/get_orf_input.t1_nuc_out.fasta
test-data/get_orf_input.t1_prot_out.fasta
tools/filters/get_orfs_or_cdss.py
tools/filters/get_orfs_or_cdss.txt
tools/filters/get_orfs_or_cdss.xml
diff -r 324775a016ce -r 6a14074bc810 test-data/Ssuis.fasta
--- a/test-data/Ssuis.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,33460 +0,0 @@\n->Streptococcus_suis\n-ATGAACCAAGAACAACTTTTTTGGCAACGATTTATTGAATTGGCAAAGGTAAATTTTAAG\n-CCATCTATTTATGATTTTTATGTCGCTGATGCAAAATTACTCGGAATCAACCAGCAAGTT\n-GCCAATATTTTCTTAAATCGTCCATTTAAAAAAGATTTCTGGGAAAAAAACTTCGAAGAG\n-TTAATGATTGCCGCTAGTTTTGAAAGCTACGGAGAGCCTCTTACCATCCAATATCAATTT\n-ACAGAGGATGAACAGGAGATTAGGAATACTACAAACACAAGAAGTTCAATAGTTCACCAG\n-GTACAGACACTTGAGCCGGCTACTCCTCAAGAAACTTTTAAACCGGTTCATTCTGATATA\n-AAATCCCAGTACACCTTTGCTAATTTTGTACAAGGAGACAATAATCACTGGGCAAAGGCT\n-GCAGCTTTAGCTGTATCTGATAACCTAGGTGAGCTCTACAATCCATTATTCATTTTTGGT\n-GGTCCTGGTCTTGGAAAAACTCATATTTTAAATGCGATTGGAAATAAGGTTCTAGCCGAT\n-AATCCCCAGGCAAGGATAAAATATGTCTCATCGGAAACATTCATCAATGAATTTTTAGAA\n-CACCTCCGTCTCAATGATATGGAAAGTTTCAAAAAAACCTATCGCAATCTGGACTTACTT\n-CTAATTGATGACATTCAGTCTCTCCGTAATAAAGCAACAACACAGGAAGAATTTTTCCAT\n-ACTTTTAATGCGCTTCATGAAAAAAATAAGCAGATTGTACTCACAAGCGACCGTAATCCC\n-GATCACTTAGACAATTTGGAAGAAAGACTAGTAACACGTTTCAAATGGGGGTTAACCAGT\n-GAAATCACTCCACCTGATTTTGAAACACGTATCGCAATTTTACGTAACAAGTGCGAGAAC\n-CTGCCTTACAACTTTACAAATGAGACGCTATCCTATCTAGCTGGGCAATTTGATTCGAAC\n-GTACGTGACCTTGAAGGTGCCTTAAAAGATATCCATTTGATAGCCACTATGCGTCAACTG\n-TCTGAGATAAGTGTCGAGGTTGCTGCTGAGGCTATTCGATCAAGAAAACAAACAAATCCA\n-CAAAACATGGTTATTCCTATTGAGAAAATCCAAACCGAAGTGGGAAATTTCTACGGTGTC\n-AGCTTGAAAGAATTAAAAGGTTCTAAGCGTGTTCAACATATCGTTCACGCGCGACAAGTT\n-GCTATGTTTTTAGCACGTGAAATGACAGACAATTCCCTTCCAAAAATTGGGAAAGAATTT\n-GGTAATCGAGACCATACAACCGTTATGCATGCATACAATAAAATAAAAACTCTCCTCTTG\n-GATGATGAGAATTTAGAAATAGAGATTACCAGTATAAAAAATAAACTTCGTTAACCTGTG\n-TATAACTTTTTTAAAAAACTCTGTTTTTTCCACAAGTTGTGAACAAGTTAATTTCCGCAG\n-TTTTATTGGTCTTTCATCACTTTTCCACAGAATACACAGAGACTACTATTACTATTAACC\n-TTATAGATAATAAATAAAGGAGAATCCATGATTCAATTTTCTATTAATAAAAATATATTT\n-CTACAAGCACTTAGTATTACTAAACGGGCAATCAGTACAAAAAATGCTATTCCAATTCTT\n-TCAACAGTAAAAATTACAGTAACTAGTGAAGGAATCACTTTAACTGGTTCAAATGGACAA\n-ATCTCGATAGAACATTTTATTTCTATTCAAGATGAAAATGCAGGGCTTTTGATCAGTTCT\n-CCAGGTTCCATTCTCTTAGAAGCTGGTTTCTTTATTAATGTCGTATCCAGTATGCCGGAT\n-TTGGTCCTTGACTTCAATGAAATTGAACAAAAGCAAATCGTTTTGACAAGTGGTAAGTCT\n-G
AAATCACATTAAAGGGAAAAGAAGCAGAACAGTATCCTCGTTTACAGGAAGTTCCAACT\n-TCAAAACCATTGGTGTTAGAAACCAAAGTATTAAAACAAACAATTAATGAAACAGCATTT\n-GCAGCTTCTACACAAGAAAGTCGTCCTATTCTTACGGGTGTTCATTTTGTTTTAACAGAA\n-AATAAAAATCTAAAAACTGTTGCAACAGATTCACACCGTATGAGCCAACGGAAATTGGTC\n-CTTGATACCTCTGGTGATGATTTTAATGTTGTCATTCCAAGTCGTTCTCTCCGTGAATTT\n-ACTGCAGTTTTTACAGATGATATTGAAACAGTAGAAGTCTTCTTTTCAAATAATCAAATC\n-CTTTTTAGAAGCGAGCATATTAGCTTCTATACACGCTTATTAGAAGGTACCTACCCTGAT\n-ACCGACCGCTTAATTCCAACTGAGTTTAAAACAACTGCAATTTTTGATACTGCAAATCTT\n-CGTCACTCGATGGAGCGTGCTCGTCTTCTTTCAAATGCAACCCAAAATGGTACAGTAAAA\n-CTAGAAATTGCTAATAATGTTGTATCGGCTCATGTAAATTCTCCAGAAGTTGGACGTGTG\n-AATGAGGAATTAGATACTGTAGAAGTATCAGGTGAAGATTTAGTAATCAGCTTTAACCCA\n-ACTTACTTGATAGAAGCATTGAAAGCCACAACTAGTGAACAAGTGAAAATTAGCTTTATC\n-TCTTCTGTCCGTCCATTTACATTGATTCCAAATAATGAAGGGGAAGATTTTATTCAATTG\n-GTTACACCAGTTCGTACCAACTAAATAATATTAAGAACGGCTAAACTAGCCGTTTTTATG\n-TTATACTAAAAAATAGCACCTAGCTTATTTTTATATATTTAGTGATGGGGAATAAATGAC\n-GTTATATATATTAGCTAATCCTAATGCTGGTAGCCATACTGCTGAACATATCATATTCAA\n-AATAAAAGAAAGTTATCCACAGCTTGCAGTTAACATTTTTATGACAGTTGGTCCTGAGGA\n-TGAAAAAAGTCAAATAGAGGCTATTTTAAAGGAGTTTGTCAGTAGTGAAGATCAATTAAT\n-GATTTTAGGCGGAGACGGCACACTATCTAAAGCTTTGCGTTTTTGGCCAGCTAGTCTACC\n-GTTTGCTTATTATCCAACAGGATCTGGAAATGATTTTGCTAAGGCAATGAATATAACATC\n-GCTATATAGAAGTGTAGATGCCATTTTAGAGAGAAAAACAAGTCGGATATATGTTTTAAA\n-CAGTTCATACGGAACGGTTGTAAACAGTATGGATTTTGGCTTTGCAGCTCAAGTTATCAA\n-TGGTTCAACGAATTCAATTTTGAAAAAAATTCTGAACAAGGTAAAACTTGGGAAGTTAAC\n-TTATCTATTCTTTGGTATTAAAACATTATTTTCAAAACAAGCTATAAACTTAGAATTAAC\n-TCTTGATGAAAAATCTTATCAGTTAGATAATCTCTTTTTTATTTCTGTAGCAAATAGTCT\n-TTATTTTGGTGGAGGAATCATGATATGGCCAACAGCAAGTGCTAAAAAGAAGGAAGTAGA\n-TATTGTTTACTTCAAAAATGGAAATTTCTACCAACGTCTACAATCATTGTTAGCCTTATT\n-AACGAAGAGGCATGAATCTTCTCATACGATTCAGCATTTAACAGGGGTAGATGTAGTTTT\n-AAAATCAAAAGAAAAATTATTATTGCAAATAGATGGAGAGACATGCACTGCAAATGAGGT\n-AACGTTAACCTATCAGGAAAGAAGTATGTATCTTTAAGGAGGAAGTATGTACCAATTAGG\n-AACCTTTGTCGAAATGAAAAAGCCCCATGCCTGTGTCATCAAATCGACCGGAAAGAAGGC\n-TAATAAATGGGAGGTTATCCGTCTAGGAGCGGATATTAAAATCCGCTG
TACCAACTGTGA\n-CCATGTCGTTATGATGAGCCGGCATGATTTTGAACGAAAAATGAAACAAGT'..b'CTCTACCAACTGAGCTA\n-TGGCGGAAGAAATAGTCCGTACGGGATTCGAACCCGTGTTACCGCCGTGAAAAGGCGGTG\n-TCTTAACCCCTTGACCAACGGACCATTTTTAGAACAATAACTAGTATAATACATGTGACT\n-TTGTTTGTCAATACATTTTTTGATTTTTTATTGTATTGACAGAGTGCTTTGTTTAATGTA\n-AAATAAAATGGTTAAGGTTCCATAGCTCAGCTGGATAGAGCATTCGCCTTCTAAGCGAAC\n-GGTCGCAGGTTCGAATCCTGCTGGAATCATTTAGACCTACCTCGAGTAGGTCTTTTTTCT\n-TGCCATAATTCATAATTAATATATAACACTGGCAAAATCAGACCAATAAGGGCATATTCT\n-TCAAATTGGAAGGATAGGTGAGTAGATATGATGACACCTAGCATAAACCCTATAATGGTC\n-AATAAGATGTTTCTACCTGTTTTTCTAAGTTCTGAATCTTTTTCAATAACTCCTTTAAAC\n-CAGAGATAAGCAGCATTTTTGACATTCCCTGTCATCATCACATTGGCATACGGAGCACCT\n-CGTAACCTTCTAAATGTTTCTACTTGAATAGAGGCTACGAAGGCTAGACTAGCAATTGTA\n-AAAGACGCAGGCATTATAGGTGAGAGAATGATAGTTAGTAAAATAAGAACTAACATCATT\n-ACACTACTACCAAAGTGCCAAGACCATGTTTGTTTTTCAAAATACCTTCTTGCTAAGTAG\n-GTAAAAAATTGTCCGAATACAAAAAATAAAATGGGAATGGAAAAATTAACTACCTGCGCA\n-AAATCACCTTTAGCTAAAAAATAAGCTAGGGAAATAACATTTCCAGATTGTACGCCAGCA\n-AAGCGACCACCCTGAGTCACAAAAGTAAAGGCATTTAAATAACCACTGATAAACGTTAAT\n-GAACAAGCAATTCTCAATCCCTCAAAAACACGATACTCTTTTTGATTCATTTTCACTCCT\n-TGTTTCACGTGAAACTACTTATGATATGGGCTTCCCTGCTGAATCATAAATGCACGATAA\n-ATCTGCTCGATGAGAACTAATTTCATTAGTTGATGGGGAAGTGTCAACTGTCCAAAACTC\n-ATCAACAAATTAGCTCTTTTTTTAATACAAGAATCGAGACCCAAACTACCACCGATGATA\n-AAAGTTATATCTGAATACCCATTTACTGCAATGTCAGATATCCTTTGACTAAATTCTTCC\n-GATGGAAATTGTTTCCCTTCTATCGCTAAGGCAATGACAAAATCTCGCTCTCCAATTTTA\n-GACATAATTCTATCGGCTTCTTTTTTTAATATTTGTTCATTCTCTGCCTGACTGGCTTTA\n-TCTGGTGTTTTTTCATCAGGAAGCTCAATCATATCCAACTTAGTAAATCGTCCCAATCGT\n-TTACTATATTCTGCAATACCTTCTTTGAGGTACTTTTCTTTCAATTTTCCAACGGTAATC\n-AATTTTATTTTCATAAAATAATTGTAACATATCCACAAGCATACGACAGAAAATATTTTT\n-AGAAAATCAGGATATGGCTACAGTTTTTCACATAATTCACAGAGTTATCCACAGGTTGTG\n-GATTGATTTTTGAAAACTTTAAGTTATAATTAAGAAAGAAATAGTACTCTTAAGGAAAAT\n-TAAAGAAATGGAAAGGATTCCTTATATGAAAAAATATTTGAAATTTGCGATTTTATTTGT\n-AATTGGATTTTTTGGGGGTCTTATCGGGGCCTTGTCAGCCTCTTTCTTCCAGCCACAGGT\n-GCAACAAGCAAATTCTGCTATCACTAGTGTCAGCAATGTTCAATATAATAATGAAACTTC\n-CACCACAAAAGCTGTAGAG
AAAGTACAAAATGCTGTTGTGTCTGTTATTAATTACCAAAA\n-ATCAGCCAACAATAGTCTTGGTGTTATCTTTGGAAATATTGAATCATCTGACGAACTAGC\n-TGTTGCTGGAGAGGGGTCTGGGGTTATCTATAAAAAATATGGTCAATATGCCTATATTGT\n-GACAAATACGCATGTTATTAATAACGCAGAAAAGATTGATATCCTTTTAGCATCTGGAGA\n-AAAAATTAGCGGTGAACTTGTTGGTTCCGATACATATTCTGATATAGCTGTTATAAAAAT\n-ATCAGCAGATAAAGTCACTGCTGTTGCTGAATTTGCTGATTCCGATACAATTAAAGTTGG\n-AGAAACTGCTATCGCAATTGGTAGTCCTCTAGGTAGCGTCTACGCCAATACAGTTACCCA\n-GGGTATTATTTCTAGCTTAAGTCGGACAGTTACTTCACAATCAAAAGATGGACAAACAAT\n-CTCAACTAACGCTATTCAAACTGATACAGCTATCAACCCTGGAAACTCTGGCGGACCGTT\n-AATCAATACCCAAGGACAAGTGATAGGCATTACCTCTAGCAAAATTACCTCAAGTTCTGC\n-AAATAGCTCAGGCGTGGCTGTAGAAGGGTTGGGATTTGCTATTCCTGCAAATGATGCCGT\n-AGCTATTATCAATCAGCTTGAAAAAACTGGACAAGTTAGCCGACCTGCTCTTGGAGTTCA\n-TATGGTTAACTTGACGACCTTGTCAACTAGTCAATTAGAAAAAGCTGGATTATCAAATAC\n-GGAATTAACATCCGGTGTAGTAATTGTCTCTACACAAAGTGGGCTACCTGCAGATGGAAA\n-ATTAGAAACTTTTGATGTTATTACTGAGATTGACGGAGAAGCTATTCAAAATAAGAGTGA\n-CCTCCAGAGCGCTCTCTACAAACATCAAATTGGAGATACAATCACTGTAACTTATTACCG\n-CAATAATCAGAAACAAACTGTTGACATTAAGTTGACACATTCTACAGAAGAACTTAGCGA\n-ATAATTGACAAATGAGACTTTACACAATTGTAAAGTCTCATTTTTTTTGCTAGAATAAGG\n-ATATATGGAAGAATTACGTACACTAAATATTTCAGAAATCCATCCCAATCCCTATCAGCC\n-AAGAATTCATTTTGATGAAAAGGAGCTACTTGAGCTCGCTCAATCTATTAAGGAAAATGG\n-CTTAATTCAACCGATTATTGTAAGAAAATCTTCTATTATCGGATACGAATTATTAGCTGG\n-AGAAAGAAGGTTGCGAGCCAGTCAATTAGCTGGACTGACTACAATACCAGCAGTGGTAAA\n-AGAACTGACTGATGATGATTTACTCTATCAGGCTATCATAGAGAATCTGCAGCGTTCTAA\n-CTTAAATCCGATAGAAGAAGCAGCCTCTTATCAAAAATTGATTAGTAGAGGGTTAACACA\n-TGATGAAGTTGCTCAAATCATGGGAAAATCAAGACCATATATCAGTAATTTATTGCGCCT\n-ACTAAATCTATCATCTCAGACTAAACAAGCTGTAGAAGAAGGAAAAATTTCACAAGGGCA\n-CGCGCGACAATTGGTGTCATTTTCAGAAGAAAAGCAAGCCGAATGGGTTCAACTCATTTT\n-ATCAAAGGATTTAAGTGTGCGTACGCTTGAAAAATTAATAGCTGCAAATAAGAAAAAACA\n-CACTAAGCTTAAACAACGCGACCAATTTTTAAAAGAACAGGAAGATTCACTCAGTAAAAC\n-TCTTGGAACAGCTACAAAAATTATCAAGAAGAAAAACGGGAGCGGAGAAATTCGGATTAG\n-CTTTAATGACCTCGATGAATTCGAAAGAATTATCAACAATTTTAAATAGACTTGTTTACA\n-ATTTATTTTTATAAACACTCTTTTCCACACTAAAATCATTACAAAAAGTCAGGACCAGCA\n-AGG
GTTCTGACTTTTATTCACATCTTGTGGAAAACTTTTCTTAACAGTGTGGATTTTAAA\n-AATTATCTGTGGAAAACTTTTGTTTTTTATGGTACACTATTCTAACGAATATAATGTGAA\n-AGGGGGAAAAT\n' |
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.Suis_ORF.nuc.fasta
--- a/test-data/get_orf_input.Suis_ORF.nuc.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,41831 +0,0 @@\n->Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n-ATGAACCAAGAACAACTTTTTTGGCAACGATTTATTGAATTGGCAAAGGTAAATTTTAAG\n-CCATCTATTTATGATTTTTATGTCGCTGATGCAAAATTACTCGGAATCAACCAGCAAGTT\n-GCCAATATTTTCTTAAATCGTCCATTTAAAAAAGATTTCTGGGAAAAAAACTTCGAAGAG\n-TTAATGATTGCCGCTAGTTTTGAAAGCTACGGAGAGCCTCTTACCATCCAATATCAATTT\n-ACAGAGGATGAACAGGAGATTAGGAATACTACAAACACAAGAAGTTCAATAGTTCACCAG\n-GTACAGACACTTGAGCCGGCTACTCCTCAAGAAACTTTTAAACCGGTTCATTCTGATATA\n-AAATCCCAGTACACCTTTGCTAATTTTGTACAAGGAGACAATAATCACTGGGCAAAGGCT\n-GCAGCTTTAGCTGTATCTGATAACCTAGGTGAGCTCTACAATCCATTATTCATTTTTGGT\n-GGTCCTGGTCTTGGAAAAACTCATATTTTAAATGCGATTGGAAATAAGGTTCTAGCCGAT\n-AATCCCCAGGCAAGGATAAAATATGTCTCATCGGAAACATTCATCAATGAATTTTTAGAA\n-CACCTCCGTCTCAATGATATGGAAAGTTTCAAAAAAACCTATCGCAATCTGGACTTACTT\n-CTAATTGATGACATTCAGTCTCTCCGTAATAAAGCAACAACACAGGAAGAATTTTTCCAT\n-ACTTTTAATGCGCTTCATGAAAAAAATAAGCAGATTGTACTCACAAGCGACCGTAATCCC\n-GATCACTTAGACAATTTGGAAGAAAGACTAGTAACACGTTTCAAATGGGGGTTAACCAGT\n-GAAATCACTCCACCTGATTTTGAAACACGTATCGCAATTTTACGTAACAAGTGCGAGAAC\n-CTGCCTTACAACTTTACAAATGAGACGCTATCCTATCTAGCTGGGCAATTTGATTCGAAC\n-GTACGTGACCTTGAAGGTGCCTTAAAAGATATCCATTTGATAGCCACTATGCGTCAACTG\n-TCTGAGATAAGTGTCGAGGTTGCTGCTGAGGCTATTCGATCAAGAAAACAAACAAATCCA\n-CAAAACATGGTTATTCCTATTGAGAAAATCCAAACCGAAGTGGGAAATTTCTACGGTGTC\n-AGCTTGAAAGAATTAAAAGGTTCTAAGCGTGTTCAACATATCGTTCACGCGCGACAAGTT\n-GCTATGTTTTTAGCACGTGAAATGACAGACAATTCCCTTCCAAAAATTGGGAAAGAATTT\n-GGTAATCGAGACCATACAACCGTTATGCATGCATACAATAAAATAAAAACTCTCCTCTTG\n-GATGATGAGAATTTAGAAATAGAGATTACCAGTATAAAAAATAAACTTCGTTAA\n->Streptococcus_suis|ORF2 length 385 aa, 1158 bp, from 1507..2664 of 
Streptococcus_suis\n-ATAATAAATAAAGGAGAATCCATGATTCAATTTTCTATTAATAAAAATATATTTCTACAA\n-GCACTTAGTATTACTAAACGGGCAATCAGTACAAAAAATGCTATTCCAATTCTTTCAACA\n-GTAAAAATTACAGTAACTAGTGAAGGAATCACTTTAACTGGTTCAAATGGACAAATCTCG\n-ATAGAACATTTTATTTCTATTCAAGATGAAAATGCAGGGCTTTTGATCAGTTCTCCAGGT\n-TCCATTCTCTTAGAAGCTGGTTTCTTTATTAATGTCGTATCCAGTATGCCGGATTTGGTC\n-CTTGACTTCAATGAAATTGAACAAAAGCAAATCGTTTTGACAAGTGGTAAGTCTGAAATC\n-ACATTAAAGGGAAAAGAAGCAGAACAGTATCCTCGTTTACAGGAAGTTCCAACTTCAAAA\n-CCATTGGTGTTAGAAACCAAAGTATTAAAACAAACAATTAATGAAACAGCATTTGCAGCT\n-TCTACACAAGAAAGTCGTCCTATTCTTACGGGTGTTCATTTTGTTTTAACAGAAAATAAA\n-AATCTAAAAACTGTTGCAACAGATTCACACCGTATGAGCCAACGGAAATTGGTCCTTGAT\n-ACCTCTGGTGATGATTTTAATGTTGTCATTCCAAGTCGTTCTCTCCGTGAATTTACTGCA\n-GTTTTTACAGATGATATTGAAACAGTAGAAGTCTTCTTTTCAAATAATCAAATCCTTTTT\n-AGAAGCGAGCATATTAGCTTCTATACACGCTTATTAGAAGGTACCTACCCTGATACCGAC\n-CGCTTAATTCCAACTGAGTTTAAAACAACTGCAATTTTTGATACTGCAAATCTTCGTCAC\n-TCGATGGAGCGTGCTCGTCTTCTTTCAAATGCAACCCAAAATGGTACAGTAAAACTAGAA\n-ATTGCTAATAATGTTGTATCGGCTCATGTAAATTCTCCAGAAGTTGGACGTGTGAATGAG\n-GAATTAGATACTGTAGAAGTATCAGGTGAAGATTTAGTAATCAGCTTTAACCCAACTTAC\n-TTGATAGAAGCATTGAAAGCCACAACTAGTGAACAAGTGAAAATTAGCTTTATCTCTTCT\n-GTCCGTCCATTTACATTGATTCCAAATAATGAAGGGGAAGATTTTATTCAATTGGTTACA\n-CCAGTTCGTACCAACTAA\n->Streptococcus_suis|ORF3 length 104 aa, 315 bp, from complement(1707..2021) of Streptococcus_suis\n-ACACCCGTAAGAATAGGACGACTTTCTTGTGTAGAAGCTGCAAATGCTGTTTCATTAATT\n-GTTTGTTTTAATACTTTGGTTTCTAACACCAATGGTTTTGAAGTTGGAACTTCCTGTAAA\n-CGAGGATACTGTTCTGCTTCTTTTCCCTTTAATGTGATTTCAGACTTACCACTTGTCAAA\n-ACGATTTGCTTTTGTTCAATTTCATTGAAGTCAAGGACCAAATCCGGCATACTGGATACG\n-ACATTAATAAAGAAACCAGCTTCTAAGAGAATGGAACCTGGAGAACTGATCAAAAGCCCT\n-GCATTTTCATCTTGA\n->Streptococcus_suis|ORF4 length 293 aa, 882 bp, from 2756..3637 of 
Streptococcus_suis\n-ATGACGTTATATATATTAGCTAATCCTAATGCTGGTAGCCATACTGCTGAACATATCATA\n-TTCAAAATAAAAGAAAGTTATCCACAGCTTGCAGTTAACATTTTTATGACAGTTGGTCCT\n-GAGGATGAAAAAAGTCAAATAGAGGCTATTTTAAAGGAGTTTGTCAGTAGTGAAGATCAA\n-TTAATGATTTTAGGCGGAGACGGCACACTATCTAAAGCTTTGCGTTTTTGGCCAGCTAGT\n-CTACCGTTTGCTTATTATCCAACAGGATCTGGAAATGATTTTGCTAAGGCAATGAATATA\n-ACATCGCTATATAGAAGTGTAGATGCCATTTTAGAGAGAAAAACAAGTCGGATATATGTT\n-TTAAACAGTTCATACGGAACGGTTGTAAACAGTATGGATTTTGGCTTTGCAGCTCAAGTT\n-ATCAATGGTTCAACGAATTCAATTTTGAAAAAAATTCTGAACAAGGTAAAACTTGGGAAG\n-TTAACTTATCTATTCTTTGGTATTAAAACATTATTTTCAAAACAAGCTATAAACTTAGAA\n-TTAACTCTTGATGAAAAATCTTATCAGTTAGATAATCTCTTTTTTATTTCTGTAGCAAAT\n-AGTCTTTATTTTGGTGGAGGAATCATGATATGGCCAACAGCAAGTGCTAAAAAG'..b'GCAACCATTGATGGTAAACCTATCAAAATCCAAAAAGCGCAAGATGGT\n-TTTATGAAAGTGGATGTAAGTCCAGGTCAAACTAAACTAGTTTTAACCTTTGTACCAAAT\n-GGTTTCTATCTAGGTTTACTGATTTCTTTTGGTGCAGTTTTTGTATTTTTCTCCTATCAA\n-TTCATTGGATACTATTATTCTAAGAACCGAGAATACTAA\n->Streptococcus_suis|ORF2907 length 235 aa, 708 bp, from complement(2003907..2004614) of Streptococcus_suis\n-TTTCACGTGAAACAAGGAGTGAAAATGAATCAAAAAGAGTATCGTGTTTTTGAGGGATTG\n-AGAATTGCTTGTTCATTAACGTTTATCAGTGGTTATTTAAATGCCTTTACTTTTGTGACT\n-CAGGGTGGTCGCTTTGCTGGCGTACAATCTGGAAATGTTATTTCCCTAGCTTATTTTTTA\n-GCTAAAGGTGATTTTGCGCAGGTAGTTAATTTTTCCATTCCCATTTTATTTTTTGTATTC\n-GGACAATTTTTTACCTACTTAGCAAGAAGGTATTTTGAAAAACAAACATGGTCTTGGCAC\n-TTTGGTAGTAGTGTAATGATGTTAGTTCTTATTTTACTAACTATCATTCTCTCACCTATA\n-ATGCCTGCGTCTTTTACAATTGCTAGTCTAGCCTTCGTAGCCTCTATTCAAGTAGAAACA\n-TTTAGAAGGTTACGAGGTGCTCCGTATGCCAATGTGATGATGACAGGGAATGTCAAAAAT\n-GCTGCTTATCTCTGGTTTAAAGGAGTTATTGAAAAAGATTCAGAACTTAGAAAAACAGGT\n-AGAAACATCTTATTGACCATTATAGGGTTTATGCTAGGTGTCATCATATCTACTCACCTA\n-TCCTTCCAATTTGAAGAATATGCCCTTATTGGTCTGATTTTGCCAGTGTTATATATTAAT\n-TATGAATTATGGCAAGAAAAAAGACCTACTCGAGGTAGGTCTAAATGA\n->Streptococcus_suis|ORF2908 length 180 aa, 543 bp, from complement(2004615..2005157) of 
Streptococcus_suis\n-CCATATCCTGATTTTCTAAAAATATTTTCTGTCGTATGCTTGTGGATATGTTACAATTAT\n-TTTATGAAAATAAAATTGATTACCGTTGGAAAATTGAAAGAAAAGTACCTCAAAGAAGGT\n-ATTGCAGAATATAGTAAACGATTGGGACGATTTACTAAGTTGGATATGATTGAGCTTCCT\n-GATGAAAAAACACCAGATAAAGCCAGTCAGGCAGAGAATGAACAAATATTAAAAAAAGAA\n-GCCGATAGAATTATGTCTAAAATTGGAGAGCGAGATTTTGTCATTGCCTTAGCGATAGAA\n-GGGAAACAATTTCCATCGGAAGAATTTAGTCAAAGGATATCTGACATTGCAGTAAATGGG\n-TATTCAGATATAACTTTTATCATCGGTGGTAGTTTGGGTCTCGATTCTTGTATTAAAAAA\n-AGAGCTAATTTGTTGATGAGTTTTGGACAGTTGACACTTCCCCATCAACTAATGAAATTA\n-GTTCTCATCGAGCAGATTTATCGTGCATTTATGATTCAGCAGGGAAGCCCATATCATAAG\n-TAG\n->Streptococcus_suis|ORF2909 length 413 aa, 1242 bp, from 2005223..2006464 of Streptococcus_suis\n-GTTATAATTAAGAAAGAAATAGTACTCTTAAGGAAAATTAAAGAAATGGAAAGGATTCCT\n-TATATGAAAAAATATTTGAAATTTGCGATTTTATTTGTAATTGGATTTTTTGGGGGTCTT\n-ATCGGGGCCTTGTCAGCCTCTTTCTTCCAGCCACAGGTGCAACAAGCAAATTCTGCTATC\n-ACTAGTGTCAGCAATGTTCAATATAATAATGAAACTTCCACCACAAAAGCTGTAGAGAAA\n-GTACAAAATGCTGTTGTGTCTGTTATTAATTACCAAAAATCAGCCAACAATAGTCTTGGT\n-GTTATCTTTGGAAATATTGAATCATCTGACGAACTAGCTGTTGCTGGAGAGGGGTCTGGG\n-GTTATCTATAAAAAATATGGTCAATATGCCTATATTGTGACAAATACGCATGTTATTAAT\n-AACGCAGAAAAGATTGATATCCTTTTAGCATCTGGAGAAAAAATTAGCGGTGAACTTGTT\n-GGTTCCGATACATATTCTGATATAGCTGTTATAAAAATATCAGCAGATAAAGTCACTGCT\n-GTTGCTGAATTTGCTGATTCCGATACAATTAAAGTTGGAGAAACTGCTATCGCAATTGGT\n-AGTCCTCTAGGTAGCGTCTACGCCAATACAGTTACCCAGGGTATTATTTCTAGCTTAAGT\n-CGGACAGTTACTTCACAATCAAAAGATGGACAAACAATCTCAACTAACGCTATTCAAACT\n-GATACAGCTATCAACCCTGGAAACTCTGGCGGACCGTTAATCAATACCCAAGGACAAGTG\n-ATAGGCATTACCTCTAGCAAAATTACCTCAAGTTCTGCAAATAGCTCAGGCGTGGCTGTA\n-GAAGGGTTGGGATTTGCTATTCCTGCAAATGATGCCGTAGCTATTATCAATCAGCTTGAA\n-AAAACTGGACAAGTTAGCCGACCTGCTCTTGGAGTTCATATGGTTAACTTGACGACCTTG\n-TCAACTAGTCAATTAGAAAAAGCTGGATTATCAAATACGGAATTAACATCCGGTGTAGTA\n-ATTGTCTCTACACAAAGTGGGCTACCTGCAGATGGAAAATTAGAAACTTTTGATGTTATT\n-ACTGAGATTGACGGAGAAGCTATTCAAAATAAGAGTGACCTCCAGAGCGCTCTCTACAAA\n-CATCAAATTGGAGATACAATCACTGTAACTTATTACCGCAATAATCAGAAACAAACTGTT\n-GACATTAAGTTGACACATTCTACAGAAGAACTTAGCGAATAA\n->St
reptococcus_suis|ORF2910 length 256 aa, 771 bp, from 2006519..2007289 of Streptococcus_suis\n-GGATATATGGAAGAATTACGTACACTAAATATTTCAGAAATCCATCCCAATCCCTATCAG\n-CCAAGAATTCATTTTGATGAAAAGGAGCTACTTGAGCTCGCTCAATCTATTAAGGAAAAT\n-GGCTTAATTCAACCGATTATTGTAAGAAAATCTTCTATTATCGGATACGAATTATTAGCT\n-GGAGAAAGAAGGTTGCGAGCCAGTCAATTAGCTGGACTGACTACAATACCAGCAGTGGTA\n-AAAGAACTGACTGATGATGATTTACTCTATCAGGCTATCATAGAGAATCTGCAGCGTTCT\n-AACTTAAATCCGATAGAAGAAGCAGCCTCTTATCAAAAATTGATTAGTAGAGGGTTAACA\n-CATGATGAAGTTGCTCAAATCATGGGAAAATCAAGACCATATATCAGTAATTTATTGCGC\n-CTACTAAATCTATCATCTCAGACTAAACAAGCTGTAGAAGAAGGAAAAATTTCACAAGGG\n-CACGCGCGACAATTGGTGTCATTTTCAGAAGAAAAGCAAGCCGAATGGGTTCAACTCATT\n-TTATCAAAGGATTTAAGTGTGCGTACGCTTGAAAAATTAATAGCTGCAAATAAGAAAAAA\n-CACACTAAGCTTAAACAACGCGACCAATTTTTAAAAGAACAGGAAGATTCACTCAGTAAA\n-ACTCTTGGAACAGCTACAAAAATTATCAAGAAGAAAAACGGGAGCGGAGAAATTCGGATT\n-AGCTTTAATGACCTCGATGAATTCGAAAGAATTATCAACAATTTTAAATAG\n' |
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.Suis_ORF.prot.fasta
--- a/test-data/get_orf_input.Suis_ORF.prot.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,16670 +0,0 @@\n->Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n-MNQEQLFWQRFIELAKVNFKPSIYDFYVADAKLLGINQQVANIFLNRPFKKDFWEKNFEE\n-LMIAASFESYGEPLTIQYQFTEDEQEIRNTTNTRSSIVHQVQTLEPATPQETFKPVHSDI\n-KSQYTFANFVQGDNNHWAKAAALAVSDNLGELYNPLFIFGGPGLGKTHILNAIGNKVLAD\n-NPQARIKYVSSETFINEFLEHLRLNDMESFKKTYRNLDLLLIDDIQSLRNKATTQEEFFH\n-TFNALHEKNKQIVLTSDRNPDHLDNLEERLVTRFKWGLTSEITPPDFETRIAILRNKCEN\n-LPYNFTNETLSYLAGQFDSNVRDLEGALKDIHLIATMRQLSEISVEVAAEAIRSRKQTNP\n-QNMVIPIEKIQTEVGNFYGVSLKELKGSKRVQHIVHARQVAMFLAREMTDNSLPKIGKEF\n-GNRDHTTVMHAYNKIKTLLLDDENLEIEITSIKNKLR\n->Streptococcus_suis|ORF2 length 385 aa, 1158 bp, from 1507..2664 of Streptococcus_suis\n-IINKGESMIQFSINKNIFLQALSITKRAISTKNAIPILSTVKITVTSEGITLTGSNGQIS\n-IEHFISIQDENAGLLISSPGSILLEAGFFINVVSSMPDLVLDFNEIEQKQIVLTSGKSEI\n-TLKGKEAEQYPRLQEVPTSKPLVLETKVLKQTINETAFAASTQESRPILTGVHFVLTENK\n-NLKTVATDSHRMSQRKLVLDTSGDDFNVVIPSRSLREFTAVFTDDIETVEVFFSNNQILF\n-RSEHISFYTRLLEGTYPDTDRLIPTEFKTTAIFDTANLRHSMERARLLSNATQNGTVKLE\n-IANNVVSAHVNSPEVGRVNEELDTVEVSGEDLVISFNPTYLIEALKATTSEQVKISFISS\n-VRPFTLIPNNEGEDFIQLVTPVRTN\n->Streptococcus_suis|ORF3 length 104 aa, 315 bp, from complement(1707..2021) of Streptococcus_suis\n-TPVRIGRLSCVEAANAVSLIVCFNTLVSNTNGFEVGTSCKRGYCSASFPFNVISDLPLVK\n-TICFCSISLKSRTKSGILDTTLIKKPASKRMEPGELIKSPAFSS\n->Streptococcus_suis|ORF4 length 293 aa, 882 bp, from 2756..3637 of Streptococcus_suis\n-MTLYILANPNAGSHTAEHIIFKIKESYPQLAVNIFMTVGPEDEKSQIEAILKEFVSSEDQ\n-LMILGGDGTLSKALRFWPASLPFAYYPTGSGNDFAKAMNITSLYRSVDAILERKTSRIYV\n-LNSSYGTVVNSMDFGFAAQVINGSTNSILKKILNKVKLGKLTYLFFGIKTLFSKQAINLE\n-LTLDEKSYQLDNLFFISVANSLYFGGGIMIWPTASAKKKEVDIVYFKNGNFYQRLQSLLA\n-LLTKRHESSHTIQHLTGVDVVLKSKEKLLLQIDGETCTANEVTLTYQERSMYL\n->Streptococcus_suis|ORF5 length 126 aa, 381 bp, from 3933..4313 of Streptococcus_suis\n-KKEEEMIMKQLAQQIRVLRTAKNLSQDELAEKLYISRQAVSKWENGEATPDIDKLVQLAE\n-IFGVSLDYLVLGKEPEKEIVVEQRGKMNGWEFLNEESKRPLTRGDVVLLIFLAVMLLGGL\n-FIKHYF\n->Streptococcus_suis|ORF6 length 377 aa, 1134 bp, from 4381..5514 of 
Streptococcus_suis\n-LESKKNMSLTAGIVGLPNVGKSTLFNAITKAGAEAANYPFATIDPNVGMVEVPDERLQKL\n-TELIIPKKTVPTTFEFTDIAGIVKGASKGEGLGNKFLANIREVDAIVHVVRAFDDENVMR\n-EQGREDAFVDPIADIDTINLELILADLESINKRYARVEKMARTQKDKDSVAEFAVLEKIK\n-PVLEDGKSARTVEFTDEEQKIVKQLFLLTTKPVLYVANVDEDKVADPEAISYVQQIRDFA\n-ATENAEVVVISARAEEEISELDDEDKGEFLEALGLTESGVDKLTRAAYHLLGLGTYFTAG\n-EKEVRAWTFKRGMKAPQCAGIIHSDFEKGFIRAVTMSYDDLMTYGSEKAVKEAGRLREEG\n-KEYVVQDGDIMEFRFNV\n->Streptococcus_suis|ORF7 length 115 aa, 348 bp, from complement(4450..4797) of Streptococcus_suis\n-VNGINISDWIHKGIFTALFTHDIFIVKGTHNVDNRINFADIGQEFISKSFTFRSTFYDTS\n-NISKFKSRWHCLFRDDEFGQLLQTLIGHFYHADVWINSCERVVCSFCSCLGNCVK\n->Streptococcus_suis|ORF8 length 115 aa, 348 bp, from complement(4491..4838) of Streptococcus_suis\n-RLLMLSRSAKINSRLMVSISAIGSTKASSRPCSRMTFSSSKARTTWTIASTSRILAKNLF\n-PSPSPLEAPFTIPAISVNSKVVGTVFLGMMSSVNFCRRSSGTSTMPTFGSIVAKG\n->Streptococcus_suis|ORF9 length 192 aa, 579 bp, from 5663..6241 of Streptococcus_suis\n-GEKMTRLIIGLGNPGDRYFETKHNVGFMLLDKIAKRENVTFNHDKIFQADIATTFIDGEK\n-IYLVKPTTFMNESGKAVHALMTYYGLDATDILVAYDDLDMAVGKIRFRQKGSAGGHNGIK\n-SIVKHIGTQEFDRIKIGIGRPKGKMSVVNHVLSGFDIEDRIEIDLALDKLDKAVNVYLEE\n-DDFDTVMRKFNG\n->Streptococcus_suis|ORF10 length 1166 aa, 3501 bp, from 6235..9735 of 
Streptococcus_suis\n-RIMNILDLLHKNKQINQWQSGLNQSTRQLLLGLSGTSKSLIMATAYDCLAEKIMIVTATQ\n-NDAEKLVADLTAIIGSENVYNFFTDDSPIAEFVFASKERTQSRIDSLNFLTDSTSSGILV\n-ASIVACRVLLPSPETYKGSKIQLEVGQEIEVDKLVKNLVNIGYKKVSRVLTQGEFSQRGD\n-ILDIFDMQSETPYRIEFFGDEIDGIRIFDVDSQKSLENLDEISISPASDIILSSEDYSRA\n-SQYIQTAIEQSTLEEQQSYLREVLADMQTEYRHPDLRKFLSCIYEQSWTLLDYLPKSSPL\n-FLDDFHKIADKQAQFEKEIADLLTDDLQKGKTVSSLKYFASTYAELRKYKPATFFSSFQK\n-GLGNVKFDALYQFTQHPMQEFFHQIPLLKDELTRYAKSNNTVVIQASSDVSLQTLQKNLQ\n-EYDIHLPVHAADKLVEGQQQVTIGQLASGFHLMDEKLVFITEKEIFNKKMKRKTRRTNIS\n-NAERIKDYSELAVGDYVVHHVHGIGQYLGIETIEISGIHRDYLTVQYQNSDRISIPVEQI\n-DLLSKYLASDGKAPKVNKLNDGRFQRTKQKVQKQVEDIADDLIKLYAERSQLKGFAFSPD\n-DENQVEFDNYFTHVETDDQLRSIDEIKKDMEKDSPMDRLLVGDVGFGKTEVAMRAAFKAV\n-NDGKQVAILVPTTVLAQQHYANFQERFAEFPVNVDVMSRFKTKAEQEKTLEKLKKGQVDI\n-LIGTHRLLSKDVVFADLGLLVIDEEQRFGVKHKERLKELKKKIDVLTLTATPIPRTLQMS\n-MLGIRDLSVIETPPTNRYP'..b'\n-DTDTVMYSIIALMTITYIVNRMMSGTQSSRNVMIISQKSEEIKDYITKVADRGVTELPII\n-GGFTGVDKRMLMTTISIPEMQKLETAVLEIDETAFMVVMPASQVRGRGFSLQKDHKHYDE\n-DILIPM\n->Streptococcus_suis|ORF2902 length 565 aa, 1698 bp, from 1998923..2000620 of Streptococcus_suis\n-FQCNSLKIQVLSSTIKLIDRNRGETMLTVSDVSLRFSDRKLFDDVNIKFTAGNTYGLIGA\n-NGAGKSTFLKILAGDIEPSTGHISLGPDERLSVLRQNHFDYEDERVIDVVIMGNEQLYSI\n-MKEKDAIYMKEDFSDEDGVRAAELEGEFAELGGWEAESEASQLLQNLNISEDLHYQNMSE\n-LTNGEKVKVLLAKALFGKPDVLLLDEPTNGLDIQSINWLEDFLIDFENTVIVVSHDRHFL\n-NKVCTHMADLDFGKIKIFVGNYDFWKQSSELAAKLQADRNAKAEEKIKELQEFVARFSAN\n-ASKSKQATSRKKMLDKIELEEIIPSSRKYPFINFKSEREIGNDLLTVENLKVVIDGETIL\n-DNISFILRPGDKTALIGQNDIQTTALIRALMGDIEYEGTVKWGVTTSQSYLPKDNTRDFD\n-TNESILDWLRQFASKEEDDNTFLRGFLGRMLFSGDEVNKPVNVLSGGEKVRVMLSKLMLL\n-KSNVLVLDDPTNHLDLESISSLNDGLKAFKESIIFASHDHEFIQTLANHIIVISKNGVID\n-RIDETYDEFLENAEVQAKVQELWKA\n->Streptococcus_suis|ORF2903 length 115 aa, 348 bp, from complement(1999705..2000052) of Streptococcus_suis\n-PIRAVLSPGRRIKLILSRIVSPSITTFKFSTVKRSLPISRSDLKLINGYLRLEGMISSNS\n-ILSNIFLREVACLDLEALAEKRATNSCSSLIFSSAFALRSACSLAASSLDCFQKS\n->Streptococcus_suis|ORF2904 length 110 aa, 333 bp, from 1999974..2000306 
of Streptococcus_suis\n-KLLLMVKRFLTISALSCAQVTRLLLLVKTTSKQLLSFVLLWAILNMKVLSSGVSLLVNPT\n-YQKTILVTLIQTNLSLIGSVNLPARKKMTIPSCAVSWDVCSSRVMRLTNL\n->Streptococcus_suis|ORF2905 length 117 aa, 354 bp, from 2000502..2000855 of Streptococcus_suis\n-QTISSSFLKTVLSTESTKLMMNSWKMLKYKQKYKNFGKHNKKRLGLLPSLSSQSSCQHLS\n-AVVDCQICSCFTLQIWPLRLLRTKFALSPTSNCLPDSLSCAGVGVKQSGNRLFQLNN\n->Streptococcus_suis|ORF2906 length 872 aa, 2619 bp, from 2000888..2003506 of Streptococcus_suis\n-PVKFFPTSFSFKSMKKIFTKTSIYYLLSFLIPLTIISIVLAFQGIWWGSDTTILASDGFH\n-QYVIFNQTLRNTLHGDGSLFYTFSSGLGLNFYALSSYYLGSFLSPIVFFFDLQSMPDAIY\n-LVTIVKFGLTGLSTYFSLKGIHKNLKEEWALLLATSFSLMSFSTSQLEINNWLDVFILLP\n-LVLLGLHRLLKKQGPILYYITLTCLFIQNYYFGYMVAIFLTLWTLVQLSWIDSQRIKRFI\n-NFTIVSILSALSSMFMLLPTYLDLKTHGETFTKIVNLKTEDSWYLDFFAKNLVGSFDTTK\n-FGSIPMISVGLVPLILALLFFTLKEIKPTVKLSYALFFTFIISSFYLQPLNLFWQGMHAP\n-NMFLYRYAWALSITVIYLAAETLVRLRQVSIKNFTLIVSFLLICFTSTFIFRDHYEFLTD\n-VNFLLTLEFLIAYFILFVAMIRYKSSLKWINIVLLFFTFLELGLHSHYQVQGISDEWHFP\n-SRSNYEEKLTDIDSIVKSTKTTTDSFYRIERLLPQTGNDSMKFNYNGISQFSSIRNRASS\n-SVLDKLGFRSDGTNLNLRYQNNTIIADSLFGVKYNLATTDPNKFGFTLNQSQSTINLYEN\n-SFNLGLALLTEGIYKDVNFTNLTLDNQTNFLNQLTGLSQKYYHTLSDVVSQNTVELSNRM\n-TVNKVDNEDAAKATFLVNIPANSQVYLNLPNLTFSNENQKKVVITVNNQSSEFTLDNAFS\n-FFNVGSFTTDVQVQVNVYFPENNQVSFDKPQFYRLDLLAFQQAISILQEKQVVTKTDGNK\n-VTVDFVTDKESSLLLTLPYDKGWNATIDGKPIKIQKAQDGFMKVDVSPGQTKLVLTFVPN\n-GFYLGLLISFGAVFVFFSYQFIGYYYSKNREY\n->Streptococcus_suis|ORF2907 length 235 aa, 708 bp, from complement(2003907..2004614) of Streptococcus_suis\n-FHVKQGVKMNQKEYRVFEGLRIACSLTFISGYLNAFTFVTQGGRFAGVQSGNVISLAYFL\n-AKGDFAQVVNFSIPILFFVFGQFFTYLARRYFEKQTWSWHFGSSVMMLVLILLTIILSPI\n-MPASFTIASLAFVASIQVETFRRLRGAPYANVMMTGNVKNAAYLWFKGVIEKDSELRKTG\n-RNILLTIIGFMLGVIISTHLSFQFEEYALIGLILPVLYINYELWQEKRPTRGRSK\n->Streptococcus_suis|ORF2908 length 180 aa, 543 bp, from complement(2004615..2005157) of 
Streptococcus_suis\n-PYPDFLKIFSVVCLWICYNYFMKIKLITVGKLKEKYLKEGIAEYSKRLGRFTKLDMIELP\n-DEKTPDKASQAENEQILKKEADRIMSKIGERDFVIALAIEGKQFPSEEFSQRISDIAVNG\n-YSDITFIIGGSLGLDSCIKKRANLLMSFGQLTLPHQLMKLVLIEQIYRAFMIQQGSPYHK\n->Streptococcus_suis|ORF2909 length 413 aa, 1242 bp, from 2005223..2006464 of Streptococcus_suis\n-VIIKKEIVLLRKIKEMERIPYMKKYLKFAILFVIGFFGGLIGALSASFFQPQVQQANSAI\n-TSVSNVQYNNETSTTKAVEKVQNAVVSVINYQKSANNSLGVIFGNIESSDELAVAGEGSG\n-VIYKKYGQYAYIVTNTHVINNAEKIDILLASGEKISGELVGSDTYSDIAVIKISADKVTA\n-VAEFADSDTIKVGETAIAIGSPLGSVYANTVTQGIISSLSRTVTSQSKDGQTISTNAIQT\n-DTAINPGNSGGPLINTQGQVIGITSSKITSSSANSSGVAVEGLGFAIPANDAVAIINQLE\n-KTGQVSRPALGVHMVNLTTLSTSQLEKAGLSNTELTSGVVIVSTQSGLPADGKLETFDVI\n-TEIDGEAIQNKSDLQSALYKHQIGDTITVTYYRNNQKQTVDIKLTHSTEELSE\n->Streptococcus_suis|ORF2910 length 256 aa, 771 bp, from 2006519..2007289 of Streptococcus_suis\n-GYMEELRTLNISEIHPNPYQPRIHFDEKELLELAQSIKENGLIQPIIVRKSSIIGYELLA\n-GERRLRASQLAGLTTIPAVVKELTDDDLLYQAIIENLQRSNLNPIEEAASYQKLISRGLT\n-HDEVAQIMGKSRPYISNLLRLLNLSSQTKQAVEEGKISQGHARQLVSFSEEKQAEWVQLI\n-LSKDLSVRTLEKLIAANKKKHTKLKQRDQFLKEQEDSLSKTLGTATKIIKKKNGSGEIRI\n-SFNDLDEFERIINNFK\n' |
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.fasta
--- a/test-data/get_orf_input.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,17 +0,0 @@
->alpha three forward CDS using table 1
-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-NNNNNNNNNNNNNNNNATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNN
-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
-NNNNNNNNNNNNNNNNNTAANNTAGMNTGANNNNNNNNNNNNNNNNNNNNN
->beta three forward CDS using table 11
-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-NNNNNNNNNNNNNNNNNGTGNATANATTNNNNNNNNNNNNNNNNNNNNNNN
-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
-NNNNNNNNNNNNNNNNNNTAANNTAGNNTGANNNNNNNNNNNNNNNNNNNN
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t11_nuc_out.fasta
--- a/test-data/get_orf_input.t11_nuc_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,36 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-ATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTN
-NNNNNNNNNNNNNNNNTAANNTAG
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-ATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNN
-NNNNNNNNNNNNTAA
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-ATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNN
-NNNNNNNNTAANNTAGMNTGA
->beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
-GTGNATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNN
-NNNNNNNNNNNNNNNNTAANNTAG
->beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
-ATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNN
-NNNNNNNNNNNNTAA
->beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
-ATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNNN
-NNNNNNNNTAANNTAGNNTGA
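You can check the expected CDS files against get_orf_input.fasta by hand. In each forward frame, a CDS runs from the first start codon to the next in-frame stop. Coordinates are 1-based and include the stop codon, but the amino-acid length excludes it: 264 bp is 88 codons, which gives 87 aa.

Below is a minimal plain-Python sketch of that rule. It assumes the rule is the whole story and is not the removed tools/filters/get_orfs_or_cdss.py. Its other assumptions:

* It ignores the reverse strand.
* A codon containing N or any other ambiguity code counts as neither a start nor a stop.
* Hits are numbered in order of start position.

The start-codon sets are NCBI's for tables 1 and 11. The function names and the ALPHA/BETA constants are illustrative.

```python
# Hypothetical sketch, not the removed get_orfs_or_cdss.py: a forward-strand
# CDS finder that reproduces the coordinates in the expected-output files.
STOPS = {"TAA", "TAG", "TGA"}
START_CODONS = {  # NCBI start codons for the two tables used by the test data
    1: {"TTG", "CTG", "ATG"},
    11: {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"},
}


def find_cds(seq, table):
    """Return sorted (start, end) pairs, 1-based and inclusive of the stop codon."""
    starts = START_CODONS[table]
    hits = []
    for frame in range(3):
        begin = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3].upper()
            if begin is None:
                if codon in starts:  # codons with N, M, ... never match here
                    begin = i
            elif codon in STOPS:
                hits.append((begin + 1, i + 3))
                begin = None  # look for the next start in this frame
    return sorted(hits)


def cds_headers(title, seq, table):
    """Build FASTA titles in the style of the expected-output files."""
    record_id = title.split()[0]
    headers = []
    for n, (start, end) in enumerate(find_cds(seq, table), start=1):
        bp = end - start + 1
        aa = bp // 3 - 1  # the stop codon is not counted
        headers.append(f"{record_id}|CDS{n} length {aa} aa, {bp} bp, "
                       f"from {start}..{end} of {title}")
    return headers


# The two records of get_orf_input.fasta, rebuilt from their 51-column lines.
ALPHA = ("A" * 51 + "N" * 16 + "ATGNATGNATG" + "N" * 24 + "A" * 51 + "C" * 51
         + "G" * 51 + "T" * 51 + "N" * 17 + "TAANNTAGMNTGA" + "N" * 21)
BETA = ("A" * 51 + "N" * 17 + "GTGNATANATT" + "N" * 23 + "A" * 51 + "C" * 51
        + "G" * 51 + "T" * 51 + "N" * 18 + "TAANNTAGNNTGA" + "N" * 20 + "T" * 51)

for line in cds_headers("alpha three forward CDS using table 1", ALPHA, 1):
    print(line)
```

Run as is, the sketch prints the three alpha titles as they appear in get_orf_input.t11_nuc_out.fasta. With table 11 the alpha hits do not change. For beta, find_cds(BETA, 11) returns (69, 332), (73, 327) and (77, 337), which are the coordinates of beta|CDS1 to beta|CDS3.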
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t11_open_nuc_out.fasta
--- a/test-data/get_orf_input.t11_open_nuc_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,39 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-ATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTN
-NNNNNNNNNNNNNNNNTAANNTAG
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-ATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNN
-NNNNNNNNNNNNTAA
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-ATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNN
-NNNNNNNNTAANNTAGMNTGA
->beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
-GTGNATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNN
-NNNNNNNNNNNNNNNNTAANNTAG
->beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
-ATANATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
-GGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNN
-NNNNNNNNNNNNTAA
->beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
-ATTNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
-AAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
-CCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGT
-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNNN
-NNNNNNNNTAANNTAGNNTGA
->beta|CDS4 length 25 aa, 75 bp, from 334..408 of beta three forward CDS using table 11
-NTGANNNNNNNNNNNNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
-TTTTTTTTTTTTTTT
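The open variant adds only one entry, beta|CDS4. It starts at the ambiguous codon NTG (334..336) and runs to the end of the record without a stop, so its 75 bp translate to 25 aa with no stop codon to subtract. Under table 11 every reading of NTG (ATG, CTG, GTG, TTG) is a start codon. This suggests, but does not prove, that the tool accepts an ambiguous codon as a start when all of its resolutions are starts.

The alpha record also has an in-frame NTG at 333..335 that would open an 8 aa ORF. Its absence from the expected file is consistent with a minimum-length cut-off above 8 aa and at most 25 aa, but the real value is not visible here. The sketch below therefore uses a hypothetical min_aa=10. The IUPAC table is standard; the function names and the cut-off are assumptions.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}
STOPS = {"TAA", "TAG", "TGA"}
STARTS_11 = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}  # NCBI table 11


def always_in(codon, codons):
    """True if every unambiguous reading of `codon` is in `codons`."""
    choices = [IUPAC.get(base, "") for base in codon.upper()]
    if not all(choices):  # unknown symbol: never a start or stop
        return False
    return all("".join(p) in codons for p in product(*choices))


def find_orfs_open(seq, starts=STARTS_11, min_aa=10):
    """Return sorted (start, end, has_stop) triples, 1-based.

    ORFs that reach the end of the sequence without a stop are kept
    (has_stop=False). min_aa is a hypothetical cut-off, not the tool's value.
    """
    hits = []
    for frame in range(3):
        begin = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if begin is None:
                if always_in(codon, starts):
                    begin = i
            elif always_in(codon, STOPS):
                hits.append((begin + 1, i + 3, True))
                begin = None
        if begin is not None:  # ran off the end: keep whole codons only
            end = begin + (len(seq) - begin) // 3 * 3
            hits.append((begin + 1, end, False))
    kept = []
    for start, end, has_stop in hits:
        aa = (end - start + 1) // 3 - (1 if has_stop else 0)
        if aa >= min_aa:
            kept.append((start, end, has_stop))
    return sorted(kept)


# The two records of get_orf_input.fasta, rebuilt from their 51-column lines.
ALPHA = ("A" * 51 + "N" * 16 + "ATGNATGNATG" + "N" * 24 + "A" * 51 + "C" * 51
         + "G" * 51 + "T" * 51 + "N" * 17 + "TAANNTAGMNTGA" + "N" * 21)
BETA = ("A" * 51 + "N" * 17 + "GTGNATANATT" + "N" * 23 + "A" * 51 + "C" * 51
        + "G" * 51 + "T" * 51 + "N" * 18 + "TAANNTAGNNTGA" + "N" * 20 + "T" * 51)

print(find_orfs_open(BETA))
```

The printed list contains the four beta entries of get_orf_input.t11_open_nuc_out.fasta. The last one is flagged False because it has no stop codon. With min_aa=0 the alpha record also gains (333, 356, False).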
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t11_open_prot_out.fasta
--- a/test-data/get_orf_input.t11_open_prot_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,20 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGG
-GGGFFFFFFFFFFFFFFFFXXXXXXXX
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-MXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGGG
-GVFFFFFFFFFFFFFFFFXXXXXX
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-MXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFFXXXXXXXXX
->beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGG
-GGVFFFFFFFFFFFFFFFFXXXXXXXX
->beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
-MXXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGG
-GFFFFFFFFFFFFFFFFFXXXXXX
->beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
-MXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFXXXXXXXXXX
->beta|CDS4 length 25 aa, 75 bp, from 334..408 of beta three forward CDS using table 11
-MXXXXXXXFFFFFFFFFFFFFFFFF
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t11_prot_out.fasta
--- a/test-data/get_orf_input.t11_prot_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,18 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGG
-GGGFFFFFFFFFFFFFFFFXXXXXXXX
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-MXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGGG
-GVFFFFFFFFFFFFFFFFXXXXXX
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-MXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFFXXXXXXXXX
->beta|CDS1 length 87 aa, 264 bp, from 69..332 of beta three forward CDS using table 11
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGG
-GGVFFFFFFFFFFFFFFFFXXXXXXXX
->beta|CDS2 length 84 aa, 255 bp, from 73..327 of beta three forward CDS using table 11
-MXXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGG
-GFFFFFFFFFFFFFFFFFXXXXXX
->beta|CDS3 length 86 aa, 261 bp, from 77..337 of beta three forward CDS using table 11
-MXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFXXXXXXXXXX
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t1_nuc_out.fasta
--- a/test-data/get_orf_input.t1_nuc_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,18 +0,0 @@ ->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1 -ATGNATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAA -AAAAAAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC -CCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG -GGGGGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTN -NNNNNNNNNNNNNNNNTAANNTAG ->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1 -ATGNATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AAAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC -CCCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG -GGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNN -NNNNNNNNNNNNTAA ->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1 -ATGNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA -AAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC -CCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG -TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNNNNNN -NNNNNNNNTAANNTAGMNTGA |
diff -r 324775a016ce -r 6a14074bc810 test-data/get_orf_input.t1_prot_out.fasta
--- a/test-data/get_orf_input.t1_prot_out.fasta Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,9 +0,0 @@
->alpha|CDS1 length 87 aa, 264 bp, from 68..331 of alpha three forward CDS using table 1
-MXXXXXXXXXXXKKKKKKKKKKKKKKKKNPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGG
-GGGFFFFFFFFFFFFFFFFXXXXXXXX
->alpha|CDS2 length 84 aa, 255 bp, from 72..326 of alpha three forward CDS using table 1
-MXXXXXXXXXXKKKKKKKKKKKKKKKKTPPPPPPPPPPPPPPPPRGGGGGGGGGGGGGGG
-GVFFFFFFFFFFFFFFFFXXXXXX
->alpha|CDS3 length 86 aa, 261 bp, from 76..336 of alpha three forward CDS using table 1
-MXXXXXXXXKKKKKKKKKKKKKKKKKPPPPPPPPPPPPPPPPPGGGGGGGGGGGGGGGGG
-FFFFFFFFFFFFFFFFFXXXXXXXXX
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-forward.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-forward.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,288 @@\n+@WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+TTACCCGTCGGCGCCGAAAGAGCCGAAGGCTTTGTGACTGAGGCCGGACACTGTGCTGTTAAGCTGGACATTGCCCGACCTGTCGAGTGCGCCGCTCGCCGAAATTCGTTATCGCGTAAATTTATTTATTTATTTTTATTTTTTTAAATAAAAATGACGACTAATTTGTAAGGGCATAACAACAA\n++WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+!,,,./644,,,-0377<:Q777<BB<<60,+.,+,.4.,))))//15>>550007:66>>==7@71/--0:<CDBB;;49/***/***22,/+)))11===798:3.,,1488?133??BKKMODFB?BDB7447B?:8--.E:F?B77?BKKC<<322B:..<41,46>>B<<::::5116..\n+@WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+CGTGCCAGTTCTAAACTGGTCGTTCAGCGCCAACCGAAGTGCATACCCTGACGAGCATACACGCAGCTGAAGCGCTCCACAAGCAGCTCTCACCACTAGTCCACGCACCACCCCGCAAGGAGACGGCACGCAGCCACGGGCAAAAGCCGCCTGTTTCACACAACAGCCCGGCTGACCCGACCTTTAGAGCCAATTCTTTTCCCGAAGTTACGAATCTAATTTGCCGACTTCCCTTACCTACATTATTCTATCGACTAGAGGCTGTTCACCTTGGAGACCTGCTGCGGATATCGGTACGATCAGGCAGGAGATTCATATCGCTTCCCTCGCATTTTCAAGGGCCGTGTGGAGCGCACGAGACACCACAGGAACCGCGGTGCTTTACGGGCGCAACATCCCTATCTCAGGCTGAGCCACTTCCAGGCACGCACGCCCTAAACCAGAAAAGAGAACTCTGGCTCGGACTCCACACGACGTCTGCGAGTTCATTTGCGTTACCGCGCGAAACAGTTCTTGCGAACCGTCATTTCCCTGGCCTGGCGTGGGAATGTTAACCCACTTCCCTTTCGGCAACCGGATGGACAAACTGCGCAAGCACAGCAAAGTCTTCATCCGTAGTGTGTGACGGCATTAGCCGGTGC\n++WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+!<>AIHHCCCCCCCCIIIINNNNNTTTYYYYYYYYYYTTTTIIIIHHNIIIFDKFDDINNNTTTNIIIIINTTTTTTTYYYYYYTNNNNNTTYNIIIIIINNYYYYYYYYYYYYYYYYYTNNNNNTTTTTTYYYYYYYYYYYYYYYYYTLLJJJNNTTTTYYYYYYYYYTNNJNJLLTYYYYTONJJJOOYYYYYYYYYYYYYTTTTLOJJJJOOYYYYYYYYYTTTTTTYYYTTTTTTYYYYYYYYYYYYYYYYLJJJJJTYYYTLLLTOTJJJJJKKOYYYYTJNJJJOOTOOIIIILKYYYYTINDDDEEOSYYYYYYYYYYYYYYYYYYYYYYTTLTTTTTTTINIIIOYTKB888>>KMYYIIFIIITKYYYYKKKTOTYYYYYYYYYYYYYYYYYYYKIDDDD>>444>BKLKIIGGDIOYYYYIYYYQIIII@@7507>43--/<<IAAIIII>559==A@IIB>>===KMQM??/33?BIIQQIIFCCFCCFIIICIHA?@F>:>:>>=3...08AIIIMIQQQQCCCCQC:>=:6:>:>>IICA>>>>IFCCC>:>AA>99>;>AACAA>>>::7;7AIII>>>:>>IAI>833688949>@C>:>A;98777=;>99::>4755057132+\n+@WTSI_1055_1a09.p1kpIBF bases 1 to 
497\n+CGAGCTCGGTACCCGGGGATCCCACCGTTTGGAGGGTGAATTCGCGCTGGAAAAAGGTTTTCCATGCAAAAAATGGAACTTCTTCAGCGTCCAAAGCTTTAGTCAGCCAGCAAAGTGTTGGCATTTCATCGAATGGAAATGGTTCAATAAGTAGCGGCAGCCCCAACGTTTTTGAGAAGTTTTGTGGCGTTTTCTCTGAAGGGGTAAAGTCAGGCGAATTGCTGGAAAAGGTGCCATTGGGTGATTTGGAAGTTGTTCTGTTGATGAACCTTTCATGTTCTAGGCGTTTGTGAAGGAATTTGCTGACAATTTGCTCCGAATCCAAAAGGACGTTGAGCGCTGTGATCGGACCATCAAATTCTATTCCAAACGGGACAATTTGGATGCTCTCCAACGGATAATTTGCACTTACATTTATCGTCGGCTGAAGTTGGACATTGAGGACGGTTACGTGCAGGGAATGTGCGATTTGGTCGCTCCTCTTTTGGTGGTGTTT\n++WTSI_1055_1a09.p1kpIBF bases 1 to 497\n+!989>>CCCCCCIIYOICCCCOIYHHA8339>><@75.444N@IDHHHDDNTTYYYYYTTTIIIIINYYYYYTTTTTTTNNNHHHIHHIIIIOQIDKDDDFHIIITYYYYYYYTTTTTYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTTNNNNNNTTTYYYYYYYTTTTYTYYYSSSYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOOKJJNOTTYYYYYYYTTTTTTTTTTYYYYYYYYTTTTTTYYYYYYTTNNLLLLLLYYMOKKKOYYYYYYYYYYYYYYYYYYTTTTTIIIIIITYYLIIIIIFFDDDFYYYYYYYTTTTTTYYYYYYQQMMMYYTTOKKKIIIIIIIKKNNNDDDNNNNTYTTOOKKKINNIIKQONN?N2::NHTQOKKKKFFFFFFMMIIIICBAAIII>>>>>>AAAB=?FBO>88+,+//><IIII<33/++/0<<4\n+@WTSI_1055_1a10.p1kpIBF bases 1 to 512\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACGGCAAGAGACCAATCTGGTTTTGCAATGTAACATGCCAATTAATCATCAGCATTTTTCACATAAGTGATGGGATGACGGTTGGGGGGGGGGGAAATAAATGCATGTCGATCAGTGCATAGAAGCGAAAGAAATCGTAGAAATTTGCAGATGAAAATTTTGCAGTGGTAATTTGACCGTACCGAAAAGGAATGAGAGCTATTTACCTGTGGGAATGGGTGTAAAATGGAAACTAAATTGCGCGAGGGACAGTTTTGATTGGACGATATCTCCAGCGCAAAGGTCACATGACCAGCCGCTTGGAGATTGTTCGGGTAAGCGAGACAAAATACGAACAATCGGAGTTATTTGTACAACAACAACACATTGATTAAGTGATGGGAGAAAAAAAAAAGAAGGAATAATATGGCTTTGTGCATTTTTCTAAAGGTCTTAAAAATCAA\n++WTSI_1055_1a10.p1kpIBF bases 1 to 
512\n+!.6<:::60.1441+21441++AAAAEHHHHHHHHHHDBB4+,+<<IDCCCCCCCCCHITIIDDDCOOQH@@//)))059><10''*45EHMOFEDCCCCCDIIINTTIINNNNTTTTTTYYYYTIIIDDDDDHHHHHNOKKKKMOOTINNNNNYYYYQPPPPKKKLOKMMMKIINIIIKIIIIIFKIIIITOYSSYYYYYTTTLOKKKKKYYYYYYKLMMOOMSSYSLOKKFBBBFKKKSSYYYSSMMSSYYYSSSSMSSSSSMYYYYMOKKKKSSYYSKKKKKKSYYYPSSSSMMFIIOJJSYYYSSSMLOLIIIIIIYYYLLTLOIIIFFFKKMYYYYYYYYYTTTTTOOKKIINNNNTYYYYYYOFFFFFFIOYYYYYYYYYYQQKKKKKMMTTTTYYYIIIFFFFFFFDMMQQYYKKKKKKMKKKQQYQOKKKMOYYYA;777;>CIIIH@>>CA=94++,69ICCCC@>>743323::@@BIMII"..b"ATCTTTCGCCACTTCCCGCCTCCCCCCCCCCTTTTGACCACCTGCCATTGTTGTCGTTGAGCAACCGAATTTGACTCTTCACCCGTCGACTGCTGGGCGTTCGCTGTTCCGCCATGAATTGGCGCCATTCTCTTTGGCCCTAAAAGTGAACCGGTTACCAACTACTAAAGTGTCCGATTCGCTCCCGAACCTGCCGAGTCTGGACAGAGGCCGGAATTTTTGGGAATGCCATCAATCCCGGAGCATTTTTGAAGCTGCTCTCGACATGAGTACCGGCTCCATTAAAATTATCCCCCTCCAAACCGACCACAATCACACGCCCCCACTCGTCCCTGCGCAACGTCGTCTCTTCGTCGTCCACCTCCGCCTCGTCCGTTCTCGCCCATTCCCTTTTCTCGTC\n++WTSI_1055_1f20.p1kpIBF bases 1 to 491\n+!89><<<536::6001:41--<A?>CCCFCDDDDIIIYQKKGGNNNDCCCCCDDDDDHTNIDDDDTTIDA>9449;>@DHHHHHINNNNHDEEFHHNNNIIIIIIYYTIIIIITYYYYTTTTNNTTTTTIIIIIIFF>>2...@NNNTTTTTYYYYYYYYYYTTTTTNNLTTNYYYTTNNNLLLTTTTTTTTTTTTTTTYYYYTTTTTTYTNNNNILLNNNNNNNTTTTTTTTTTTTTTTYYYTTLTTTTTTTTTTTTTYYYYYYYYYYYYYYYYYTTTTTTYYTTTTTTYYTTNNNNIIKYYKKTTTTYYYYTTTTTTYYYYYYYYYYKKKKKKYYYYYTTTTTTYYYYIIIIBB=>7<<>>>CII??36-1(((()*+48ACIAA?4/)))'/***,++,539<>>>>>BD777777>>>>>>>>91/))01<::8=891,*117444,+,12777.,+44>440/0977-//-10++048:30---+\n+@WTSI_1055_1f21.p1kpIBF bases 1 to 456\n+TAAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACCTGGAGCAAACTGGTTGTGTCGTGGTCAGGGTACCGCCATTCCGTGAGATATGGTAGGTAAATGCGACCGGGATTATCCACAACTTTGGACGGCCTAATTCGCATACATGGAGTCGGCTTCACATAGCAATAGGGGCCTACGTTGGGATGATTTTCCAGAAAGTAAATGGCTACGGGAATGTTGTACACAGCTCCCTTAAGCTTTATGTATTAAACAAACAAACAAAGACCATACAGCCCACCTTATACAAGATGGGAATGGTCCCCGAAAAGGAAAGGCAATATTTCGGCATTCCGTCAGGGAAAACAAAATTCACAACGTCGGGCTGAAGATCTATAAAATTGTTGAGCGCAGTGAGTAAATCATCCTTCGTACTATCCTC\n++WTSI_1055_1f21.p1kpIBF bases 1 to 
456\n+!.348<<<<<4014:3.08::;<<ECCCIIIHCCBCCCDIYMMKKBNNNHDDDDDDDDDINYOIDDHHTTIDDAA<<<>BDDDDDDDDDIIHHHHIINNNIFDHHHIINIFFIINITTKFFIIIIIIIIIIIIIOOMMQQ8.))*25IHMQQQIIIIIIIIIITNNNNIIKYYYTTTTTTTTTTTTYNNIIIINNTTTTTNNIIITTTTTTTTNNNNTTYYYYYTTTTTTYYYTTTTTNTTTTTYYKKFFKKYYYYYYYYYTTTTTTTTTTTTYTTTTTTYTTTTTTTTLIFDDFJJJFFFIIJLOKFFMSSSYYYYYSSFB;??IIKKKKKKKKKLLKFFDDDDMDDDDB;789;AFNDBB;;BOMMMKKIDDDED@D@@8=@ENEBBBBBD;85//6?@@>77<@DFM?82228>D>>77273BB==97330/.--/8@75-,,/,,,0/53,\n+@WTSI_1055_1f22.p1kpIBF bases 1 to 370\n+CGACCAATGCTCGGTCCGTCACGTAGAGCAATCCGTTTGAGCGATCCACACGAAAATCTTAAGCGCAAAAAAGATTAATATTAATTATTTAACCATCTAATTATTTTAAAAATTTGCCGAAATAGTATCCGATCAAATCGGTTCTGACAATTTTACATTATCTGTTAGCCGTGCCAAAGTCTCTCTCTCACATTCGGTGGCAGCCGGTTGTCGTTGTCCAAGCACAAATTCTACGCTGCCATTATTGCCTTCGTCTCTGTCGCGTGCCAAAAAGCGTCCGATGGCGGTGCCAGCCGGCATATTGTCCAGTAGCCGAATGTGCGTGTCCTGGCGATCCCACAGGATCAGTGGCCGATTATCATTTTTGTC\n++WTSI_1055_1f22.p1kpIBF bases 1 to 370\n+!89A>887>>:>68>AHHIIDCCCCCDNNYYTTTTTTTYTTTTTTYYYYYYYYYYNNHHHDF=@=>9BQQYYYIIIIIITTTTTTTTTTTTTNTTNNNNNTTTTTTTTTTYYYYYYYYYYTTTTTTYTTYTTTYTNNNNLNNNNNNNTLLYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTTTOOKKKOYYYYYYYYYYYYYYYYYYYYYYTTTTLKKTTTTYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYYYYYYKKMMTTTTTTTYOKIIIGKKYYYYYOIIIOAQ==<:77:<IIIABBBCDO>>988>?FKYYPFBB,,.8>FAA:6698<>>D>>::33:4>>66,,,<<Q93+-\n+@WTSI_1055_1g01.p1kpIBF bases 1 to 584\n+CAAATCCTACTGGCCGGACAAAAGAAGCGGCCAAACAACGTGCTCTTCACAAGACGATCACCACCAAAAACATTCACACATGCTCAACGAGACATTGCTTGCAGGATGGCAAGTGCAGGAAGCACTTTCCGGTGCATTAGTTTACACTGACTATGTAACCTATTGTTAATTCCCTGTAGAAACCGTTTGAGTACGACACTGTGTACTCTGAAAATGCCTACCCTCGCTACAAGCGCCGCCCACCTCCGCCTTCACTCCAAGAAGCCCAGCAGAGTCCGGAATTATACGGGCGCGAAATGCAATACAAGGACCAGCGTGGCAAACTAATTCGCAAGGACAACTCTCACGTCGTGGCTTTCAGTCCATTTCTGTCAAGCAAATATGTCGCTCAGTAAAATTAATACTTTTTGTGACAAAATTGCTAACTTTTTTGCAGCATTAACGTCGAGTTTGTCGCGGGAGAAGGATGTATAAAGTACTTATGCAAGTACATGATGAAAGGAGCGGACATGGCCTTTGTCCAAGTCACGGATGCCAACACGGGCCAAAGTGCGCTGAACTACGACGAACTGCAGCAAATTCG\n++WTSI_1055_1g01.p1kpIBF bases 1 to 
584\n+!333;>HCDHHIIIYIIINTTYYYYTTTTTTYYYYYYNIIIIIININNTONB81+++04HQYTTTTTTTNIIINNTTNTTTTTTTTYYYTTTTTYTTTTTTYYYYYYYYYTTTTTTYYYYYTIIIIIITTTTTTTTTNNNNNNTNNTTTNNNNNNNNNNNNNNNNTTTTTYYTNNJJJJLYYYYYYYYYTTTTTTYTNNNNNNTYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTNNNNNNTTYYYTNNNNNTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYTKKKTNNIIINTYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYTTTTTTOIICBBOQQQQQQC;<88:>>>CIFOYYYYYYQQQQQQQQQCCQQQQHCBAA:AAAAIIA>;A>AAAIC>>AAAACA>>>>III>::>AAACCCIIIA:;==<IIIIIQQAA<:::IA==::8::CQIIIIAA>>CI92\n" |
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-interleaved.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-interleaved.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,576 @@\n+@WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+TTACCCGTCGGCGCCGAAAGAGCCGAAGGCTTTGTGACTGAGGCCGGACACTGTGCTGTTAAGCTGGACATTGCCCGACCTGTCGAGTGCGCCGCTCGCCGAAATTCGTTATCGCGTAAATTTATTTATTTATTTTTATTTTTTTAAATAAAAATGACGACTAATTTGTAAGGGCATAACAACAA\n++WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+!,,,./644,,,-0377<:Q777<BB<<60,+.,+,.4.,))))//15>>550007:66>>==7@71/--0:<CDBB;;49/***/***22,/+)))11===798:3.,,1488?133??BKKMODFB?BDB7447B?:8--.E:F?B77?BKKC<<322B:..<41,46>>B<<::::5116..\n+@WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+TGATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGGTACCCGACGTCCGATATCGCGAAAAATGATGTATCTAGATTTGTCAGGAAACGTCCCCGAGTCTGTTCGACAAACAAACGTTATTCCGAACTCCCAACAACAGTATTTGATTGTGTAAAAATCTCTTGGCCTGATTACTATACTTTAGACATTTTTAGTGCCTGTATTGGAGGTATTTTAGGAACTTTTGGAACGAGCTTTTATCGATTTAGGGAACTAAAAAACCGTTCCATATTCATTAGATGCTATTATTTAAAATCCGAGTCTGATTTGCGAT\n++WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+!41>;D>AA>;;=;;>>AA@@CDDAA>>>ADINIIHHDD>::79:>>FIICCCHHHHCCCCCCCCCHHHHIEA>9..''))**,,++''+)**.,,,-,00..0B+..33010701+++-1B1.,??KMOYYQQQQ<<61,))01<:CAIIIIIYYYYTYTTTTYYYYYTTTTNNKKKKYYYYYYYYYYYYPMMOKTTTTYTTTTTYNINNINTNTIIIIIIIIINNYYYYYYYTTOLKKKIIIINNNOKKKKKFFKKYYYYYYYYYYSSMMMQMYYYYYTTTTLLPIDDDDDDFFFFFFMMKKLNIDFFKQQMMMMMMMMHHFF>A>>:779=5<488>>7745/00::300+++0-\n+@WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+CGTGCCAGTTCTAAACTGGTCGTTCAGCGCCAACCGAAGTGCATACCCTGACGAGCATACACGCAGCTGAAGCGCTCCACAAGCAGCTCTCACCACTAGTCCACGCACCACCCCGCAAGGAGACGGCACGCAGCCACGGGCAAAAGCCGCCTGTTTCACACAACAGCCCGGCTGACCCGACCTTTAGAGCCAATTCTTTTCCCGAAGTTACGAATCTAATTTGCCGACTTCCCTTACCTACATTATTCTATCGACTAGAGGCTGTTCACCTTGGAGACCTGCTGCGGATATCGGTACGATCAGGCAGGAGATTCATATCGCTTCCCTCGCATTTTCAAGGGCCGTGTGGAGCGCACGAGACACCACAGGAACCGCGGTGCTTTACGGGCGCAACATCCCTATCTCAGGCTGAGCCACTTCCAGGCACGCACGCCCTAAACCAGAAAAGAGAACTCTGGCTCGGACTCCACACGACGTCTGCGAGTTCATTTGCGTTACCGCGCGAAACAGTTCTTGCGAACCGTCATTTCCCTGGCCTGGCGTGGGAATGTTAACCCACTTCCCTTTCGGCAACCGGATGGACAAACTGCGCAAGCACAGCAAAGTCTTCATCCGTAGTGTGTGACGGCATTAGCCGGTGC\n++WTSI_1055_1a05.p1kpIBF bases 
1 to 642\n+!<>AIHHCCCCCCCCIIIINNNNNTTTYYYYYYYYYYTTTTIIIIHHNIIIFDKFDDINNNTTTNIIIIINTTTTTTTYYYYYYTNNNNNTTYNIIIIIINNYYYYYYYYYYYYYYYYYTNNNNNTTTTTTYYYYYYYYYYYYYYYYYTLLJJJNNTTTTYYYYYYYYYTNNJNJLLTYYYYTONJJJOOYYYYYYYYYYYYYTTTTLOJJJJOOYYYYYYYYYTTTTTTYYYTTTTTTYYYYYYYYYYYYYYYYLJJJJJTYYYTLLLTOTJJJJJKKOYYYYTJNJJJOOTOOIIIILKYYYYTINDDDEEOSYYYYYYYYYYYYYYYYYYYYYYTTLTTTTTTTINIIIOYTKB888>>KMYYIIFIIITKYYYYKKKTOTYYYYYYYYYYYYYYYYYYYKIDDDD>>444>BKLKIIGGDIOYYYYIYYYQIIII@@7507>43--/<<IAAIIII>559==A@IIB>>===KMQM??/33?BIIQQIIFCCFCCFIIICIHA?@F>:>:>>=3...08AIIIMIQQQQCCCCQC:>=:6:>:>>IICA>>>>IFCCC>:>AA>99>;>AACAA>>>::7;7AIII>>>:>>IAI>833688949>@C>:>A;98777=;>99::>4755057132+\n+@WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+CTGTGTACAAAGGGCAGGGACGTATTCAGAGCGAGTTGATGACTCGCCCCTACAAGGAATTCCTCGTTCACGGACAATAATTGCAATGTCCGATCCCAATCACGGCAAATTTTCACCGGTTTACCAACCCCTTTCGGGGAAGGACAAGCACGCTGATTTTGCCAGTGTAGCGCGCGTGCAGCCCCGGACATCTAAGGGCATCACAGACCTGTTATTGC\n++WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+!>>>>>DDIFKOOTTTNDDDHHFTTOOKKKYYTTNNNIYYNNNNNNYTIIIIITIFNIDDKKKNNIIIFIITTTTNNNNNINIINGIKMYYYYYOTTTTTYKKLMMMYYYQOOAAAAIQ;7:<<<A>=AAQA>><<<>7::77::7>>IIIAAAA>:>A=>>5:88::=BIIIIIIIII>>7;9733999=8370---128999::14.,0,,0442+\n+@WTSI_1055_1a09.p1kpIBF bases 1 to 497\n+CGAGCTCGGTACCCGGGGATCCCACCGTTTGGAGGGTGAATTCGCGCTGGAAAAAGGTTTTCCATGCAAAAAATGGAACTTCTTCAGCGTCCAAAGCTTTAGTCAGCCAGCAAAGTGTTGGCATTTCATCGAATGGAAATGGTTCAATAAGTAGCGGCAGCCCCAACGTTTTTGAGAAGTTTTGTGGCGTTTTCTCTGAAGGGGTAAAGTCAGGCGAATTGCTGGAAAAGGTGCCATTGGGTGATTTGGAAGTTGTTCTGTTGATGAACCTTTCATGTTCTAGGCGTTTGTGAAGGAATTTGCTGACAATTTGCTCCGAATCCAAAAGGACGTTGAGCGCTGTGATCGGACCATCAAATTCTATTCCAAACGGGACAATTTGGATGCTCTCCAACGGATAATTTGCACTTACATTTATCGTCGGCTGAAGTTGGACATTGAGGACGGTTACGTGCAGGGAATGTGCGATTTGGTCGCTCCTCTTTTGGTGGTGTTT\n++WTSI_1055_1a09.p1kpIBF bases 1 to 
497\n+!989>>CCCCCCIIYOICCCCOIYHHA8339>><@75.444N@IDHHHDDNTTYYYYYTTTIIIIINYYYYYTTTTTTTNNNHHHIHHIIIIOQIDKDDDFHIIITYYYYYYYTTTTTYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTTNNNNNNTTTYYYYYYYTTTTYTYYYSSSYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOOKJJNOTTYYYYYYYT"..b'IINNHDFKOOOKKMQMMPPYYYTTTTTTYTTTTNNNNNNKFCCQQYYMMFF<<79?A8335:<:6-2+++\n+@WTSI_1055_1f22.p1kpIBF bases 1 to 370\n+CGACCAATGCTCGGTCCGTCACGTAGAGCAATCCGTTTGAGCGATCCACACGAAAATCTTAAGCGCAAAAAAGATTAATATTAATTATTTAACCATCTAATTATTTTAAAAATTTGCCGAAATAGTATCCGATCAAATCGGTTCTGACAATTTTACATTATCTGTTAGCCGTGCCAAAGTCTCTCTCTCACATTCGGTGGCAGCCGGTTGTCGTTGTCCAAGCACAAATTCTACGCTGCCATTATTGCCTTCGTCTCTGTCGCGTGCCAAAAAGCGTCCGATGGCGGTGCCAGCCGGCATATTGTCCAGTAGCCGAATGTGCGTGTCCTGGCGATCCCACAGGATCAGTGGCCGATTATCATTTTTGTC\n++WTSI_1055_1f22.p1kpIBF bases 1 to 370\n+!89A>887>>:>68>AHHIIDCCCCCDNNYYTTTTTTTYTTTTTTYYYYYYYYYYNNHHHDF=@=>9BQQYYYIIIIIITTTTTTTTTTTTTNTTNNNNNTTTTTTTTTTYYYYYYYYYYTTTTTTYTTYTTTYTNNNNLNNNNNNNTLLYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTTTOOKKKOYYYYYYYYYYYYYYYYYYYYYYTTTTLKKTTTTYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYYYYYYKKMMTTTTTTTYOKIIIGKKYYYYYOIIIOAQ==<:77:<IIIABBBCDO>>988>?FKYYPFBB,,.8>FAA:6698<>>D>>::33:4>>66,,,<<Q93+-\n+@WTSI_1055_1f22.q1kpIBR bases 1 to 496\n+CTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCGCATGAGGAATCGGAAGAGAATAATAACAAGAAAATGACAGATAAAAAGAGTGGAATTGAAGTAGAAGAGAAAAAGGGTAGAGTTGTAACAGAAGAGAAGAAAGTTTTAAATGAAGCGGAAGAAAAGAAGGACGAAGATCAGACGGAAGAGAAGAAAGAAAATGAAAAAGAAGTTAAAAGAAATAATGCGGAAGAGAAGAAGAAATTGGATGAAACTGAAGAGAAGCCGGATGAGGAAAGGGGAGAAAAGAAGAGCAGAGCTGAAGTGGAATTGGAAGAAACAACGAAGAAGAATAATGGACTTAAATATGTTTGGAAGCATCAAAATGAATCGGATGTAAAGAAGTACGAAAACATAATGGAAAGTATGGACGAAAAGAAAATGGAAGAGAAGGAGCTCGTGGACAATTACAGTAATATTTTGTTTGGAA\n++WTSI_1055_1f22.q1kpIBR bases 1 to 
496\n+!399>>>>CHHHHBDDDEIIINNTIIFDA>AAAADDDDDDDDDHHHDDHDIIIIIINNNOOBB+++89DFIKKFFINNTTYYYTTTLLLKKKOOTTOLYLLOLTTTTTTTYYYYYYYYYYYYYTIIIDDDFFKOTYYYYYYYYYYYYYTTTLLJTTTYYYYYYYYYYYYTTTNJJLTTLLTTTTYYYYYYYYYTNNNNNTLLMKNNNNNNTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTLLKKKYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTNNNNJJLNNNNNNNNNTTTTTTNNNNNTYTNNNLNNNTTTTTNNLLTTTTTTTTYYYYYYYYYTTNLLLLLLNNNTLYYYYYYYYYYYYYYYTTTTTTYYYYYYYTNNNNNTTTNNNILOOTINNNNNTTTTMYMMMYIIINFFIIIGINIIIIKLLTOKKKMGGDFFFGFFFFFFFFFNNNIN?CCMQ<<3<<D<<+,.66>>F=;>:5.\n+@WTSI_1055_1g01.p1kpIBF bases 1 to 584\n+CAAATCCTACTGGCCGGACAAAAGAAGCGGCCAAACAACGTGCTCTTCACAAGACGATCACCACCAAAAACATTCACACATGCTCAACGAGACATTGCTTGCAGGATGGCAAGTGCAGGAAGCACTTTCCGGTGCATTAGTTTACACTGACTATGTAACCTATTGTTAATTCCCTGTAGAAACCGTTTGAGTACGACACTGTGTACTCTGAAAATGCCTACCCTCGCTACAAGCGCCGCCCACCTCCGCCTTCACTCCAAGAAGCCCAGCAGAGTCCGGAATTATACGGGCGCGAAATGCAATACAAGGACCAGCGTGGCAAACTAATTCGCAAGGACAACTCTCACGTCGTGGCTTTCAGTCCATTTCTGTCAAGCAAATATGTCGCTCAGTAAAATTAATACTTTTTGTGACAAAATTGCTAACTTTTTTGCAGCATTAACGTCGAGTTTGTCGCGGGAGAAGGATGTATAAAGTACTTATGCAAGTACATGATGAAAGGAGCGGACATGGCCTTTGTCCAAGTCACGGATGCCAACACGGGCCAAAGTGCGCTGAACTACGACGAACTGCAGCAAATTCG\n++WTSI_1055_1g01.p1kpIBF bases 1 to 584\n+!333;>HCDHHIIIYIIINTTYYYYTTTTTTYYYYYYNIIIIIININNTONB81+++04HQYTTTTTTTNIIINNTTNTTTTTTTTYYYTTTTTYTTTTTTYYYYYYYYYTTTTTTYYYYYTIIIIIITTTTTTTTTNNNNNNTNNTTTNNNNNNNNNNNNNNNNTTTTTYYTNNJJJJLYYYYYYYYYTTTTTTYTNNNNNNTYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTNNNNNNTTYYYTNNNNNTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYTKKKTNNIIINTYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYTTTTTTOIICBBOQQQQQQC;<88:>>>CIFOYYYYYYQQQQQQQQQCCQQQQHCBAA:AAAAIIA>;A>AAAIC>>AAAACA>>>>III>::>AAACCCIIIA:;==<IIIIIQQAA<:::IA==::8::CQIIIIAA>>CI92\n+@WTSI_1055_1g01.q1kpIBR bases 1 to 
350\n+TATGACTGATTACGCCAGCTATTTAGGTGAGACTATAGAATACTCACGCTAGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGATTGCTTTTTGGCTCGCATACTGCAGCCTGGGGAAGTAGTTGACGTTTTGAAGAATTGAGGGAAGTTGACGTGAAACGGCAACGCGGAGCAGGTCGGAAATCGCTTCGCTATCAGAGCCAAGCAACGAAATGGCGATTGCGCTTAAAAAACATTGGTTTGCTTAAAACATCAATGGTCTTCACCGGTAGAAGCAGTCGCCTAGACCAACGTTGTTGACGCAACGAATGGTGTTTTGCTGCTGGGCAGACGTGGGCGGAGTGCTA\n++WTSI_1055_1g01.q1kpIBR bases 1 to 350\n+!..+---77CBI>7---77>>>DACCCHHHIDDDDCCIHHAA84)))%%%))+,32>>HHHHCCCCCCCCCHIIIIINN<B.,,,+++2.22OBNDHHHHHIIDDDDIIYTNNNNNTTTIIIIIITTTTKKYYYYYYYYYYQOB84-,,.<>FIIIIINNNIIIKKMSSSIIIIIIIIIIIILTOOIIIIIFLLLLLLYYSKKLKKKPMSSYSYSSMSS?KKKKFFFIIFKKKKKKKKSMMMSKKIDDDKKKFDDFFFBBDD=DDMMMKDDDDDDKKFFCCKKKKKFFFKKKKFMMMMMKKKKKKKK734:4B<??B@DC=<871<1314/--,,+++++.-5:97--,\n' |
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-mixed.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-mixed.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,800 @@\n+@WTSI_1055_1a03.p1kpIBF bases 1 to 312\n+TTGTTGAACAGCAAAAAGGTCAAGAATATGGATGTTCTCGCCATGATTTTTGTGCCATAGGCGCGCATTCACAAGGTCCATCAGTCGNTCAGCCTGCCGCAACACCACCACCAGCCGCAGCAACAACAACAGCACCAGCAGCAGCTGATCCAATCGCATGTGCCACAGAATAACACCCAAAATCAATTAGCGACGGCCGCCCTCCAGCCGGTTCAGCAGCAGAAACAGCACGAAAAATGGGATCCGATCAAAGAATTTGGGCTGCAAAAGGACGAAATGGCGTTGAAGTCACCGCCCAGCAATGTTTGTGT\n++WTSI_1055_1a03.p1kpIBF bases 1 to 312\n+!96CBHOOTTTYYYQMK???OOTYTTTNNNYYYYNIIIFFIIIIIIIYOOOMAA62.((((*,9@MIIIIO?A3007OOOMMII::%%%::AEHIIIQYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTOOKKKKKYMMYYYKIINNNTYYNIIIINYYYYTOLKKKOOKKKKOLTTYYYYSSSSYYYYSSSSSSMMSOOTLLLONIDDDNOTTYQQMMMMPBB9>BDOOTTQMMMMQMMMQQE:666QQYYPMMDDDADDM@B<FDBBDKKKKKKKKIGKINIFFFKDGGIDB?2/\n+@WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+TTACCCGTCGGCGCCGAAAGAGCCGAAGGCTTTGTGACTGAGGCCGGACACTGTGCTGTTAAGCTGGACATTGCCCGACCTGTCGAGTGCGCCGCTCGCCGAAATTCGTTATCGCGTAAATTTATTTATTTATTTTTATTTTTTTAAATAAAAATGACGACTAATTTGTAAGGGCATAACAACAA\n++WTSI_1055_1a04.p1kpIBF bases 1 to 186\n+!,,,./644,,,-0377<:Q777<BB<<60,+.,+,.4.,))))//15>>550007:66>>==7@71/--0:<CDBB;;49/***/***22,/+)))11===798:3.,,1488?133??BKKMODFB?BDB7447B?:8--.E:F?B77?BKKC<<322B:..<41,46>>B<<::::5116..\n+@WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+TGATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGGTACCCGACGTCCGATATCGCGAAAAATGATGTATCTAGATTTGTCAGGAAACGTCCCCGAGTCTGTTCGACAAACAAACGTTATTCCGAACTCCCAACAACAGTATTTGATTGTGTAAAAATCTCTTGGCCTGATTACTATACTTTAGACATTTTTAGTGCCTGTATTGGAGGTATTTTAGGAACTTTTGGAACGAGCTTTTATCGATTTAGGGAACTAAAAAACCGTTCCATATTCATTAGATGCTATTATTTAAAATCCGAGTCTGATTTGCGAT\n++WTSI_1055_1a04.q1kpIBR bases 1 to 
359\n+!41>;D>AA>;;=;;>>AA@@CDDAA>>>ADINIIHHDD>::79:>>FIICCCHHHHCCCCCCCCCHHHHIEA>9..''))**,,++''+)**.,,,-,00..0B+..33010701+++-1B1.,??KMOYYQQQQ<<61,))01<:CAIIIIIYYYYTYTTTTYYYYYTTTTNNKKKKYYYYYYYYYYYYPMMOKTTTTYTTTTTYNINNINTNTIIIIIIIIINNYYYYYYYTTOLKKKIIIINNNOKKKKKFFKKYYYYYYYYYYSSMMMQMYYYYYTTTTLLPIDDDDDDFFFFFFMMKKLNIDFFKQQMMMMMMMMHHFF>A>>:779=5<488>>7745/00::300+++0-\n+@WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+CGTGCCAGTTCTAAACTGGTCGTTCAGCGCCAACCGAAGTGCATACCCTGACGAGCATACACGCAGCTGAAGCGCTCCACAAGCAGCTCTCACCACTAGTCCACGCACCACCCCGCAAGGAGACGGCACGCAGCCACGGGCAAAAGCCGCCTGTTTCACACAACAGCCCGGCTGACCCGACCTTTAGAGCCAATTCTTTTCCCGAAGTTACGAATCTAATTTGCCGACTTCCCTTACCTACATTATTCTATCGACTAGAGGCTGTTCACCTTGGAGACCTGCTGCGGATATCGGTACGATCAGGCAGGAGATTCATATCGCTTCCCTCGCATTTTCAAGGGCCGTGTGGAGCGCACGAGACACCACAGGAACCGCGGTGCTTTACGGGCGCAACATCCCTATCTCAGGCTGAGCCACTTCCAGGCACGCACGCCCTAAACCAGAAAAGAGAACTCTGGCTCGGACTCCACACGACGTCTGCGAGTTCATTTGCGTTACCGCGCGAAACAGTTCTTGCGAACCGTCATTTCCCTGGCCTGGCGTGGGAATGTTAACCCACTTCCCTTTCGGCAACCGGATGGACAAACTGCGCAAGCACAGCAAAGTCTTCATCCGTAGTGTGTGACGGCATTAGCCGGTGC\n++WTSI_1055_1a05.p1kpIBF bases 1 to 642\n+!<>AIHHCCCCCCCCIIIINNNNNTTTYYYYYYYYYYTTTTIIIIHHNIIIFDKFDDINNNTTTNIIIIINTTTTTTTYYYYYYTNNNNNTTYNIIIIIINNYYYYYYYYYYYYYYYYYTNNNNNTTTTTTYYYYYYYYYYYYYYYYYTLLJJJNNTTTTYYYYYYYYYTNNJNJLLTYYYYTONJJJOOYYYYYYYYYYYYYTTTTLOJJJJOOYYYYYYYYYTTTTTTYYYTTTTTTYYYYYYYYYYYYYYYYLJJJJJTYYYTLLLTOTJJJJJKKOYYYYTJNJJJOOTOOIIIILKYYYYTINDDDEEOSYYYYYYYYYYYYYYYYYYYYYYTTLTTTTTTTINIIIOYTKB888>>KMYYIIFIIITKYYYYKKKTOTYYYYYYYYYYYYYYYYYYYKIDDDD>>444>BKLKIIGGDIOYYYYIYYYQIIII@@7507>43--/<<IAAIIII>559==A@IIB>>===KMQM??/33?BIIQQIIFCCFCCFIIICIHA?@F>:>:>>=3...08AIIIMIQQQQCCCCQC:>=:6:>:>>IICA>>>>IFCCC>:>AA>99>;>AACAA>>>::7;7AIII>>>:>>IAI>833688949>@C>:>A;98777=;>99::>4755057132+\n+@WTSI_1055_1a05.q1kpIBR bases 1 to 
219\n+CTGTGTACAAAGGGCAGGGACGTATTCAGAGCGAGTTGATGACTCGCCCCTACAAGGAATTCCTCGTTCACGGACAATAATTGCAATGTCCGATCCCAATCACGGCAAATTTTCACCGGTTTACCAACCCCTTTCGGGGAAGGACAAGCACGCTGATTTTGCCAGTGTAGCGCGCGTGCAGCCCCGGACATCTAAGGGCATCACAGACCTGTTATTGC\n++WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+!>>>>>DDIFKOOTTTNDDDHHFTTOOKKKYYTTNNNIYYNNNNNNYTIIIIITIFNIDDKKKNNIIIFIITTTTNNNNNINIINGIKMYYYYYOTTTTTYKKLMMMYYYQOOAAAAIQ;7:<<<A>=AAQA>><<<>7::77::7>>IIIAAAA>:>A=>>5:88::=BIIIIIIIII>>7;9733999=8370---128999::14.,0,,0442+\n+@WTSI_1055_1a07.p1kpIBF bases 1 to 574\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACCGGTACGGAGGGAAATTTGATCAT"..b'GATGCTTCAACGAAAACTGATCAGGCGAACTGAAAGGGTGTAAAAAAGATAAAAGAAATTGTAAACGCAGCACATTGTCAAGCAAAGCAACCCAAAAAAATCGATTTTGAGTATAGTCAAAAAGGGTTACCCGTCAATGATGATCTGTTGCTGTTTGTTTGATACTCCTCCTTTCAATTTGCGATTGTTGTTGTTGCAATTGGCACGCGAA\n++WTSI_1055_1f24.q1kpIBR bases 86 to 670\n+!88BHIQQQYYYITTTTIIINNIIIIKKKYYYYIIIIFFYOMTTTYYIIIIAA99//.1<BKKOOTYYYYTTTTNNTTINNNTTYTTNNNIIITTYTTTTTTTTYYYYYIIIIIOYYYYYYYYYYYTTTTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTOTLLYYYYYYYYYTTTTTTTTTTTTTTTTYYYYYYYYYYTTTTTTYYTNNNNNTYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOKKKOOYYYYKK???KQMMMPPPPQMMKKKMPYYYKKKKKKKKKKMMYYYYYYYYYYYYYYYYYYYYYYYYYYYYYQQQQQI51)%%)4<QQQQQQYYYYTTKTTTTTTTYYYYYYYNNNNNNYYYKKKKGGNNNNYYYYYYYYYYYQMMMMQOKKGIIKKKKYQYYYYYYYYTOOLKKIIIIIOYQQQQQQBA>:;AABAACCCIIIOIIBBIIIII:77<><AAIIIOQQIE=>>>CA>AAABBIIIIIII:00882389667>BAAA?A>77:<844>A?;4++0966.+4492000--4922./..++\n+@WTSI_1055_1g01.p1kpIBF bases 1 to 
584\n+CAAATCCTACTGGCCGGACAAAAGAAGCGGCCAAACAACGTGCTCTTCACAAGACGATCACCACCAAAAACATTCACACATGCTCAACGAGACATTGCTTGCAGGATGGCAAGTGCAGGAAGCACTTTCCGGTGCATTAGTTTACACTGACTATGTAACCTATTGTTAATTCCCTGTAGAAACCGTTTGAGTACGACACTGTGTACTCTGAAAATGCCTACCCTCGCTACAAGCGCCGCCCACCTCCGCCTTCACTCCAAGAAGCCCAGCAGAGTCCGGAATTATACGGGCGCGAAATGCAATACAAGGACCAGCGTGGCAAACTAATTCGCAAGGACAACTCTCACGTCGTGGCTTTCAGTCCATTTCTGTCAAGCAAATATGTCGCTCAGTAAAATTAATACTTTTTGTGACAAAATTGCTAACTTTTTTGCAGCATTAACGTCGAGTTTGTCGCGGGAGAAGGATGTATAAAGTACTTATGCAAGTACATGATGAAAGGAGCGGACATGGCCTTTGTCCAAGTCACGGATGCCAACACGGGCCAAAGTGCGCTGAACTACGACGAACTGCAGCAAATTCG\n++WTSI_1055_1g01.p1kpIBF bases 1 to 584\n+!333;>HCDHHIIIYIIINTTYYYYTTTTTTYYYYYYNIIIIIININNTONB81+++04HQYTTTTTTTNIIINNTTNTTTTTTTTYYYTTTTTYTTTTTTYYYYYYYYYTTTTTTYYYYYTIIIIIITTTTTTTTTNNNNNNTNNTTTNNNNNNNNNNNNNNNNTTTTTYYTNNJJJJLYYYYYYYYYTTTTTTYTNNNNNNTYTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTNNNNNNTTYYYTNNNNNTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYYYYYYYYYYYYYYYYTNNNNNTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYTKKKTNNIIINTYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYTTTTTTOIICBBOQQQQQQC;<88:>>>CIFOYYYYYYQQQQQQQQQCCQQQQHCBAA:AAAAIIA>;A>AAAIC>>AAAACA>>>>III>::>AAACCCIIIA:;==<IIIIIQQAA<:::IA==::8::CQIIIIAA>>CI92\n+@WTSI_1055_1g01.q1kpIBR bases 1 to 350\n+TATGACTGATTACGCCAGCTATTTAGGTGAGACTATAGAATACTCACGCTAGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGATTGCTTTTTGGCTCGCATACTGCAGCCTGGGGAAGTAGTTGACGTTTTGAAGAATTGAGGGAAGTTGACGTGAAACGGCAACGCGGAGCAGGTCGGAAATCGCTTCGCTATCAGAGCCAAGCAACGAAATGGCGATTGCGCTTAAAAAACATTGGTTTGCTTAAAACATCAATGGTCTTCACCGGTAGAAGCAGTCGCCTAGACCAACGTTGTTGACGCAACGAATGGTGTTTTGCTGCTGGGCAGACGTGGGCGGAGTGCTA\n++WTSI_1055_1g01.q1kpIBR bases 1 to 
350\n+!..+---77CBI>7---77>>>DACCCHHHIDDDDCCIHHAA84)))%%%))+,32>>HHHHCCCCCCCCCHIIIIINN<B.,,,+++2.22OBNDHHHHHIIDDDDIIYTNNNNNTTTIIIIIITTTTKKYYYYYYYYYYQOB84-,,.<>FIIIIINNNIIIKKMSSSIIIIIIIIIIIILTOOIIIIIFLLLLLLYYSKKLKKKPMSSYSYSSMSS?KKKKFFFIIFKKKKKKKKSMMMSKKIDDDKKKFDDFFFBBDD=DDMMMKDDDDDDKKFFCCKKKKKFFFKKKKFMMMMMKKKKKKKK734:4B<??B@DC=<871<1314/--,,+++++.-5:97--,\n+@WTSI_1055_1g02.p1kpIBF bases 1 to 523\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACGACAAATTCACGGAAGCGTCTCGCACTTTGTGCCGAGGACTGCTGCACAAGGAGCCCACTCTGAGGTTGGGCTGTCGCCGGGTCGGCCGGCCTGAGGACGGCGCGGAAGAGCTGAAGGCACACGCGTTCTTCACACAACCGGACCAGAAGACAGGCAGGGAGCCAATTCCGTGGAGGAAGATGGAGGCCGGCAAGGTGGACGACATTCCCTTCTGAACTGCTAGAGAGGACTTGTAGGAATTCCGTCCTTCAGCTGACACCTCCATTTTGTCCGGACCCCCATTCGGTGTATGCCAAAGATGTGCTGGACATCGAGCAGTTCAGCACTGTCAAGGGAGTTCGTCCGCTTCCACCAAACTTTTCCTACCTGCTGAACCATTAGGTTCGACTTGACGCGACTGACAACTCCTTCTACGACAAGTTCAACAGCGGGTCCGTGTCCATACCTTGGC\n++WTSI_1055_1g02.p1kpIBF bases 1 to 523\n+!08<=AAA:28::87;<::>ACECEIIIIIIIIIIINIKBB>C>QQYNHHHHDDHDHIITIDCCCCOONNNNGDFDDINMINNNNNIHHHHHIINNIIINNNNTYTIIIIDDIIIIYYYTTTTTTYIIIDDDGGITYYSKKKIDNNNNTTNNNNNTYYYTLLLLLLLLLLLYYTYJJJJJNTTTTTTTTTTYYOLLLTTOOOTTTTTTTYNNNNNJJJLLLLLLYYYYYYYYYYSSYYONNNNNNLLTTTTTTTYYYYYYYYYYYYYYYYTMMKKKYYYYYYYYYYYYYTTTTTOOLIILLLLTTLNLLLLLLYYYYYYTTTLLLTTTTTTTYYYYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYNIIIIITYYTTTLTTNIIFFFMYYYYYYYOOLKKOOTIFIFIINTTTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNNNNTYYYYYYYYYYTTTNNNNNNNNTNIIFFFKYYOOOOOIIIA<:77:<<>>>>IOOIHHHDDEIQMMII<924595/4\n' |
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-reverse.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-reverse.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,288 @@\n+@WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+TGATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGGTACCCGACGTCCGATATCGCGAAAAATGATGTATCTAGATTTGTCAGGAAACGTCCCCGAGTCTGTTCGACAAACAAACGTTATTCCGAACTCCCAACAACAGTATTTGATTGTGTAAAAATCTCTTGGCCTGATTACTATACTTTAGACATTTTTAGTGCCTGTATTGGAGGTATTTTAGGAACTTTTGGAACGAGCTTTTATCGATTTAGGGAACTAAAAAACCGTTCCATATTCATTAGATGCTATTATTTAAAATCCGAGTCTGATTTGCGAT\n++WTSI_1055_1a04.q1kpIBR bases 1 to 359\n+!41>;D>AA>;;=;;>>AA@@CDDAA>>>ADINIIHHDD>::79:>>FIICCCHHHHCCCCCCCCCHHHHIEA>9..''))**,,++''+)**.,,,-,00..0B+..33010701+++-1B1.,??KMOYYQQQQ<<61,))01<:CAIIIIIYYYYTYTTTTYYYYYTTTTNNKKKKYYYYYYYYYYYYPMMOKTTTTYTTTTTYNINNINTNTIIIIIIIIINNYYYYYYYTTOLKKKIIIINNNOKKKKKFFKKYYYYYYYYYYSSMMMQMYYYYYTTTTLLPIDDDDDDFFFFFFMMKKLNIDFFKQQMMMMMMMMHHFF>A>>:779=5<488>>7745/00::300+++0-\n+@WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+CTGTGTACAAAGGGCAGGGACGTATTCAGAGCGAGTTGATGACTCGCCCCTACAAGGAATTCCTCGTTCACGGACAATAATTGCAATGTCCGATCCCAATCACGGCAAATTTTCACCGGTTTACCAACCCCTTTCGGGGAAGGACAAGCACGCTGATTTTGCCAGTGTAGCGCGCGTGCAGCCCCGGACATCTAAGGGCATCACAGACCTGTTATTGC\n++WTSI_1055_1a05.q1kpIBR bases 1 to 219\n+!>>>>>DDIFKOOTTTNDDDHHFTTOOKKKYYTTNNNIYYNNNNNNYTIIIIITIFNIDDKKKNNIIIFIITTTTNNNNNINIINGIKMYYYYYOTTTTTYKKLMMMYYYQOOAAAAIQ;7:<<<A>=AAQA>><<<>7::77::7>>IIIAAAA>:>A=>>5:88::=BIIIIIIIII>>7;9733999=8370---128999::14.,0,,0442+\n+@WTSI_1055_1a09.q1kpIBR bases 1 to 558\n+TGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCACCCAAAAAAAGTTTAAAAATTCGGAATGCGCTGTTTTCTTGGGTAAATATAAAGTAGGGTCCGGATTTATATTGTCTAAAACGCGAATTGACTTAAAAGATTGACCAAAAAAAGCCTAAAGTCCAAACTCTAATCAATAGAATAAAATGTTGGCAGAAATTTACGTCATGCAAAGGGTGTGCCAAATGGTTGATTTTGTGATTTTGATTTAATACAGAGGGTGCGAGATCAACTGAAATTTTGAGTAAATGCCGAGAGACTTTTTGTTTTTCAATTGTAATTTGAAGTTGGCCCTCTCTCCCCCCGACCGACAGTGGTACTCGGATAATCAGCCGAACAAACAAATATTCGTAGTGTTAAACAGAAGGGAAAGATGTAAGGTAACATTGGATTAGTTTGATGATGAGGCACTGAATTAAGGACAACTTGGTTATTATTATACATCCATGTGATTGTGAAGATTAAAGATGTTCTGGGACCAGGATGCCTTTGGAGAGGTTT\n++WTSI_1055_1a09.q1kpIBR bases 1 to 
558\n+!=>>>>>>>DIIIHHDHB99-//66@DIHHHHHHHHHDDCCCCCDHHIIDID@D>C=@KKYYYYKKTIIIIIIYNNIFFFIIMTIDDDDDHHHHDDKFFFIIDDHHHDDDHHHINNINIYYIIONNNINLNNNNNTYYYYYYYTNLLLLLLOOYYYYYYYYYYYYYYYYTTTTTTTTTTTTNNLLLLLLTTNNNJJJNNTTTYYYMMLOOKYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYTTTTTTYYYYLTMTTTTNNNNLLTTTTTTTTTTLLTTTNNNJLLTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTYYTTTTTTYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTNNNNNNTONIIINNNNNNNKYYYIOINIIQMOOTNNNNNNNNNTTYYITIIIINNNNNIKKTTTTKTYYYYYYYYYYYYYLF@@@FBC>>=697038<<IIM88+++89I@QAI>::44--344;<><0056699:<9\n+@WTSI_1055_1a10.q1kpIBR bases 1 to 431\n+AAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCACGTAAAAATCGAAAACATAGAAAAAGAAGCAAAGACCGACCGAACCGGTGGGAGAAAGGCTGAATGGGGCATGATTGGGGGAGGGGGGGAAGGTGACGAACCGAACGAATAAATGACAGGACGAGTTTCTCTTTTCTTCTTGGGTTTACATGTGTTGCGTGACCTTCTGAAATGGGCATTCAATGGATGATTGGGAGGGGGGGGGGGGAAGAAGGCCGACCACAGGTTGAATTTCGACTTTCTTCTAATTTTGCCCAACTTTCCCGATGGGGAAGGGCCTATGACCATTCGGTTTTGCAATGAAATCTGCCAATTAAACATCGTCCTTTTTCGTATCTGTGATGGTATGTCGATGGGGTGCG\n++WTSI_1055_1a10.q1kpIBR bases 1 to 431\n+!9;75;;>;>>ACCCC@CCAADNNNNNIIF>>4::>>FFFDDDHHHHHHHDHDDDHHHHINHIIDD>42-55DFIIIIILYYKIIFIIINNYYYYYYYYYLTINNITTYYYYLONIIIILYYYTIIFIFFIMMSSSYSKKLKKOOTTTYYYYYYYSSSYYYLJJJJJTYYYYYYYLTTTTLYYYTTTMOLYYYYYYLLLNLIJIIIILLLYYTTOLKJJKKKTYYYSGGLLLLNLLKKFMJSSSMPMSSMMMSSYYYYSSMKKKKJJMMPSSMB>,,+++>9DDKKKF@@888F=?DFSK==19/99OFB11,,.,,/,.<E99,,,/9:?FB:0//002613../--,,,,.,,,,,-/0910/+-,0..,++..4+;+++4-,,,4./,//66B?54-,,.,,,,48+++2++,,+,,:6=1859/.,\n+@WTSI_1055_1a11.q1kpIBR bases 1 to 301\n+CGAAGGAAAGGCGGCGGAGAAAGTTTCGTCGTTGGCGGAAAAGCCGATGAAACGCGGGGGACGAACGAAGTTTGTGTTTTTTTTAAAAATCTTTTCTCGACGGTTTCCAGGGAATTGGCCAAGTCCATGGACAAAACCAATGCCAACGGTCCTTCGTCCGCTGATTCATCGACTTCGTGTCCCGGGAGCGCGGAGTGCCGCGGCATCCGCCTCAACAGAAAGGGCGTCAGCCGTCGTCCAACCATCACAGCGCCGTTCCCAAAAGCCGTGCCCCCTCGCGCAGTCGTTCGCCTCCACGGT\n++WTSI_1055_1a11.q1kpIBR bases 1 to 
301\n+!DDDDEIOTNNNNNTFDDHHHITINNNNNNNNNNNNIITYTTNNIILOYYTTTTTYQKDFFFFKOLLFIIIINTTTTYYYYHHAADHSYYYYYYYYYYYYOTTTTNTTTNITTTTTTNNNLJLLNJJKJNLJTTYY"..b'DDDDDDDDDITIIHDAA==8??FFDHHIIFYYYYYYYYYYYYYNNIIIINNTTTYTTTYTOOLYYYIIIILLYYYYKKYYYYYYYONNNNNTYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYNTTTTLYYYYYYJLLJLLYYYYYYYTTTTNNTJNNTLLTTTTTJTTTTTTTTYYYYTTTTTTYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTOOYKKLIIIIIIYLMKKKOOTTTTTYYYTNIIIIITTTYYYYYYYYYYYYYTTTTTTYYYYYYYYYTTTKKKNIIITTYYYYYIIGIGB@=@@FFNNIIKKKMHFFQIIFDDDDDKMKTIIIIOOKIIIIIOOMCBAAAAAQABEHIEIAA::0++1569>>>6///-\n+@WTSI_1055_1f20.q1kpIBR bases 1 to 451\n+AGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCACGAAATGTGTGTGATATTTTAATGAATAAACTTCTTTTTTAATATCATATTAATAATTATTGTATCGTTTTACAACTTTCTATTCATATACTTTTCATCATCATCCCATCCGGTATCACTGCTCCTCCTCCTGCGCCCACCGGCCATCAGTCACTTTCGTGTCATTCCGTCGACAGTGTGGTGGTGGTAGTCAAAATTTGTTGACGGAAAGCCTCCAAAAATTGTTGAAATTGGCCAGCCGTGAGGCCCATTGCCATCCGCGGGTGGCATTTGAACTGTCCGCCCCAGTTTGGTGCCATGGCGGACGCCGCATTCGTCGCGTTGCCAGCCGATCCTCAGCAAAGCCGCTTGGCCCACCGCCGGTGGGCATGTGCCGTTGTCGA\n++WTSI_1055_1f20.q1kpIBR bases 1 to 451\n+!;>>>>>>>>>DDCC@CCDDDFFIINNNGEA=>@FFFFFHHHHHHHHHDDHDDDDDHHIIDFDDFDEDDKFIIIIINNIFFNNNIIIIIIYNTTTIINIIIIIOHHDDDDNIIIIIIIHHHHHHHHGDFIIFFINHLLLLLNNNNLNNNNNJNNNLLLLNLNNNNNTTTTTNNNNNNYLNNNNLNJJLNTTTYYYTTTMLOYYTTTTTLYYTTTNJNTTTYYYYTTTTTTYYYYYYYSSSONNNNNTYYYYYTTTTTTTTTTYYYYYYYYYYYYLTTOOLFFIOOOOOOOOYTTTTTTTKTTTTYYTNIIIIIITTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYPPPPPQOIGGGNNIIIIT?<5..8A82,+-..140011199>AAAA;;:<<A>>>@BAADDFDIKIIOIBBIIEII>:338:<II@B77/6-20;;IOA@;;91,\n+@WTSI_1055_1f21.q1kpIBR bases 1 to 336\n+GATTACGCCAAGCTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCAACAAGGATGCGTCTGCTTGTATAACCGGTAATCAAAAATGTGCAAATAATAAAATTGAGTGCATTTACAGGGAAACCGATCGTTGCTGGCGGTATTCATGGACGTGTTTCGGCCACGGGCCGTGGAATTTGGAAAGGGTTGGCGGTCTTCGTCAACGACAAGAACTACATGAGCAAATTGGGACTGACGACTGGATTTAAGGGGAAAACGTTCATCGTCCAAGGATTCGGTTTGTTTAGGGGAAAGGCATTGAAGGGG\n++WTSI_1055_1f21.q1kpIBR bases 1 to 
336\n+!1>;CCCIFCCA>>>>A;>>ADDDDDDDDDDFIIINNNNNDDDDDDFFKIHHHHHHHHDDDDDDDDDDHFFINNKKPPPPOTNNNNIHHHDDDDDDHHHIIINIIIIITYYYYYYYYYYYTNNNNNTYYTTTTTTTTOLLIJJLLNTTYJNNNNNTTTTTNNNNNNYTTTTTNTLLKKYYOTTNNNNNNNTTYYSSPSSSSSSYYYOTOOOYYYYYYYYYTIIIIIITMOKIKNNNNIITNNOLKKMQKKOOTTYQQKKKKLKKKIINNHDFKOOOKKMQMMPPYYYTTTTTTYTTTTNNNNNNKFCCQQYYMMFF<<79?A8335:<:6-2+++\n+@WTSI_1055_1f22.q1kpIBR bases 1 to 496\n+CTATTTAGGTGAGACTATAGAATACTCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCGCATGAGGAATCGGAAGAGAATAATAACAAGAAAATGACAGATAAAAAGAGTGGAATTGAAGTAGAAGAGAAAAAGGGTAGAGTTGTAACAGAAGAGAAGAAAGTTTTAAATGAAGCGGAAGAAAAGAAGGACGAAGATCAGACGGAAGAGAAGAAAGAAAATGAAAAAGAAGTTAAAAGAAATAATGCGGAAGAGAAGAAGAAATTGGATGAAACTGAAGAGAAGCCGGATGAGGAAAGGGGAGAAAAGAAGAGCAGAGCTGAAGTGGAATTGGAAGAAACAACGAAGAAGAATAATGGACTTAAATATGTTTGGAAGCATCAAAATGAATCGGATGTAAAGAAGTACGAAAACATAATGGAAAGTATGGACGAAAAGAAAATGGAAGAGAAGGAGCTCGTGGACAATTACAGTAATATTTTGTTTGGAA\n++WTSI_1055_1f22.q1kpIBR bases 1 to 496\n+!399>>>>CHHHHBDDDEIIINNTIIFDA>AAAADDDDDDDDDHHHDDHDIIIIIINNNOOBB+++89DFIKKFFINNTTYYYTTTLLLKKKOOTTOLYLLOLTTTTTTTYYYYYYYYYYYYYTIIIDDDFFKOTYYYYYYYYYYYYYTTTLLJTTTYYYYYYYYYYYYTTTNJJLTTLLTTTTYYYYYYYYYTNNNNNTLLMKNNNNNNTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTLLKKKYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTNNNNJJLNNNNNNNNNTTTTTTNNNNNTYTNNNLNNNTTTTTNNLLTTTTTTTTYYYYYYYYYTTNLLLLLLNNNTLYYYYYYYYYYYYYYYTTTTTTYYYYYYYTNNNNNTTTNNNILOOTINNNNNTTTTMYMMMYIIINFFIIIGINIIIIKLLTOKKKMGGDFFFGFFFFFFFFFNNNIN?CCMQ<<3<<D<<+,.66>>F=;>:5.\n+@WTSI_1055_1g01.q1kpIBR bases 1 to 350\n+TATGACTGATTACGCCAGCTATTTAGGTGAGACTATAGAATACTCACGCTAGCATGCCTGCAGGTCGACTCTAGAGGATCCCAGGATTGCTTTTTGGCTCGCATACTGCAGCCTGGGGAAGTAGTTGACGTTTTGAAGAATTGAGGGAAGTTGACGTGAAACGGCAACGCGGAGCAGGTCGGAAATCGCTTCGCTATCAGAGCCAAGCAACGAAATGGCGATTGCGCTTAAAAAACATTGGTTTGCTTAAAACATCAATGGTCTTCACCGGTAGAAGCAGTCGCCTAGACCAACGTTGTTGACGCAACGAATGGTGTTTTGCTGCTGGGCAGACGTGGGCGGAGTGCTA\n++WTSI_1055_1g01.q1kpIBR bases 1 to 
350\n+!..+---77CBI>7---77>>>DACCCHHHIDDDDCCIHHAA84)))%%%))+,32>>HHHHCCCCCCCCCHIIIIINN<B.,,,+++2.22OBNDHHHHHIIDDDDIIYTNNNNNTTTIIIIIITTTTKKYYYYYYYYYYQOB84-,,.<>FIIIIINNNIIIKKMSSSIIIIIIIIIIIILTOOIIIIIFLLLLLLYYSKKLKKKPMSSYSYSSMSS?KKKKFFFIIFKKKKKKKKSMMMSKKIDDDKKKFDDFFFBBDD=DDMMMKDDDDDDKKFFCCKKKKKFFFKKKKFMMMMMKKKKKKKK734:4B<??B@DC=<871<1314/--,,+++++.-5:97--,\n' |
diff -r 324775a016ce -r 6a14074bc810 test-data/sanger-pairs-singles.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sanger-pairs-singles.fastq Mon Jul 29 09:28:55 2013 -0400
b"@@ -0,0 +1,224 @@\n+@WTSI_1055_1a03.p1kpIBF bases 1 to 312\n+TTGTTGAACAGCAAAAAGGTCAAGAATATGGATGTTCTCGCCATGATTTTTGTGCCATAGGCGCGCATTCACAAGGTCCATCAGTCGNTCAGCCTGCCGCAACACCACCACCAGCCGCAGCAACAACAACAGCACCAGCAGCAGCTGATCCAATCGCATGTGCCACAGAATAACACCCAAAATCAATTAGCGACGGCCGCCCTCCAGCCGGTTCAGCAGCAGAAACAGCACGAAAAATGGGATCCGATCAAAGAATTTGGGCTGCAAAAGGACGAAATGGCGTTGAAGTCACCGCCCAGCAATGTTTGTGT\n++WTSI_1055_1a03.p1kpIBF bases 1 to 312\n+!96CBHOOTTTYYYQMK???OOTYTTTNNNYYYYNIIIFFIIIIIIIYOOOMAA62.((((*,9@MIIIIO?A3007OOOMMII::%%%::AEHIIIQYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTOOKKKKKYMMYYYKIINNNTYYNIIIINYYYYTOLKKKOOKKKKOLTTYYYYSSSSYYYYSSSSSSMMSOOTLLLONIDDDNOTTYQQMMMMPBB9>BDOOTTQMMMMQMMMQQE:666QQYYPMMDDDADDM@B<FDBBDKKKKKKKKIGKINIFFFKDGGIDB?2/\n+@WTSI_1055_1a07.p1kpIBF bases 1 to 574\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACCGGTACGGAGGGAAATTTGATCATCGCGGAAGTGCTCGTTTTGATTATCTTGGTGTATGGCGTCTGTGACCTTCTTTTTCGCTGGATGGGCATCGGGGCGTACGCCTGGGGTTCGCGCTCGAGCCCCAAAATCGCCCTCACTTTCGATGACGGGCCCAGCGAACACACCCGGTCCTTGCTCGAGCTGCTGCACCGCCATGGGGTAAAAGCTACCTTTTTTGTCACCGGCGTTCAGGCCGAGCGGCACCCCGACTTGCTAGAAGCCCTGCGGGCCGATGGCCATCAGATCGAATCGCACGGCTACTGGCACCGCCAAGCGTTCTTCCTGTGGCCTTGGCAAGAAGCGCGGCACATCCAACGGGTTCCGGGCAAACTATACCGCCCCCCCTATGGAGCCCACTCCCCCTTCACCCGGCTTCTTGCCCGGCTCCACGGCAAAGTGGTGGCGCTATGTGACCTCGAGTCCAAGGACTGGACCGACCGACCTGCCGAAGAACTGGCCG\n++WTSI_1055_1a07.p1kpIBF bases 1 to 574\n+!>>>AAA:9.4441+35:88;;CHIIIIIIDDDCCCH>Q35-+*46?>CHHHHHHHHIIYOHHHHHTTYTHA72-35>:>DAKHHHQQTTNIIFIGNYNNNNIIIIIINTTYYFFFDDHIINTIIIIIITIIIIIIDDDDDDIIOTNTLIIIKLOYYYYYYYYTTTYYNNNNIIINNNNTIIINNLNIIINYYYYYYSSMMSYYTTMMKLLLNNTTTTTTTTLLKLLYTTJNLJLTYYYLLLLKLLTLLKKKLLTTTTTYYYYYYYTTTTTTTTOOKYLOTTTTYYYYYYTTTTTTYYYYYYYYYYTNNNNNTIFFFIFIIIIIOKKKYKKTOOKYYYYYYYTOOTIIOLKIINNNNNTYYYYYYYYYYYYYYYYYMTTTTTYTTTTIIOIIIQIIMIII:99>AAAIIBBIOOYYYYOKCCDAAFFFIOD@@>>>A<<926<QIQQQQMIIIIFDFFFDDFDDDAA===BGKKKKO943>>@B;BB?:?IMYYQB..+2,448:?88888<877:<>A810))*.12889600<<9411799>83,,,84337:<7227470..---.//+,\n+@WTSI_1055_1a08.p1kpIBF bases 1 
to 397\n+TAAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACCTCCGAGAGCACTCGTGACGAATTGATTCCCCTGCTAAGCATCGAATGCGTAAAGTTAGGGCGTGCTCGTCGGCTTTATGAGAAGGGATTTCGCACCGTCGGATCGATAGCGAAAGCGGAGCCTCGCCAACTCATCGAAGCGTTAGGGGGCAAATTGAGCTGTTGCCAGTGCAGGAGGATGATCAGCAGTGCAAAGGTCCCCGGGGGTTGAGTAAATGGTGCTTAAAGGCCCCGTTCCGGAGACGATAAATTTATTCACTTTCAATTAGAGGCTTCAGAAGCTCAAAATTGTTCGAGTTTTTGTTCAGGCGATTATCCGCGATC\n++WTSI_1055_1a08.p1kpIBF bases 1 to 397\n+!.006<=AA83059:85;<::>CCECIIIIIIIIFIIINIBB1160BBKFDHHHHIIIIIIYOHHHHHTOID?:.-+,*,+.,/5.,*+06:IAA99,,,66??:,++002:0--,,170/442//.44<?33/74323/+****+28;=BBDDB<9...9<:32231644460.1.9/5055@@OB@9552B0492//../1@;99///BBFF11.9444///<BF@=666;@<@66140,,.03;;>>???M::2448HHKKMMMMMPYYOLKKKKYYYYYYYQQMHFKHMKLLKOOYYQMMKFKOOTYTDDDDDDQKKKKKKP?B<FFOIIDIOO?633:?AHII=:77:>IQQ?C?BOOO>=695BBNN1-,88553</..8888,,,425.\n+@WTSI_1055_1a15.p1kpIBF bases 1 to 312\n+GACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGATTTCGAGCTCGGTACCCGGGGATCCCACGAAAATGTAATTTTTTTTCTTTTAATTTTGTCAACTTTTTTAGCAAAAGCATTGTATTTTAACTGTATATTGCGTTTTGGAGGCAGTCACTGGATTCAAGGGAGCAGACCAAGAAAAATTTTTACAAAGTTTCTAACCCTTTCAAGGTTTTGGACCAATTTCGTAACAAATTTCGCCAAAAAATGTGCATAATTTCTTTTACCACGCCTATCGGCATCAGTAAGTCGTCCCAGTAAAGCTAATA\n++WTSI_1055_1a15.p1kpIBF bases 1 to 312\n+!:AA<4+1441+38::4..A<<BHHHCIIIICHI></4++*=:I>AHHFHDHDHIITIDDDDDOOOOMM@=30++,89QQQQOIIIIDDHHHHYTNNNNHHHIIOIIIIFFYYYYYYYNIIIIIIIIIIHHCC>81**'''(*6:IMMOQOIIIFFFIIILNNTTTTYYYYYYTNNNNIIKKKYTTTIIIMKTKTYTIDDDDDDTTNNIKKIIIIOOYFFFFFDIINNADDIIIKKKOOTLIOONHDKDDKKFFAD>AADMMMYOOOOLKKDIIIMKE966<<KB?>B70////2:B1../004.,,,..,\n+@WTSI_1055_1a17.p1kpIBF bases 1 to 201\n+AGAATGCGGAACAGCTGACGCAAATACATGTAGTCAGGCGCCTCGTCAAAGCGCGACCCGCGGCAGTAGTTCAAGTACATGNAGAACTCAAAGGGGAAGCCCTTGCACAGCATTTCCACCGGTGTCGACATTTTCTTAAAATGTTCACAAAAATGAACTTTTAATCGTAAAAAGGAGACCAATTTCGGGAACTTGTATGT\n++WTSI_1055_1a17.p1kpIBF bases 1 to 
201\n+!<CIIIIIIIITHHIHHHNNNTNNNNIIIIIIIIIIITYOIIIIIOMMQ=6+(%(((,.<<QQIFIIFFIIIIIIHEB::%%%45BB64****4IQQQQOOOOOOYYYYYYYYYYYYYYTTTNNNTMOYYYYOTNNNNTTYYTTTTTTYYYMYYYMMYYOOOKKKKKCC???<::B9=BB"..b'TTTGACCCGGGGTCACAAGTCACCATGGTGAAAAGTTCCGTGGTACAAGCTCTAAGCCCGCAGAAAATAAAGGAAGGGCTATTGGAAGTGAGCGGCTTCCACAGCGAAATACCGGTGAAGCTGGACGCCCCACAATATAAACTACAAGTTGCCTTGGCAGACGGCCGTTCGGAGAACCTTGTAGCATATCGGGCCGATTGGATAGTCAGATCCATCCGCAGAGCTGAATGGAGCACCGGGAAAGTGGAAGCAGTTGACGATGAGCCGGATTTGCTCATTGGAATGCCAGAG\n++WTSI_1055_1f17.p1kpIBF bases 1 to 436\n+!08<;=<:404::4.25:9<>>ECIIIIIIDDDDDHIINIMKKKNNNHDDDDDDDDDIIYNIDDDDTDCCCAA;97699;IIITTTTTTNNNNNIIIIIIIIIIITYYYYTTTNNNTTTYYYYYYYYYYYYTTNNNNOO@8@BEIIITTTYYYYNNNNNTNNNNNNNLLLLLLLLNTYTTTTTTTNNNNNTTTYNNNNNTYYYYTTTTTTYYYYYYYYYTNNNJNOYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTTTTTTTTTTTYTTOMKLYYYYTTTTTIINNNNTYYYYYYTNIFFIIIIYYYYYYYYYYYYYYOOOIINKKKQQYYTTTTTTYYYYYOIDDDDDFTKMKGGINKKINNYKKKKMMMKKKKKKMIHH>>==:?;BAAQ=963;<<<<::;33,4./,591,,\n+@WTSI_1055_1f23.p1kpIBF bases 1 to 383\n+AAACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACCCCCTATCCCCGCAGAGGTCCATCCAGGAGTCCCAAGAGCACATGGAGAGCACTTTCAAGGCGTTGCGTCGTCAGCTGCCGGTGACGCGCTCCAAGCTGAACTGGCTGAACTTCCATTCCTTCCGCATCACTCAGCAGGAGATGAAGCAGCCGCCCTCGGCCGGCCAGCAACAACAGTCCCAGTGATGGAGCAGTCCAAGAAGAGGAAGCGAGCGAATTTGGAGCATCGCCCATTCATTTCAATTAATACCTTTCCGATTTGTGTACTTTCCCCGACATTTTCGCCATCCAATTATGGCAAGTGAAAGTTT\n++WTSI_1055_1f23.p1kpIBF bases 1 to 383\n+!34:<<<<<;289:87;<::>AACCEDIIIFDDHHHINNTYYYKKNNNIIIIHDDDDDIIYNDDDDDTTYYYFDDAAADFKYMIFFDDDDHDDFIFFIIDDHHHITTTYINNIIIKKKOMIHHDHHIYYYYLYINNNNNOKFFFDENNNNNHGGLLLNLNNNNNYNNNLNJTTTNNNJLINNNNTYYYYYYYYYYYYYYNNNNNNYYYTTTTTYTTTTTTYYYYNLIIIIIIIIIYYYYYTTTTTTYYYYYLIIIIIIILTYYYYTTOOKLYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOOIFFIYYYYYYYYYYYYYYYYYYYYYYYYYYYYKKIIIITOYYYQQOIEEAACC>>@=;>5>AAAAAAIB94\n+@WTSI_1055_1f24.q1kpIBR bases 86 to 
670\n+TTGGCACGCAAAAGACGCAATTCTTCAGACGGATTTAAATTGGCAAGAATATCGAGCTAAATGGCAAATGTTTAAAATGGTAATCCCGGAGGAAGAAGACCACGGATTTTTTAACAAAAATGTAAATTTATTTCATGAATTTGTTGCAAAAACCAAAAGGTGCCAAAATATTGATTTACGAAAAGCGCTAACTTCTTCAGCCAAATGCCCTCTTCAAACCCACTTGATCAATCGTTGCACTCAGTGCTTTTTGATCGCCATTTTCTCCACGTCAGATTTAACCAGTCAATTTTGTCATTGGCTTCCTTTCAATGCGGTTGCTGCTTCAAAATCATCTCTTCCATTAAATTCGGGTAACGAGCCCAATGTTCTTGATGCTTCAACGAAAACTGATCAGGCGAACTGAAAGGGTGTAAAAAAGATAAAAGAAATTGTAAACGCAGCACATTGTCAAGCAAAGCAACCCAAAAAAATCGATTTTGAGTATAGTCAAAAAGGGTTACCCGTCAATGATGATCTGTTGCTGTTTGTTTGATACTCCTCCTTTCAATTTGCGATTGTTGTTGTTGCAATTGGCACGCGAA\n++WTSI_1055_1f24.q1kpIBR bases 86 to 670\n+!88BHIQQQYYYITTTTIIINNIIIIKKKYYYYIIIIFFYOMTTTYYIIIIAA99//.1<BKKOOTYYYYTTTTNNTTINNNTTYTTNNNIIITTYTTTTTTTTYYYYYIIIIIOYYYYYYYYYYYTTTTTTNNNNTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTOTLLYYYYYYYYYTTTTTTTTTTTTTTTTYYYYYYYYYYTTTTTTYYTNNNNNTYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYOKKKOOYYYYKK???KQMMMPPPPQMMKKKMPYYYKKKKKKKKKKMMYYYYYYYYYYYYYYYYYYYYYYYYYYYYYQQQQQI51)%%)4<QQQQQQYYYYTTKTTTTTTTYYYYYYYNNNNNNYYYKKKKGGNNNNYYYYYYYYYYYQMMMMQOKKGIIKKKKYQYYYYYYYYTOOLKKIIIIIOYQQQQQQBA>:;AABAACCCIIIOIIBBIIIII:77<><AAIIIOQQIE=>>>CA>AAABBIIIIIII:00882389667>BAAA?A>77:<844>A?;4++0966.+4492000--4922./..++\n+@WTSI_1055_1g02.p1kpIBF bases 1 to 523\n+AACGACGGCCAGTGAATTGTAATACGACTCACTATAGGGCGAATTCGAGCTCGGTACCCGGGGATCCCACGACAAATTCACGGAAGCGTCTCGCACTTTGTGCCGAGGACTGCTGCACAAGGAGCCCACTCTGAGGTTGGGCTGTCGCCGGGTCGGCCGGCCTGAGGACGGCGCGGAAGAGCTGAAGGCACACGCGTTCTTCACACAACCGGACCAGAAGACAGGCAGGGAGCCAATTCCGTGGAGGAAGATGGAGGCCGGCAAGGTGGACGACATTCCCTTCTGAACTGCTAGAGAGGACTTGTAGGAATTCCGTCCTTCAGCTGACACCTCCATTTTGTCCGGACCCCCATTCGGTGTATGCCAAAGATGTGCTGGACATCGAGCAGTTCAGCACTGTCAAGGGAGTTCGTCCGCTTCCACCAAACTTTTCCTACCTGCTGAACCATTAGGTTCGACTTGACGCGACTGACAACTCCTTCTACGACAAGTTCAACAGCGGGTCCGTGTCCATACCTTGGC\n++WTSI_1055_1g02.p1kpIBF bases 1 to 
523\n+!08<=AAA:28::87;<::>ACECEIIIIIIIIIIINIKBB>C>QQYNHHHHDDHDHIITIDCCCCOONNNNGDFDDINMINNNNNIHHHHHIINNIIINNNNTYTIIIIDDIIIIYYYTTTTTTYIIIDDDGGITYYSKKKIDNNNNTTNNNNNTYYYTLLLLLLLLLLLYYTYJJJJJNTTTTTTTTTTYYOLLLTTOOOTTTTTTTYNNNNNJJJLLLLLLYYYYYYYYYYSSYYONNNNNNLLTTTTTTTYYYYYYYYYYYYYYYYTMMKKKYYYYYYYYYYYYYTTTTTOOLIILLLLTTLNLLLLLLYYYYYYTTTLLLTTTTTTTYYYYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYNIIIIITYYTTTLTTNIIFFFMYYYYYYYOOLKKOOTIFIFIINTTTTYYYYYYYYYYYYYYYYYYYYYYTNNNNNNNNTYYYYYYYYYYTTTNNNNNNNNTNIIFFFKYYOOOOOIIIA<:77:<<>>>>IOOIHHHDDEIQMMII<924595/4\n' |
diff -r 324775a016ce -r 6a14074bc810 tools/fastq/fastq_paired_unpaired.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/fastq/fastq_paired_unpaired.py Mon Jul 29 09:28:55 2013 -0400
b'@@ -0,0 +1,241 @@\n+#!/usr/bin/env python\n+"""Divides a FASTQ file into paired and single (orphan) reads as separate files.\n+\n+The input file should be a valid FASTQ file which has been sorted so that\n+any partner forward+reverse reads are consecutive. The output files all\n+preserve this sort order. Pairs are recognised based on standard name\n+suffixes. See below or run the tool with no arguments for more details.\n+\n+Note that the FASTQ variant is unimportant (Sanger, Solexa, Illumina, or even\n+Color Space should all work equally well).\n+\n+This script is copyright 2010-2013 by Peter Cock, The James Hutton Institute\n+(formerly SCRI), Scotland, UK. All rights reserved.\n+\n+See accompanying text file for licence details (MIT license).\n+"""\n+import os\n+import sys\n+import re\n+from galaxy_utils.sequence.fastq import fastqReader, fastqWriter\n+\n+if "-v" in sys.argv or "--version" in sys.argv:\n+ print "Version 0.0.8"\n+ sys.exit(0)\n+\n+def stop_err(msg, err=1):\n+ sys.stderr.write(msg.rstrip() + "\\n")\n+ sys.exit(err)\n+\n+msg = """Expect either 4 or 5 arguments: a FASTQ variant, then FASTQ filenames.\n+\n+If you want two output files, use four arguments:\n+ - FASTQ variant (e.g. sanger, solexa, illumina or cssanger)\n+ - Sorted input FASTQ filename,\n+ - Output paired FASTQ filename (forward then reverse interleaved),\n+ - Output singles FASTQ filename (orphan reads)\n+\n+If you want three output files, use five arguments:\n+ - FASTQ variant (e.g. sanger, solexa, illumina or cssanger)\n+ - Sorted input FASTQ filename,\n+ - Output forward paired FASTQ filename,\n+ - Output reverse paired FASTQ filename,\n+ - Output singles FASTQ filename (orphan reads)\n+\n+The input file should be a valid FASTQ file which has been sorted so that\n+any partner forward+reverse reads are consecutive. The output files all\n+preserve this sort order.\n+\n+Any reads where the forward/reverse naming suffix used is not recognised\n+are treated as orphan reads. 
The tool supports the /1 and /2 convention\n+originally used by Illumina, the .f and .r convention, and the Sanger\n+convention (see http://staden.sourceforge.net/manual/pregap4_unix_50.html\n+for details), and the new Illumina convention where the reads have the\n+same identifier with the fragment at the start of the description, e.g.\n+\n+@HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 1:N:0:TGNCCA\n+@HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 2:N:0:TGNCCA \n+\n+Note that this does support multiple forward and reverse reads per template\n+(which is quite common with Sanger sequencing), e.g. this which is sorted\n+alphabetically:\n+\n+WTSI_1055_4p17.p1kapIBF\n+WTSI_1055_4p17.p1kpIBF\n+WTSI_1055_4p17.q1kapIBR\n+WTSI_1055_4p17.q1kpIBR\n+\n+or this where the reads already come in pairs:\n+\n+WTSI_1055_4p17.p1kapIBF\n+WTSI_1055_4p17.q1kapIBR\n+WTSI_1055_4p17.p1kpIBF\n+WTSI_1055_4p17.q1kpIBR\n+\n+both become:\n+\n+WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR\n+WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR\n+"""\n+\n+if len(sys.argv) == 5:\n+ format, input_fastq, pairs_fastq, singles_fastq = sys.argv[1:]\n+elif len(sys.argv) == 6:\n+ pairs_fastq = None\n+ format, input_fastq, pairs_f_fastq, pairs_r_fastq, singles_fastq = sys.argv[1:]\n+else:\n+ stop_err(msg)\n+\n+format = format.replace("fastq", "").lower()\n+if not format:\n+ format="sanger" #safe default\n+elif format not in ["sanger","solexa","illumina","cssanger"]:\n+ stop_err("Unrecognised format %s" % format)\n+\n+def f_match(name):\n+ if name.endswith("/1") or name.endswith(".f"):\n+ return True\n+\n+#Cope with three widely used suffix naming conventions,\n+#Illumina: /1 or /2\n+#Forward/reverse: .f or .r\n+#Sanger, e.g. 
.p1k and .q1k\n+#See http://staden.sourceforge.net/manual/pregap4_unix_50.html\n+re_f = re.compile(r"(/1|\\.f|\\.[sfp]\\d\\w*)$")\n+re_r = re.compile(r"(/2|\\.r|\\.[rq]\\d\\w*)$")\n+\n+#assert re_f.match("demo/1")\n+assert re_f.search("demo.f")\n+assert re_f.search("demo.s1")\n+assert re_f.search("demo.f1k")\n+assert re_f.search("demo.p1")\n+assert re_f.search("demo.p1k")\n+assert re_f.search("'..b'les = 0, 0, 0, 0, 0, 0\n+in_handle = open(input_fastq)\n+if pairs_fastq:\n+ pairs_f_writer = fastqWriter(open(pairs_fastq, "w"), format)\n+ pairs_r_writer = pairs_f_writer\n+else:\n+ pairs_f_writer = fastqWriter(open(pairs_f_fastq, "w"), format)\n+ pairs_r_writer = fastqWriter(open(pairs_r_fastq, "w"), format)\n+singles_writer = fastqWriter(open(singles_fastq, "w"), format)\n+last_template, buffered_reads = None, []\n+\n+for record in fastqReader(in_handle, format):\n+ count += 1\n+ name = record.identifier.split(None,1)[0]\n+ assert name[0]=="@", record.identifier #Quirk of the Galaxy parser\n+ is_forward = False\n+ suffix = re_f.search(name)\n+ if suffix:\n+ #============\n+ #Forward read\n+ #============\n+ template = name[:suffix.start()]\n+ is_forward = True\n+ elif re_illumina_f.match(record.identifier):\n+ template = name #No suffix\n+ is_forward = True\n+ if is_forward:\n+ #print name, "forward", template\n+ forward += 1\n+ if last_template == template:\n+ buffered_reads.append(record)\n+ else:\n+ #Any old buffered reads are orphans\n+ for old in buffered_reads:\n+ singles_writer.write(old)\n+ singles += 1\n+ #Save this read in buffer\n+ buffered_reads = [record]\n+ last_template = template\n+ else:\n+ is_reverse = False\n+ suffix = re_r.search(name)\n+ if suffix:\n+ #============\n+ #Reverse read\n+ #============\n+ template = name[:suffix.start()]\n+ is_reverse = True\n+ elif re_illumina_r.match(record.identifier):\n+ template = name #No suffix\n+ is_reverse = True\n+ if is_reverse:\n+ #print name, "reverse", template\n+ reverse += 1\n+ if last_template == 
template and buffered_reads:\n+ #We have a pair!\n+ #If there are multiple buffered forward reads, we want to pick\n+ #the first one (although we could try and do something more\n+ #clever looking at the suffix to match them up...)\n+ old = buffered_reads.pop(0)\n+ pairs_f_writer.write(old)\n+ pairs_r_writer.write(record)\n+ pairs += 2\n+ else:\n+ #As this is a reverse read, this and any buffered read(s) are\n+ #all orphans\n+ for old in buffered_reads:\n+ singles_writer.write(old)\n+ singles += 1\n+ buffered_reads = []\n+ singles_writer.write(record)\n+ singles += 1\n+ last_template = None\n+ else:\n+ #===========================\n+ #Neither forward nor reverse\n+ #===========================\n+ singles_writer.write(record)\n+ singles += 1\n+ neither += 1\n+ for old in buffered_reads:\n+ singles_writer.write(old)\n+ singles += 1\n+ buffered_reads = []\n+ last_template = None\n+if last_template:\n+ #Left over singles...\n+ for old in buffered_reads:\n+ singles_writer.write(old)\n+ singles += 1\n+in_handle.close()\n+singles_writer.close()\n+if pairs_fastq:\n+ pairs_f_writer.close()\n+ assert pairs_r_writer.file.closed\n+else:\n+ pairs_f_writer.close()\n+ pairs_r_writer.close()\n+\n+if neither:\n+ print "%i reads (%i forward, %i reverse, %i neither), %i in pairs, %i as singles" \\\n+ % (count, forward, reverse, neither, pairs, singles)\n+else:\n+ print "%i reads (%i forward, %i reverse), %i in pairs, %i as singles" \\\n+ % (count, forward, reverse, pairs, singles)\n+\n+assert count == pairs + singles == forward + reverse + neither, \\\n+ "%i vs %i+%i=%i vs %i+%i+%i=%i" \\\n+ % (count,pairs,singles,pairs+singles,forward,reverse,neither,forward+reverse+neither)\n'
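The pairing logic in the script depends only on the read names and their order in the sorted input. A minimal Python 3 sketch of that logic follows; it works on a plain list of names rather than FASTQ records, the helper names `classify` and `split_pairs` are hypothetical, and it leaves out the Illumina 1.8 description-based check because that regular expression is elided from this listing. The two suffix patterns are the `re_f` and `re_r` expressions from the script, written without the display escaping.

```python
import re

# Suffix patterns as used (unescaped) in fastq_paired_unpaired.py:
# /1 and /2, .f and .r, and Sanger-style names such as .p1k / .q1k.
RE_F = re.compile(r"(/1|\.f|\.[sfp]\d\w*)$")
RE_R = re.compile(r"(/2|\.r|\.[rq]\d\w*)$")


def classify(name):
    """Return ("F", template), ("R", template) or (None, name)."""
    match = RE_F.search(name)
    if match:
        return "F", name[:match.start()]
    match = RE_R.search(name)
    if match:
        return "R", name[:match.start()]
    return None, name


def split_pairs(names):
    """Split names (sorted so partners are consecutive) into pairs and singles."""
    pairs, singles = [], []
    last_template, buffered = None, []
    for name in names:
        direction, template = classify(name)
        if direction == "F":
            if template != last_template:
                # New template: forward reads still buffered are orphans.
                singles.extend(buffered)
                buffered, last_template = [], template
            buffered.append(name)
        elif direction == "R":
            if template == last_template and buffered:
                # Pair with the *first* buffered forward read, as the script does.
                pairs.append((buffered.pop(0), name))
            else:
                # No partner available: buffered reads and this read are orphans.
                singles.extend(buffered)
                singles.append(name)
                buffered, last_template = [], None
        else:
            # Unrecognised suffix: this read is an orphan, then flush the buffer.
            singles.append(name)
            singles.extend(buffered)
            buffered, last_template = [], None
    singles.extend(buffered)  # left-over forward reads at the end of the input
    return pairs, singles
```

Both orderings of the WTSI_1055_4p17 reads given in the help text (alphabetical, and already in pairs) should give the same two pairs and no singles, since each reverse read takes the oldest buffered forward read for its template.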
diff -r 324775a016ce -r 6a14074bc810 tools/fastq/fastq_paired_unpaired.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/fastq/fastq_paired_unpaired.rst Mon Jul 29 09:28:55 2013 -0400
@@ -0,0 +1,109 @@ +Galaxy tool to divide FASTQ files into paired and unpaired reads +================================================================ + +This tool is copyright 2010-2013 by Peter Cock, The James Hutton Institute +(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved. +See the licence text below. + +This tool is a short Python script which divides a FASTQ file into paired +reads, and single or orphan reads. You can have separate files for the +forward/reverse reads, or have them interleaved in a single file. + +Note that the FASTQ variant is unimportant (Sanger, Solexa, Illumina, or even +Color Space should all work equally well). + +This tool is available from the Galaxy Tool Shed at: +http://toolshed.g2.bx.psu.edu/view/peterjc/fastq_paired_unpaired + + +Automated Installation +====================== + +This should be straightforward: Galaxy should automatically download and install +the tool from the Galaxy Tool Shed, and run the unit tests. + + +Manual Installation +=================== + +There are just two files to install: + +* fastq_paired_unpaired.py (the Python script) +* fastq_paired_unpaired.xml (the Galaxy tool definition) + +The suggested location is in the Galaxy folder tools/fastq next to other FASTQ +tools provided with Galaxy. + +You will also need to modify the tool_conf.xml file to tell Galaxy to offer +the tool. One suggested location is next to the fastq_filter.xml entry. Simply +add the line:: + + <tool file="fastq/fastq_paired_unpaired.xml" /> + +That's it.
+ + +History +======= + +======= ====================================================================== +Version Changes +------- ---------------------------------------------------------------------- +v0.0.1 - Initial version, using Biopython +v0.0.2 - Help text; cope with multiple pairs per template +v0.0.3 - Galaxy XML wrappers added +v0.0.4 - Use Galaxy library to handle FASTQ files (avoid Biopython dependency) +v0.0.5 - Handle Illumina 1.8 style pair names +v0.0.6 - Record script version when run from Galaxy + - Added unit test (FASTQ file using Sanger naming) +v0.0.7 - Link to Tool Shed added to help text and this documentation. +v0.0.8 - Use reStructuredText for this README file. + - Adopt standard MIT License. +======= ====================================================================== + + +Developers +========== + +This script and other tools for filtering FASTA, FASTQ and SFF files are +currently being developed on the following hg branch: +http://bitbucket.org/peterjc/galaxy-central/src/fasta_filter + +For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use +the following command from the Galaxy root folder:: + + $ tar -czf fastq_paired_unpaired.tar.gz tools/fastq/fastq_paired_unpaired.* test-data/sanger-pairs-*.fastq + +Check this worked:: + + $ tar -tzf fastq_paired_unpaired.tar.gz + tools/fastq/fastq_paired_unpaired.py + tools/fastq/fastq_paired_unpaired.rst + tools/fastq/fastq_paired_unpaired.xml + test-data/sanger-pairs-forward.fastq + test-data/sanger-pairs-interleaved.fastq + test-data/sanger-pairs-mixed.fastq + test-data/sanger-pairs-reverse.fastq + test-data/sanger-pairs-singles.fastq + + +Licence (MIT) +============= + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or 
sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE.
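The README's manual installation step adds one `<tool>` line to Galaxy's tool configuration file, next to the `fastq_filter.xml` entry. A hedged Python 3 sketch of making that edit programmatically with the standard `xml.etree.ElementTree` module follows. The anchor path `fastq/fastq_filter.xml`, the `<toolbox>`/`<section>` layout of the sample, and the helper name `add_tool_entry` are assumptions for illustration. ElementTree drops XML comments when it re-serialises a document and does not indent the new element, so editing the file by hand, as the README suggests, is usually simpler.

```python
import xml.etree.ElementTree as ET

NEW_TOOL = "fastq/fastq_paired_unpaired.xml"
ANCHOR_TOOL = "fastq/fastq_filter.xml"  # assumed path of the existing entry


def add_tool_entry(conf_xml, new_file=NEW_TOOL, anchor_file=ANCHOR_TOOL):
    """Return conf_xml with <tool file=new_file /> placed after anchor_file.

    The input string is returned unchanged if the entry is already present.
    """
    root = ET.fromstring(conf_xml)
    if any(tool.get("file") == new_file for tool in root.iter("tool")):
        return conf_xml
    for section in root.iter("section"):
        for index, child in enumerate(list(section)):
            if child.tag == "tool" and child.get("file") == anchor_file:
                # Insert directly after the anchor entry, then stop.
                section.insert(index + 1, ET.Element("tool", file=new_file))
                return ET.tostring(root, encoding="unicode")
    raise ValueError("No <tool file=%r> entry found" % anchor_file)
```

Returning the input untouched when the entry already exists makes it safe to run the helper more than once.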
diff -r 324775a016ce -r 6a14074bc810 tools/fastq/fastq_paired_unpaired.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/fastq/fastq_paired_unpaired.xml Mon Jul 29 09:28:55 2013 -0400
@@ -0,0 +1,105 @@ +<tool id="fastq_paired_unpaired" name="Divide FASTQ file into paired and unpaired reads" version="0.0.7"> + <description>using the read name suffixes</description> + <version_command interpreter="python">fastq_paired_unpaired.py --version</version_command> + <command interpreter="python"> +fastq_paired_unpaired.py $input_fastq.extension $input_fastq +#if $output_choice_cond.output_choice=="separate" + $output_forward $output_reverse +#elif $output_choice_cond.output_choice=="interleaved" + $output_paired +#end if +$output_singles + </command> + <stdio> + <!-- Anything other than zero is an error --> + <exit_code range="1:" /> + <exit_code range=":-1" /> + </stdio> + <inputs> + <param name="input_fastq" type="data" format="fastq" label="FASTQ file to divide into paired and unpaired reads"/> + <conditional name="output_choice_cond"> + <param name="output_choice" type="select" label="How to output paired reads?"> + <option value="separate">Separate (two FASTQ files, for the forward and reverse reads, in matching order).</option> + <option value="interleaved">Interleaved (one FASTQ file, alternating forward read then partner reverse read).</option> + </param> + <!-- It seems these dummy entries are needed here, compare this to indels/indel_sam2interval.xml --> + <when value="separate" /> + <when value="interleaved" /> + </conditional> + </inputs> + <outputs> + <data name="output_singles" format="input" label="Orphan or single reads"/> + <data name="output_forward" format="input" label="Forward paired reads"> + <filter>output_choice_cond["output_choice"] == "separate"</filter> + </data> + <data name="output_reverse" format="input" label="Reverse paired reads"> + <filter>output_choice_cond["output_choice"] == "separate"</filter> + </data> + <data name="output_paired" format="input" label="Interleaved paired reads"> + <filter>output_choice_cond["output_choice"] == "interleaved"</filter> + </data> + </outputs> + <tests> + <test> + <param name="input_fastq" 
value="sanger-pairs-mixed.fastq" ftype="fastq"/> + <param name="output_choice" value="separate"/> + <output name="output_singles" file="sanger-pairs-singles.fastq" ftype="fastq"/> + <output name="output_forward" file="sanger-pairs-forward.fastq" ftype="fastq"/> + <output name="output_reverse" file="sanger-pairs-reverse.fastq" ftype="fastq"/> + </test> + <test> + <param name="input_fastq" value="sanger-pairs-mixed.fastq" ftype="fastq"/> + <param name="output_choice" value="interleaved"/> + <output name="output_singles" file="sanger-pairs-singles.fastq" ftype="fastq"/> + <output name="output_paired" file="sanger-pairs-interleaved.fastq" ftype="fastq"/> + </test> + </tests> + <help> + +**What it does** + +Using the common read name suffix conventions, it divides a FASTQ file into +paired reads, and orphan or single reads. + +The input file should be a valid FASTQ file which has been sorted so that +any partner forward+reverse reads are consecutive. The output files all +preserve this sort order. Pairs are recognised based on standard name +suffixes. See below or run the tool with no arguments for more details. + +Any reads where the forward/reverse naming suffix used is not recognised +are treated as orphan reads. The tool supports the /1 and /2 convention +originally used by Illumina, the .f and .r convention, the Sanger convention +(see http://staden.sourceforge.net/manual/pregap4_unix_50.html for details), +and the current Illumina convention where the reads get the same identifier +with the fragment number in the description, for example: + + * @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 1:N:0:TGNCCA + * @HWI-ST916:79:D04M5ACXX:1:1101:10000:100326 2:N:0:TGNCCA + +Note that this does support multiple forward and reverse reads per template +(which is quite common with Sanger sequencing), e.g. 
this which is sorted +alphabetically: + + * WTSI_1055_4p17.p1kapIBF + * WTSI_1055_4p17.p1kpIBF + * WTSI_1055_4p17.q1kapIBR + * WTSI_1055_4p17.q1kpIBR + +or this where the reads already come in pairs: + + * WTSI_1055_4p17.p1kapIBF + * WTSI_1055_4p17.q1kapIBR + * WTSI_1055_4p17.p1kpIBF + * WTSI_1055_4p17.q1kpIBR + +both become: + + * WTSI_1055_4p17.p1kapIBF paired with WTSI_1055_4p17.q1kapIBR + * WTSI_1055_4p17.p1kpIBF paired with WTSI_1055_4p17.q1kpIBR + +**Citation** + +This tool is available to install into other Galaxy Instances via the Galaxy +Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/fastq_paired_unpaired + </help> +</tool>
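The `<command>` template in this wrapper passes the input's Galaxy datatype, the input file, and then either two paired outputs (separate) or one (interleaved), followed by the singles file. A minimal Python 3 sketch of the resulting argument list, and of how the script reduces the datatype to a FASTQ variant, follows. The helpers `build_argv` and `normalise_variant` are hypothetical restatements of the Cheetah template and the script's argument handling, not part of the tool.

```python
# Variant names accepted by fastq_paired_unpaired.py after normalisation.
VARIANTS = ("sanger", "solexa", "illumina", "cssanger")


def build_argv(extension, input_path, output_choice, outputs):
    """Return the argument list the <command> template would produce."""
    argv = ["fastq_paired_unpaired.py", extension, input_path]
    if output_choice == "separate":
        argv += [outputs["forward"], outputs["reverse"]]
    elif output_choice == "interleaved":
        argv.append(outputs["paired"])
    else:
        raise ValueError("Unknown output choice %r" % output_choice)
    argv.append(outputs["singles"])
    return argv


def normalise_variant(extension):
    """Map a Galaxy datatype such as 'fastqsanger' to a FASTQ variant name."""
    variant = extension.replace("fastq", "").lower()
    if not variant:
        return "sanger"  # the script's "safe default" for plain "fastq"
    if variant not in VARIANTS:
        raise ValueError("Unrecognised format %s" % variant)
    return variant
```

With "interleaved" the list has five entries (four arguments after the script name) and with "separate" it has six, which matches the script's `len(sys.argv) == 5` and `len(sys.argv) == 6` branches.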
diff -r 324775a016ce -r 6a14074bc810 tools/filters/get_orfs_or_cdss.py
--- a/tools/filters/get_orfs_or_cdss.py Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,223 +0,0 @@ -#!/usr/bin/env python -"""Find ORFs in a nucleotide sequence file. - -get_orfs_or_cdss.py $input_fasta $input_format $table $ftype $ends $mode $min_len $strand $out_nuc_file $out_prot_file - -Takes ten command line options, input sequence filename, format, genetic -code, CDS vs ORF, end type (open, closed), selection mode (all, top, one), -minimum length (in amino acids), strand (both, forward, reverse), output -nucleotide filename, and output protein filename. - -This tool is a short Python script which requires Biopython. If you use -this tool in scientific work leading to a publication, please cite the -Biopython application note: - -Cock et al 2009. Biopython: freely available Python tools for computational -molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. -http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878. - -This script is copyright 2011-2013 by Peter Cock, The James Hutton Institute -(formerly SCRI), Dundee, UK. All rights reserved. - -See accompanying text file for licence details (MIT/BSD style). - -This is version 0.0.3 of the script. 
-"""
-import sys
-import re
-
-if "-v" in sys.argv or "--version" in sys.argv:
-    print "v0.0.3"
-    sys.exit(0)
-
-def stop_err(msg, err=1):
-    sys.stderr.write(msg.rstrip() + "\n")
-    sys.exit(err)
-
-try:
-    from Bio.Seq import Seq, reverse_complement, translate
-    from Bio.SeqRecord import SeqRecord
-    from Bio import SeqIO
-    from Bio.Data import CodonTable
-except ImportError:
-    stop_err("Missing Biopython library")
-
-#Parse Command Line
-try:
-    input_file, seq_format, table, ftype, ends, mode, min_len, strand, out_nuc_file, out_prot_file = sys.argv[1:]
-except ValueError:
-    stop_err("Expected ten arguments, got %i:\n%s" % (len(sys.argv)-1, " ".join(sys.argv)))
-
-try:
-    table = int(table)
-except ValueError:
-    stop_err("Expected integer for genetic code table, got %s" % table)
-
-try:
-    table_obj = CodonTable.ambiguous_generic_by_id[table]
-except KeyError:
-    stop_err("Unknown codon table %i" % table)
-
-if ftype not in ["CDS", "ORF"]:
-    stop_err("Expected CDS or ORF, got %s" % ftype)
-
-if ends not in ["open", "closed"]:
-    stop_err("Expected open or closed for end treatment, got %s" % ends)
-
-try:
-    min_len = int(min_len)
-except ValueError:
-    stop_err("Expected integer for min_len, got %s" % min_len)
-
-if seq_format.lower()=="sff":
-    seq_format = "sff-trim"
-elif seq_format.lower()=="fasta":
-    seq_format = "fasta"
-elif seq_format.lower().startswith("fastq"):
-    seq_format = "fastq"
-else:
-    stop_err("Unsupported file type %r" % seq_format)
-
-print "Genetic code table %i" % table
-print "Minimum length %i aa" % min_len
-#print "Taking %s ORF(s) from %s strand(s)" % (mode, strand)
-
-starts = sorted(table_obj.start_codons)
-assert "NNN" not in starts
-re_starts = re.compile("|".join(starts))
-
-stops = sorted(table_obj.stop_codons)
-assert "NNN" not in stops
-re_stops = re.compile("|".join(stops))
-
-def start_chop_and_trans(s, strict=True):
-    """Returns offset, trimmed nuc, protein."""
-    if strict:
-        assert s[-3:] in stops, s
-    assert len(s) % 3 == 0
-    for match in re_starts.finditer(s):
-        #Must check the start is in frame
-        start = match.start()
-        if start % 3 == 0:
-            n = s[start:]
-            assert len(n) % 3 == 0, "%s is len %i" % (n, len(n))
-            if strict:
-                t = translate(n, table, cds=True)
-            else:
-                #Use when missing stop codon,
-                t = "M" + translate(n[3:], table, to_stop=True)
-            return start, n, t
-    return None, None, None
-
-def break_up_frame(s):
-    """Returns offset, nuc, protein."""
-    start = 0
-    for match in re_stops.finditer(s):
-        index = match.start() + 3
-        if index % 3 != 0:
-            continue
-        n = s[start:index]
-        if ftype=="CDS":
-            offset, n, t = start_chop_and_trans(n)
-        else:
-            offset = 0
-            t = translate(n, table, to_stop=True)
-        if n and len(t) >= min_len:
-            yield start + offset, n, t
-        start = index
-    if ends == "open":
-        #No stop codon, Biopython's strict CDS translate will fail
-        n = s[start:]
-        #Ensure we have whole codons
-        #TODO - Try appending N instead?
-        #TODO - Do the next four lines more elegantly
-        if len(n) % 3:
-            n = n[:-1]
-        if len(n) % 3:
-            n = n[:-1]
-        if ftype=="CDS":
-            offset, n, t = start_chop_and_trans(n, strict=False)
-        else:
-            offset = 0
-            t = translate(n, table, to_stop=True)
-        if n and len(t) >= min_len:
-            yield start + offset, n, t
-
-
-def get_all_peptides(nuc_seq):
-    """Returns start, end, strand, nucleotides, protein.
-
-    Co-ordinates are Python style zero-based.
-    """
-    #TODO - Refactor to use a generator function (in start order)
-    #rather than making a list and sorting?
-    answer = []
-    full_len = len(nuc_seq)
-    if strand != "reverse":
-        for frame in range(0,3):
-            for offset, n, t in break_up_frame(nuc_seq[frame:]):
-                start = frame + offset #zero based
-                answer.append((start, start + len(n), +1, n, t))
-    if strand != "forward":
-        rc = reverse_complement(nuc_seq)
-        for frame in range(0,3) :
-            for offset, n, t in break_up_frame(rc[frame:]):
-                start = full_len - frame - offset #zero based
-                answer.append((start - len(n), start, -1, n ,t))
-    answer.sort()
-    return answer
-
-def get_top_peptides(nuc_seq):
-    """Returns all peptides of max length."""
-    values = list(get_all_peptides(nuc_seq))
-    if not values:
-        raise StopIteration
-    max_len = max(len(x[-1]) for x in values)
-    for x in values:
-        if len(x[-1]) == max_len:
-            yield x
-
-def get_one_peptide(nuc_seq):
-    """Returns first (left most) peptide with max length."""
-    values = list(get_top_peptides(nuc_seq))
-    if not values:
-        raise StopIteration
-    yield values[0]
-
-if mode == "all":
-    get_peptides = get_all_peptides
-elif mode == "top":
-    get_peptides = get_top_peptides
-elif mode == "one":
-    get_peptides = get_one_peptide
-
-in_count = 0
-out_count = 0
-if out_nuc_file == "-":
-    out_nuc = sys.stdout
-else:
-    out_nuc = open(out_nuc_file, "w")
-if out_prot_file == "-":
-    out_prot = sys.stdout
-else:
-    out_prot = open(out_prot_file, "w")
-for record in SeqIO.parse(input_file, seq_format):
-    for i, (f_start, f_end, f_strand, n, t) in enumerate(get_peptides(str(record.seq).upper())):
-        out_count += 1
-        if f_strand == +1:
-            loc = "%i..%i" % (f_start+1, f_end)
-        else:
-            loc = "complement(%i..%i)" % (f_start+1, f_end)
-        descr = "length %i aa, %i bp, from %s of %s" \
-                % (len(t), len(n), loc, record.description)
-        r = SeqRecord(Seq(n), id = record.id + "|%s%i" % (ftype, i+1), name = "", description= descr)
-        t = SeqRecord(Seq(t), id = record.id + "|%s%i" % (ftype, i+1), name = "", description= descr)
-        SeqIO.write(r, out_nuc, "fasta")
-        SeqIO.write(t, out_prot, "fasta")
-    in_count += 1
-if out_nuc is not sys.stdout:
-    out_nuc.close()
-if out_prot is not sys.stdout:
-    out_prot.close()
-
-print "Found %i %ss in %i sequences" % (out_count, ftype, in_count)
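The removed script's `break_up_frame` and `start_chop_and_trans` split each reading frame at in-frame stop codons. In CDS mode, they then trim each fragment back to its first in-frame start codon. The following is a minimal Python 3 sketch of that idea without Biopython. It assumes the standard code's stop codons (TAA, TAG, TGA) and, as a simplification, treats ATG as the only start codon. The helper names `split_frame`, `first_start` and `find_cds` are hypothetical. Unlike the script, the sketch walks codon by codon rather than using regular expressions, searches only the forward strand, and skips translation and the minimum-length filter. Its `open_ends` flag plays the role of the script's `ends` option.

```python
# Assumption: NCBI table 1 stop codons; ATG-only starts is a simplification
# (table 1 also lists TTG and CTG as alternative start codons).
STOPS = frozenset({"TAA", "TAG", "TGA"})
STARTS = frozenset({"ATG"})


def split_frame(seq, stops=STOPS, open_ends=True):
    """Yield (offset, fragment) for each in-frame stop-to-stop fragment.

    Each fragment ends with its stop codon. If open_ends is True, the
    trailing fragment lacking a stop codon is also yielded, trimmed to
    whole codons.
    """
    start = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            yield start, seq[start:i + 3]
            start = i + 3
    if open_ends:
        whole = (len(seq) - start) // 3 * 3
        if whole:
            yield start, seq[start:start + whole]


def first_start(fragment, starts=STARTS):
    """Return the offset of the first in-frame start codon, or None."""
    for i in range(0, len(fragment) - 2, 3):
        if fragment[i:i + 3] in starts:
            return i
    return None


def find_cds(seq, open_ends=True):
    """Yield (start, nucleotides) for forward-strand CDSs in frames 0, 1, 2.

    start is a zero-based offset into seq. Each CDS runs from the first
    in-frame start codon to the end of its stop-to-stop fragment.
    """
    for frame in range(3):
        for offset, fragment in split_frame(seq[frame:], open_ends=open_ends):
            s = first_start(fragment)
            if s is not None:
                yield frame + offset + s, fragment[s:]


if __name__ == "__main__":
    print(list(find_cds("ATGAAATAGCCATGCCC")))
    # [(0, 'ATGAAATAG'), (11, 'ATGCCC')]
```

The second hit has no stop codon, so it is dropped when `open_ends=False`. For the reverse strand, the script runs the same search on the reverse complement and maps each offset back with `full_len - frame - offset`.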
diff -r 324775a016ce -r 6a14074bc810 tools/filters/get_orfs_or_cdss.txt
--- a/tools/filters/get_orfs_or_cdss.txt	Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,93 +0,0 @@
-Galaxy tool to find ORFs or simple CDSs
-=======================================
-
-This tool is copyright 2011-2013 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below.
-
-This tool is a short Python script (using Biopython library functions)
-to search nucleotide sequences for open reading frames (ORFs) or coding
-sequences (CDSs) where the first potential start codon is used. See the
-help text in the XML file for more information.
-
-There are just two files to install:
-
-* get_orfs_or_cdss.py (the Python script)
-* get_orfs_or_cdss.xml (the Galaxy tool definition)
-
-If you are installing this manually (rather than via the Tool Shed), the
-suggested location is in the Galaxy folder tools/filters next to the tool
-for calling sff_extract.py for converting SFF to FASTQ or FASTA + QUAL.
-You will also need to modify the tools_conf.xml file to tell Galaxy to offer the
-tool. One suggested location is in the filters section. Simply add the line:
-
-<tool file="filters/get_orfs_or_cdss.xml" />
-
-You will also need to install Biopython 1.54 or later. If you want to run
-the unit tests, include this line in tools_conf.xml.sample and the sample
-FASTA files under the test-data directory. Then:
-
-./run_functional_tests.sh -id get_orfs_or_cdss
-
-That's it.
-
-
-History
-=======
-
-v0.0.1 - Initial version.
-v0.0.2 - Correct labelling issue on reverse strand.
-       - Use the new <stdio> settings in the XML wrappers to catch errors
-v0.0.3 - Include unit tests.
-       - Record Python script version when run from Galaxy.
-
-
-Developers
-==========
-
-This script and related tools are being developed on the following hg branch:
-http://bitbucket.org/peterjc/galaxy-central/src/tools
-
-For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
-the following command from the Galaxy root folder:
-
-$ tar -czf get_orfs_or_cdss.tar.gz tools/filters/get_orfs_or_cdss.* test-data/get_orf_input*.fasta test-data/Ssuis.fasta
-
-Check this worked:
-
-$ tar -tzf get_orfs_or_cdss.tar.gz
-filter/get_orfs_or_cdss.py
-filter/get_orfs_or_cdss.txt
-filter/get_orfs_or_cdss.xml
-test-data/get_orf_input.fasta
-test-data/get_orf_input.Suis_ORF.nuc.fasta
-test-data/get_orf_input.Suis_ORF.prot.fasta
-test-data/get_orf_input.t11_nuc_out.fasta
-test-data/get_orf_input.t11_open_nuc_out.fasta
-test-data/get_orf_input.t11_open_prot_out.fasta
-test-data/get_orf_input.t11_prot_out.fasta
-test-data/get_orf_input.t1_nuc_out.fasta
-test-data/get_orf_input.t1_prot_out.fasta
-test-data/Ssuis.fasta
-
-
-Licence (MIT/BSD style)
-=======================
-
-Permission to use, copy, modify, and distribute this software and its
-documentation with or without modifications and for any purpose and
-without fee is hereby granted, provided that any copyright notices
-appear in all copies and that both those copyright notices and this
-permission notice appear in supporting documentation, and that the
-names of the contributors or copyright holders not be used in
-advertising or publicity pertaining to distribution of the software
-without specific prior permission.
-
-THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
-WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
-CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
-OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
-OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
-OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
-OR PERFORMANCE OF THIS SOFTWARE.
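The removed README builds the Tool Shed tarball with `tar -czf` from the Galaxy root and then checks the result with `tar -tzf`. The same build-and-check step can be sketched with Python's standard `tarfile` and `glob` modules. This is a hypothetical `make_tarball` helper, not part of the tool. It assumes the patterns are relative to `root` and returns the stored member names so they can be compared with what you expect.

```python
import glob
import os
import tarfile


def make_tarball(tar_path, patterns, root="."):
    """Bundle files matching shell-style patterns (relative to root) into a .tar.gz.

    Member names are stored relative to root, as when running tar from the
    Galaxy root folder. Returns the sorted member names, similar to tar -tzf.
    """
    # Collect each matching path once, even if several patterns match it.
    paths = sorted({path
                    for pattern in patterns
                    for path in glob.glob(os.path.join(root, pattern))})
    with tarfile.open(tar_path, "w:gz") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.relpath(path, root))
    # Re-open the archive to list what was actually written.
    with tarfile.open(tar_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Run from the Galaxy root with the README's three patterns, the script and XML would be stored under `tools/filters/`, not under the `filter/` names shown in the README's listing.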
diff -r 324775a016ce -r 6a14074bc810 tools/filters/get_orfs_or_cdss.xml
--- a/tools/filters/get_orfs_or_cdss.xml	Tue Apr 23 11:48:43 2013 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,164 +0,0 @@\n-<tool id="get_orfs_or_cdss" name="Get open reading frames (ORFs) or coding sequences (CDSs)" version="0.0.3">\n-\t<description>e.g. to get peptides from ESTs</description>\n-\t<version_command interpreter="python">get_orfs_or_cdss.py --version</version_command>\n-\t<command interpreter="python">\n-get_orfs_or_cdss.py $input_file $input_file.ext $table $ftype $ends $mode $min_len $strand $out_nuc_file $out_prot_file\n-\t</command>\n-\t<stdio>\n-\t\t<!-- Anything other than zero is an error -->\n-\t\t<exit_code range="1:" />\n-\t\t<exit_code range=":-1" />\n-\t</stdio>\n-\t<inputs>\n-\t\t<param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file (nucleotides)" help="FASTA, FASTQ, or SFF format." />\n-\t\t<param name="table" type="select" label="Genetic code" help="Tables from the NCBI, these determine the start and stop codons">\n-\t\t\t<option value="1">1. Standard</option>\n-\t\t\t<option value="2">2. Vertebrate Mitochondrial</option>\n-\t\t\t<option value="3">3. Yeast Mitochondrial</option>\n-\t\t\t<option value="4">4. Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma</option>\n-\t\t\t<option value="5">5. Invertebrate Mitochondrial</option>\n-\t\t\t<option value="6">6. Ciliate Macronuclear and Dasycladacean</option>\n-\t\t\t<option value="9">9. Echinoderm Mitochondrial</option>\n-\t\t\t<option value="10">10. Euplotid Nuclear</option>\n-\t\t\t<option value="11">11. Bacterial</option>\n-\t\t\t<option value="12">12. Alternative Yeast Nuclear</option>\n-\t\t\t<option value="13">13. Ascidian Mitochondrial</option>\n-\t\t\t<option value="14">14. Flatworm Mitochondrial</option>\n-\t\t\t<option value="15">15. Blepharisma Macronuclear</option>\n-\t\t\t<option value="16">16. Chlorophycean Mitochondrial</option>\n-\t\t\t<option value="21">21. Trematode Mitochondrial</option>\n-\t\t\t<option value="22">22. Scenedesmus obliquus</option>\n-\t\t\t<option value="23">23. 
Thraustochytrium Mitochondrial</option>\n-\t\t</param>\n-\t\t<param name="ftype" type="select" value="True" label="Look for ORFs or CDSs">\n- <option value="ORF">Look for ORFs (check for stop codons only, ignore start codons)</option>\n- <option value="CDS">Look for CDSs (with start and stop codons)</option>\n-\t\t</param>\n- <param name="ends" type="select" value="open" label="Sequence end treatment">\n-\t\t\t<option value="open">Open ended (will allow missing start/stop codons at the ends)</option>\n- <option value="closed">Complete (will check for start/stop codons at the ends)</option>\n- <!-- TODO? Circular, for using this on finished bacteria etc -->\n- </param>\n-\n-\t\t<param name="mode" type="select" label="Selection criteria" help="Suppose a sequence has ORFs/CDSs of lengths 100, 102 and 102 -- which should be taken? These options would return 3, 2 or 1 ORF.">\n- <option value="all">All ORFs/CDSs from each sequence</option>\n- <option value="top">All ORFs/CDSs from each sequence with the maximum length</option>\n- <option value="one">First ORF/CDS from each sequence with the maximum length</option>\n-\t\t</param>\n- <param name="min_len" type="integer" size="5" value="30" label="Minimum length ORF/CDS (in amino acids, e.g. 30 aa = 90 bp plus any stop codon)">\n- </param>\n- <param name="strand" type="select" label="Strand to search" help="Use the forward only option if your sequence directionality is known (e.g. 
from poly-A tails, or strand specific RNA sequencing.">\n- <option value="both">Search both the forward and reverse strand</option>\n- <option value="forward">Only search the forward strand</option>\n- <option value="reverse">Only search the reverse strand</option>\n- </param>\n-\t</inputs>\n-\t<outputs>\n-\t\t<data name="out_nuc_file" format="fasta" label="${ftype.value}s (nucleotides)" />\n-\t\t<data name="out_prot_file" format="fasta" label="'..b'me="strand" value="forward" />\n-\t\t\t<output name="out_nuc_file" file="get_orf_input.t11_nuc_out.fasta" />\n-\t\t\t<output\tname="out_prot_file" file="get_orf_input.t11_prot_out.fasta" />\n-\t\t</test>\n-\t\t<test>\n- <param name="input_file" value="get_orf_input.fasta" />\n- <param name="table" value="11" />\n- <param name="ftype" value="CDS" />\n- <param name="ends" value="open" />\n- <param name="mode" value="all" />\n- <param name="min_len" value="10" />\n- <param name="strand" value="forward" />\n- <output name="out_nuc_file" file="get_orf_input.t11_open_nuc_out.fasta" />\n- <output name="out_prot_file" file="get_orf_input.t11_open_prot_out.fasta" />\n-\t\t</test>\n- <test>\n-\t\t\t<param name="input_file" value="Ssuis.fasta" />\n-\t\t\t<param name="table" value="11" />\n-\t\t\t<param name="ftype" value="ORF" />\n-\t\t\t<param name="ends" value="open" />\n-\t\t\t<param name="mode" value="all" />\n-\t\t\t<param name="min_len" value="100" />\n-\t\t\t<param name="strand" value="both" />\n-\t\t\t<output name="out_nuc_file" file="get_orf_input.Suis_ORF.nuc.fasta" />\n-\t\t\t<output name="out_prot_file" file="get_orf_input.Suis_ORF.prot.fasta" />\n-\t\t</test>\n-\t</tests>\n-\t<requirements>\n-\t\t<requirement type="python-module">Bio</requirement>\n-\t</requirements>\n-\t<help>\n-\n-**What it does**\n-\n-Takes an input file of nucleotide sequences (typically FASTA, but also FASTQ\n-and Standard Flowgram Format (SFF) are supported), and searches each sequence\n-for open reading frames (ORFs) or potential coding 
sequences (CDSs) of the\n-given minimum length. These are returned as FASTA files of nucleotides and\n-protein sequences.\n-\n-You can choose to have all the ORFs/CDSs above the minimum length for each\n-sequence (similar to the EMBOSS getorf tool), those with the longest length\n-equal, or the first ORF/CDS with the longest length (in the special case\n-where a sequence encodes two or more long ORFs/CDSs of the same length). The\n-last option is a reasonable choice when the input sequences represent EST or\n-mRNA sequences, where only one ORF/CDS is expected.\n-\n-Note that if no ORFs/CDSs in a sequence match the criteria, there will be no\n-output for that sequence.\n-\n-Also note that the ORFs/CDSs are assigned modified identifiers to distinguish\n-them from the original full length sequences, by appending a suffix.\n-\n-The start and stop codons are taken from the `NCBI Genetic Codes\n-<http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi>`_.\n-When searching for ORFs, the sequences will run from stop codon to stop\n-codon, and any start codons are ignored. When searching for CDSs, the first\n-potential start codon will be used, giving the longest possible CDS within\n-each ORF, and thus the longest possible protein sequence. This is useful\n-for things like BLAST or domain searching, but since this may not be the\n-correct start codon may not be appropriate for signal peptide detection\n-etc.\n-\n-**Example Usage**\n-\n-Given some EST sequences (Sanger capillary reads) assembled into unigenes,\n-or a transcriptome assembly from some RNA-Seq, each of your nucleotide\n-sequences should (barring sequencing, assembly errors, frame-shifts etc)\n-encode one protein as a single ORF/CDS, which you wish to extract (and\n-perhaps translate into amino acids).\n-\n-If your RNS-Seq data was strand specific, and assembled taking this into\n-account, you should only search for ORFs/CDSs on the forward strand.\n-\n-**Citation**\n-\n-This tool uses Biopython. 
If you use this tool in scientific work leading\n-to a publication, please cite the Biopython application note (and Galaxy\n-too of course):\n-\n-Cock et al 2009. Biopython: freely available Python tools for computational\n-molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.\n-http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.\n-\n-\t</help>\n-</tool>\n' |