Repository 'sample_seqs'
hg clone https://toolshed.g2.bx.psu.edu/repos/peterjc/sample_seqs

Changeset 0:3a807e5ea6c8 (2014-03-27)
Next changeset 1:16ecf25d521f (2014-03-27)
Commit message:
Uploaded v0.0.1
added:
test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
test-data/MID4_GLZRM4E04_rnd30_frclip.sff
test-data/ecoli.fastq
test-data/ecoli.sample_N100.fastq
test-data/get_orf_input.Suis_ORF.prot.fasta
test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta
tools/sample_seqs/README.rst
tools/sample_seqs/sample_seqs.py
tools/sample_seqs/sample_seqs.xml
tools/sample_seqs/tool_dependencies.xml
diff -r 000000000000 -r 3a807e5ea6c8 test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
Binary file test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff has changed
diff -r 000000000000 -r 3a807e5ea6c8 test-data/MID4_GLZRM4E04_rnd30_frclip.sff
Binary file test-data/MID4_GLZRM4E04_rnd30_frclip.sff has changed
diff -r 000000000000 -r 3a807e5ea6c8 test-data/ecoli.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/ecoli.fastq Thu Mar 27 09:40:53 2014 -0400
b"@@ -0,0 +1,20164 @@\n+@frag_1\n+AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTC\n++\n+##%')+.024JMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_1_a\n+GAGACATATTGCCCGTTGCAGTCAGAATGAAAAGCT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMJ420.+)'%##\n+@frag_2\n+AGAGACATATTGCCCGTTGCAGTCAGAATGAAAAGC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMJ420.+)'%#\n+@frag_3\n+CTTTTCATTCTGACTGCAACGGGCAATATGTCTCTG\n++\n+%')+.024JMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_4\n+ACAGAGACATATTGCCCGTTGCAGTCAGAATGAAAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMJ420.+)'\n+@frag_5\n+TTTCATTCTGACTGCAACGGGCAATATGTCTCTGTG\n++\n+)+.024JMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_6\n+ACACAGAGACATATTGCCCGTTGCAGTCAGAATGAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMJ420.+\n+@frag_7\n+TCATTCTGACTGCAACGGGCAATATGTCTCTGTGTG\n++\n+.024JMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_8\n+CCACACAGAGACATATTGCCCGTTGCAGTCAGAATG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMJ420\n+@frag_9\n+ATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGA\n++\n+24JMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_10\n+ATCCACACAGAGACATATTGCCCGTTGCAGTCAGAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMJ4\n+@frag_11\n+TCTGACTGCAACGGGCAATATGTCTCTGTGTGGATT\n++\n+JMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_12\n+TAATCCACACAGAGACATATTGCCCGTTGCAGTCAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_13\n+TGACTGCAACGGGCAATATGTCTCTGTGTGGATTAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_14\n+TTTAATCCACACAGAGACATATTGCCCGTTGCAGTC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_15\n+ACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_16\n+TTTTTAATCCACACAGAGACATATTGCCCGTTGCAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_17\n+TGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_18\n+TTTTTTTAATCCACACAGAGACATATTGCCCGTTGC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_19\n+CAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_20\n+TCTTTTTTTAATCCACACAGAGACATATTGCCCGTT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_21\n+ACGGGCAATATGTCTCTGTG
TGGATTAAAAAAAGAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_22\n+ACTCTTTTTTTAATCCACACAGAGACATATTGCCCG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_23\n+GGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_24\n+ACACTCTTTTTTTAATCCACACAGAGACATATTGCC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_25\n+GCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_26\n+AGACACTCTTTTTTTAATCCACACAGAGACATATTG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_27\n+AATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_28\n+TCAGACACTCTTTTTTTAATCCACACAGAGACATAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_29\n+TATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_30\n+TATCAGACACTCTTTTTTTAATCCACACAGAGACAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_31\n+TGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_32\n+GCTATCAGACACTCTTTTTTTAATCCACACAGAGAC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_33\n+TCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_34\n+CTGCTATCAGACACTCTTTTTTTAATCCACACAGAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_35\n+TCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_36\n+AGCTGCTATCAGACACTCTTTTTTTAATCCACACAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_37\n+TGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_38\n+GAAGCTGCTATCAGACACTCTTTTTTTAATCCACAC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_39\n+TGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_40\n+CAGAAGCTGCTATCAGACACTCTTTTTTTAATCCAC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_41\n+TGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_42\n+TTCAGAAGCTGCTATCAGACACTCTTTTTTTAATCC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMM\n+@frag_43\n+GATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAAC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_44\n+AGTTCAGAAGCTGCTATCAGACACTCTTTTTTTAAT\n++\n+MMMMMMMMMMMMMMMMMMM"..b"4997\n+AATTGATGATGAATCATCAGTAAAATCTATTCATTA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_4998\n+ATAATGAATAGATTTTACTGATGATTCATCATCAAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_4999\n+TTGATGATGAATCATCAGTAAAATCTATTCATTATC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5000\n+AGATAATGAATAGATTTTACTGATGATTCATCATCA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5001\n+GATGATGAATCATCAGTAAAATCTATTCATTATCTC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMK\n+@frag_5002\n+TGAGATAATGAATAGATTTTACTGATGATTCATCAT\n++\n+KKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5003\n+TGATGAATCATCAGTAAAATCTATTCATTATCTCAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMKKK\n+@frag_5004\n+ATTGAGATAATGAATAGATTTTACTGATGATTCATC\n++\n+KKKKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5005\n+ATGAATCATCAGTAAAATCTATTCATTATCTCAATA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMKKKK#\n+@frag_5006\n+CTATTGAGATAATGAATAGATTTTACTGATGATTCA\n++\n+##KKKKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5007\n+GAATCATCAGTAAAATCTATTCATTATCTCAATAGC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMKKKK##%\n+@frag_5008\n+AGCTATTGAGATAATGAATAGATTTTACTGATGATT\n++\n+'%##KKKKMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5009\n+ATCATCAGTAAAATCTATTCATTATCTCAATAGCTT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMKKKK##%')\n+@frag_5010\n+AAAGCTATTGAGATAATGAATAGATTTTACTGATGA\n++\n++)'%##KKKKMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5011\n+CATCAGTAAAATCTATTCATTATCTCAATAGCTTTT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMKKKK##%')+.\n+@frag_5012\n+GAAAAGCTATTGAGATAATGAATAGATTTTACTGAT\n++\n+0.+)'%##KKKKMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5013\n+TCAGTAAAATCTATTCATTATCTCAATAGCTTTTCA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMKKKK##%')+.02\n+@frag_5014\n+ATGAAAAGCTATTGAGATAATGAATAGATTTTACTG\n++\n+420.+)'%##KKKKMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5015\n+AGTAAAATCTATTCATTATCTCAATAGCTTTTCATT\n++\n+MMMMMMMMMMMMMMMMMMMMMKKKK##%')+.024J\n+@frag_5016\n+GA
ATGAAAAGCTATTGAGATAATGAATAGATTTTAC\n++\n+MJ420.+)'%##KKKKMMMMMMMMMMMMMMMMMMMM\n+@frag_5017\n+TAAAATCTATTCATTATCTCAATAGCTTTTCATTCT\n++\n+MMMMMMMMMMMMMMMMMMMKKKK##%')+.024JMM\n+@frag_5018\n+CAGAATGAAAAGCTATTGAGATAATGAATAGATTTT\n++\n+MMMJ420.+)'%##KKKKMMMMMMMMMMMMMMMMMM\n+@frag_5019\n+AAATCTATTCATTATCTCAATAGCTTTTCATTCTGA\n++\n+MMMMMMMMMMMMMMMMMKKKK##%')+.024JMMMM\n+@frag_5020\n+GTCAGAATGAAAAGCTATTGAGATAATGAATAGATT\n++\n+MMMMMJ420.+)'%##KKKKMMMMMMMMMMMMMMMM\n+@frag_5021\n+ATCTATTCATTATCTCAATAGCTTTTCATTCTGACT\n++\n+MMMMMMMMMMMMMMMKKKK##%')+.024JMMMMMM\n+@frag_5022\n+CAGTCAGAATGAAAAGCTATTGAGATAATGAATAGA\n++\n+MMMMMMMJ420.+)'%##KKKKMMMMMMMMMMMMMM\n+@frag_5023\n+CTATTCATTATCTCAATAGCTTTTCATTCTGACTGC\n++\n+MMMMMMMMMMMMMKKKK##%')+.024JMMMMMMMM\n+@frag_5024\n+TGCAGTCAGAATGAAAAGCTATTGAGATAATGAATA\n++\n+MMMMMMMMMJ420.+)'%##KKKKMMMMMMMMMMMM\n+@frag_5025\n+ATTCATTATCTCAATAGCTTTTCATTCTGACTGCAA\n++\n+MMMMMMMMMMMKKKK##%')+.024JMMMMMMMMMM\n+@frag_5026\n+GTTGCAGTCAGAATGAAAAGCTATTGAGATAATGAA\n++\n+MMMMMMMMMMMJ420.+)'%##KKKKMMMMMMMMMM\n+@frag_5027\n+TCATTATCTCAATAGCTTTTCATTCTGACTGCAACG\n++\n+MMMMMMMMMKKKK##%')+.024JMMMMMMMMMMMM\n+@frag_5028\n+CCGTTGCAGTCAGAATGAAAAGCTATTGAGATAATG\n++\n+MMMMMMMMMMMMMJ420.+)'%##KKKKMMMMMMMM\n+@frag_5029\n+ATTATCTCAATAGCTTTTCATTCTGACTGCAACGGG\n++\n+MMMMMMMKKKK##%')+.024JMMMMMMMMMMMMMM\n+@frag_5030\n+GCCCGTTGCAGTCAGAATGAAAAGCTATTGAGATAA\n++\n+MMMMMMMMMMMMMMMJ420.+)'%##KKKKMMMMMM\n+@frag_5031\n+TATCTCAATAGCTTTTCATTCTGACTGCAACGGGCA\n++\n+MMMMMKKKK##%')+.024JMMMMMMMMMMMMMMMM\n+@frag_5032\n+TTGCCCGTTGCAGTCAGAATGAAAAGCTATTGAGAT\n++\n+MMMMMMMMMMMMMMMMMJ420.+)'%##KKKKMMMM\n+@frag_5033\n+TCTCAATAGCTTTTCATTCTGACTGCAACGGGCAAT\n++\n+MMMKKKK##%')+.024JMMMMMMMMMMMMMMMMMM\n+@frag_5034\n+TATTGCCCGTTGCAGTCAGAATGAAAAGCTATTGAG\n++\n+MMMMMMMMMMMMMMMMMMMJ420.+)'%##KKKKMM\n+@frag_5035\n+TCAATAGCTTTTCATTCTGACTGCAACGGGCAATAT\n++\n+MKKKK##%')+.024JMMMMMMMMMMMMMMMMMMMM\n+@frag_5036\n+CATATTGCCCGTTGCAGTCAGAATGAAAAGCTATTG\n++\n+MMMMMMMMMMMMMMMMMMMMMJ420.+)'%##KKKK\n+@frag_5037\n+AATAGCT
TTTCATTCTGACTGCAACGGGCAATATGT\n++\n+KKK##%')+.024JMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5038\n+GACATATTGCCCGTTGCAGTCAGAATGAAAAGCTAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMJ420.+)'%##KK\n+@frag_5039\n+TAGCTTTTCATTCTGACTGCAACGGGCAATATGTCT\n++\n+K##%')+.024JMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5039_a\n+AGACATATTGCCCGTTGCAGTCAGAATGAAAAGCTA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMJ420.+)'%##K\n"
diff -r 000000000000 -r 3a807e5ea6c8 test-data/ecoli.sample_N100.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/ecoli.sample_N100.fastq Thu Mar 27 09:40:53 2014 -0400
@@ -0,0 +1,204 @@
+@frag_1
+AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTC
++
+##%')+.024JMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_100
+TAAAGTATTTAGTGACCTAAGTCAATAAAATTTTAA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_200
+TGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_300
+CATGGTTGTTACCTCGTTACCTTTGGTCGAAAAAAA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_400
+TGGCCACCTGCCCCTGCCTGGCATTGCTTTCCAGAA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_500
+TTCGGCATCGCTGATATTGGGTAAAGCATCCTGGCC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_600
+TTGGGCAAATTCCTGATCGACGAAAGTTTTCAATTG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_700
+TAACGGCGATCGACATTTTCTCGCCACGGCAAATCA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_800
+ATATCGACGGTAGATTCGAGGTAATGCCCCACTGCC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_900
+CACCAGTTCGCCTTTTTCATTACCGGCGGTGAAACC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1000
+TATAGACCCCGTCAACGTCCGTCCAAATCTCGCAAC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1100
+GGGTGAAGAACTTTAGCGCCGAAGTAGGAAAGCTCC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1200
+ATCACGGCTGGCACCAATGAGCGTACCTGGTGCTTG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1300
+GCGCCGCCATGCCGACCATCCCTTTCATCCCCGGAC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1400
+CAGTCGCTTTGTGGAACGCAGAAACTGATGCTGTAT
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1500
+CGAGATAATGGCCAGCCGTTCCGTCACTGCCAGCGG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1600
+ATCCCTGAGCAATGGCGACAATGTTGATATTGGCGC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1700
+TCGATAACCTGATCGGTATTGAACAGCATCTGATGA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1800
+GACACGTAAGTCGATATGTTTATTCTTCAGCCAGCT
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_1900
+GATTAAACGGCTCTTTGGCTTGCGCCAGTTCTTCCT
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2000
+AAGTCGGCATATTGATCCGCCACTGCCTGGCTGGAA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2100
+CCGCGATTTTTCCGCCGCATAACGCAACTGATGGTA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2200
+CGGAGAACTTCATCAATTCATCACCTGCATTGAGCA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2300
+TCGGTATAACCCATTTCCCGCGCCAGCGTGGTCGCC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2400
+TTCAATATCCGCCAGCTCCAGTTCACGTCCCGTTTC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2500
+CCACGCGCGCGGCAAAGAGATCGTCGAGTTGTGACA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2600
+GGATCATTACCATCCACTTCGGCAATCTTCACGCGG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2700
+GTCATTGCCCGCACCATATCCGCGCAGTACCAACGG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2800
+ATTGGCACTGGAAGCCGGGGCATAAACTTTAACCAT
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_2900
+GACTGAATGTCTCTGCCGCCTCAACCGTGACTACAT
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3000
+TGCTTACCCAGTTCCTGGCAAAAACGCTCCCAGCAC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3100
+TTCATTCATCGCCATCAGCGCCGCGACCACCGAACA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3200
+ACGGTGCCACGTTGTCGTAATGAATGCTGCCGGAGA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3300
+CCCGGATACGCCAGCACCCACAGCCACTCATCAAAC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3400
+GTGAATGAAGCCTGCCAGATGTCGCCCGTGCGCAAT
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3500
+GCGCCTGCCGGAAGCCTGGCAGTAACCGTTCACGGT
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3600
+ACGCGCTGGGCGGTTTCCGGCTTGTCACACAGAGCG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3700
+CATTTAGTTTTCCAGTACTCGTGCGCCCGCCGTATC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3800
+GGTCGTGCGGAAAAAACAGCCCCTGATTTTTGCCCA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_3900
+GGGATTTCATCACCAATAAACGCCGAGAGGATCTTC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4000
+CCCGTGGAACAATTCCAGACAACCGACATCGCTTTC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4100
+CGGAGGTCGCGGTCAGAATGGTCACTGGCTTATCAC
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4200
+TTTTCTTGCAGTGGACTGATTTTGCCTCGTGGATAG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4300
+TTCTTCATCATCAAACGCCTGCTTCACCAGCGCCTG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4400
+GCGGCAGCTGCGCAACAGCTTCAAAGTAGTAGCAAA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4500
+CGTTTCACCGGCAGACCGAGTGACTTCGCCAGCAGA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4600
+CATCGCGTTGGATAACGTCGCCTGAGTCGCTTTGGG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4700
+TTTCATCATCCACGGCTGCATAACCCAGCTCTTTCA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4800
+CCTGGATTCAACTGATCACGCAGCGCACGATAAGCT
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_4900
+TGCCAGCTCTTTTGGCAGATCCAACGTTTCACCGAG
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
+@frag_5000
+AGATAATGAATAGATTTTACTGATGATTCATCATCA
++
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
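Note on the sampled FASTQ above: going by the hunk headers, ``ecoli.fastq`` has 20164 lines (5041 four-line records) and ``ecoli.sample_N100.fastq`` has 204 lines (51 records). The retained records (``frag_1``, ``frag_100``, ``frag_200``, ...) appear consistent with keeping every 100th record starting from the first, since ``frag_1_a`` occupies the second slot. The following is a minimal standard-library sketch of that every-Nth idea. It is not the code in ``sample_seqs.py`` (which the README says uses Biopython); the helper names ``read_fastq`` and ``sample_every_nth`` are hypothetical, and the reader assumes simple four-line FASTQ records without line wrapping.

```python
from io import StringIO
from itertools import islice


def read_fastq(handle):
    """Yield (title, sequence, quality) tuples from a four-line-per-record FASTQ.

    Assumption: no line-wrapped sequences or qualities (hypothetical helper,
    not part of the tool in this changeset).
    """
    while True:
        title = handle.readline()
        if not title:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("Malformed FASTQ record near %r" % title)
        yield title[1:].rstrip("\n"), seq, qual


def sample_every_nth(records, n):
    """Keep the 1st, (n+1)th, (2n+1)th, ... item of any record iterator."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return islice(records, 0, None, n)


# Tiny in-memory demo: seven records, keep every third one.
demo = "".join("@r%d\nACGT\n+\nMMMM\n" % i for i in range(7))
picked = [title for title, seq, qual in sample_every_nth(read_fastq(StringIO(demo)), 3)]
print(picked)  # ['r0', 'r3', 'r6']
```

Because ``sample_every_nth`` only wraps an iterator, the same function would work unchanged on FASTA or SFF records given a suitable parser, and it never holds more than one record in memory.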
diff -r 000000000000 -r 3a807e5ea6c8 test-data/get_orf_input.Suis_ORF.prot.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.Suis_ORF.prot.fasta Thu Mar 27 09:40:53 2014 -0400
b'@@ -0,0 +1,16670 @@\n+>Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n+MNQEQLFWQRFIELAKVNFKPSIYDFYVADAKLLGINQQVANIFLNRPFKKDFWEKNFEE\n+LMIAASFESYGEPLTIQYQFTEDEQEIRNTTNTRSSIVHQVQTLEPATPQETFKPVHSDI\n+KSQYTFANFVQGDNNHWAKAAALAVSDNLGELYNPLFIFGGPGLGKTHILNAIGNKVLAD\n+NPQARIKYVSSETFINEFLEHLRLNDMESFKKTYRNLDLLLIDDIQSLRNKATTQEEFFH\n+TFNALHEKNKQIVLTSDRNPDHLDNLEERLVTRFKWGLTSEITPPDFETRIAILRNKCEN\n+LPYNFTNETLSYLAGQFDSNVRDLEGALKDIHLIATMRQLSEISVEVAAEAIRSRKQTNP\n+QNMVIPIEKIQTEVGNFYGVSLKELKGSKRVQHIVHARQVAMFLAREMTDNSLPKIGKEF\n+GNRDHTTVMHAYNKIKTLLLDDENLEIEITSIKNKLR\n+>Streptococcus_suis|ORF2 length 385 aa, 1158 bp, from 1507..2664 of Streptococcus_suis\n+IINKGESMIQFSINKNIFLQALSITKRAISTKNAIPILSTVKITVTSEGITLTGSNGQIS\n+IEHFISIQDENAGLLISSPGSILLEAGFFINVVSSMPDLVLDFNEIEQKQIVLTSGKSEI\n+TLKGKEAEQYPRLQEVPTSKPLVLETKVLKQTINETAFAASTQESRPILTGVHFVLTENK\n+NLKTVATDSHRMSQRKLVLDTSGDDFNVVIPSRSLREFTAVFTDDIETVEVFFSNNQILF\n+RSEHISFYTRLLEGTYPDTDRLIPTEFKTTAIFDTANLRHSMERARLLSNATQNGTVKLE\n+IANNVVSAHVNSPEVGRVNEELDTVEVSGEDLVISFNPTYLIEALKATTSEQVKISFISS\n+VRPFTLIPNNEGEDFIQLVTPVRTN\n+>Streptococcus_suis|ORF3 length 104 aa, 315 bp, from complement(1707..2021) of Streptococcus_suis\n+TPVRIGRLSCVEAANAVSLIVCFNTLVSNTNGFEVGTSCKRGYCSASFPFNVISDLPLVK\n+TICFCSISLKSRTKSGILDTTLIKKPASKRMEPGELIKSPAFSS\n+>Streptococcus_suis|ORF4 length 293 aa, 882 bp, from 2756..3637 of Streptococcus_suis\n+MTLYILANPNAGSHTAEHIIFKIKESYPQLAVNIFMTVGPEDEKSQIEAILKEFVSSEDQ\n+LMILGGDGTLSKALRFWPASLPFAYYPTGSGNDFAKAMNITSLYRSVDAILERKTSRIYV\n+LNSSYGTVVNSMDFGFAAQVINGSTNSILKKILNKVKLGKLTYLFFGIKTLFSKQAINLE\n+LTLDEKSYQLDNLFFISVANSLYFGGGIMIWPTASAKKKEVDIVYFKNGNFYQRLQSLLA\n+LLTKRHESSHTIQHLTGVDVVLKSKEKLLLQIDGETCTANEVTLTYQERSMYL\n+>Streptococcus_suis|ORF5 length 126 aa, 381 bp, from 3933..4313 of Streptococcus_suis\n+KKEEEMIMKQLAQQIRVLRTAKNLSQDELAEKLYISRQAVSKWENGEATPDIDKLVQLAE\n+IFGVSLDYLVLGKEPEKEIVVEQRGKMNGWEFLNEESKRPLTRGDVVLLIFLAVMLLGGL\n+FIKHYF\n+>Streptococcus_suis|ORF6 length 377 aa, 1134 bp, from 4381..5514 of 
Streptococcus_suis\n+LESKKNMSLTAGIVGLPNVGKSTLFNAITKAGAEAANYPFATIDPNVGMVEVPDERLQKL\n+TELIIPKKTVPTTFEFTDIAGIVKGASKGEGLGNKFLANIREVDAIVHVVRAFDDENVMR\n+EQGREDAFVDPIADIDTINLELILADLESINKRYARVEKMARTQKDKDSVAEFAVLEKIK\n+PVLEDGKSARTVEFTDEEQKIVKQLFLLTTKPVLYVANVDEDKVADPEAISYVQQIRDFA\n+ATENAEVVVISARAEEEISELDDEDKGEFLEALGLTESGVDKLTRAAYHLLGLGTYFTAG\n+EKEVRAWTFKRGMKAPQCAGIIHSDFEKGFIRAVTMSYDDLMTYGSEKAVKEAGRLREEG\n+KEYVVQDGDIMEFRFNV\n+>Streptococcus_suis|ORF7 length 115 aa, 348 bp, from complement(4450..4797) of Streptococcus_suis\n+VNGINISDWIHKGIFTALFTHDIFIVKGTHNVDNRINFADIGQEFISKSFTFRSTFYDTS\n+NISKFKSRWHCLFRDDEFGQLLQTLIGHFYHADVWINSCERVVCSFCSCLGNCVK\n+>Streptococcus_suis|ORF8 length 115 aa, 348 bp, from complement(4491..4838) of Streptococcus_suis\n+RLLMLSRSAKINSRLMVSISAIGSTKASSRPCSRMTFSSSKARTTWTIASTSRILAKNLF\n+PSPSPLEAPFTIPAISVNSKVVGTVFLGMMSSVNFCRRSSGTSTMPTFGSIVAKG\n+>Streptococcus_suis|ORF9 length 192 aa, 579 bp, from 5663..6241 of Streptococcus_suis\n+GEKMTRLIIGLGNPGDRYFETKHNVGFMLLDKIAKRENVTFNHDKIFQADIATTFIDGEK\n+IYLVKPTTFMNESGKAVHALMTYYGLDATDILVAYDDLDMAVGKIRFRQKGSAGGHNGIK\n+SIVKHIGTQEFDRIKIGIGRPKGKMSVVNHVLSGFDIEDRIEIDLALDKLDKAVNVYLEE\n+DDFDTVMRKFNG\n+>Streptococcus_suis|ORF10 length 1166 aa, 3501 bp, from 6235..9735 of 
Streptococcus_suis\n+RIMNILDLLHKNKQINQWQSGLNQSTRQLLLGLSGTSKSLIMATAYDCLAEKIMIVTATQ\n+NDAEKLVADLTAIIGSENVYNFFTDDSPIAEFVFASKERTQSRIDSLNFLTDSTSSGILV\n+ASIVACRVLLPSPETYKGSKIQLEVGQEIEVDKLVKNLVNIGYKKVSRVLTQGEFSQRGD\n+ILDIFDMQSETPYRIEFFGDEIDGIRIFDVDSQKSLENLDEISISPASDIILSSEDYSRA\n+SQYIQTAIEQSTLEEQQSYLREVLADMQTEYRHPDLRKFLSCIYEQSWTLLDYLPKSSPL\n+FLDDFHKIADKQAQFEKEIADLLTDDLQKGKTVSSLKYFASTYAELRKYKPATFFSSFQK\n+GLGNVKFDALYQFTQHPMQEFFHQIPLLKDELTRYAKSNNTVVIQASSDVSLQTLQKNLQ\n+EYDIHLPVHAADKLVEGQQQVTIGQLASGFHLMDEKLVFITEKEIFNKKMKRKTRRTNIS\n+NAERIKDYSELAVGDYVVHHVHGIGQYLGIETIEISGIHRDYLTVQYQNSDRISIPVEQI\n+DLLSKYLASDGKAPKVNKLNDGRFQRTKQKVQKQVEDIADDLIKLYAERSQLKGFAFSPD\n+DENQVEFDNYFTHVETDDQLRSIDEIKKDMEKDSPMDRLLVGDVGFGKTEVAMRAAFKAV\n+NDGKQVAILVPTTVLAQQHYANFQERFAEFPVNVDVMSRFKTKAEQEKTLEKLKKGQVDI\n+LIGTHRLLSKDVVFADLGLLVIDEEQRFGVKHKERLKELKKKIDVLTLTATPIPRTLQMS\n+MLGIRDLSVIETPPTNRYP'..b'\n+DTDTVMYSIIALMTITYIVNRMMSGTQSSRNVMIISQKSEEIKDYITKVADRGVTELPII\n+GGFTGVDKRMLMTTISIPEMQKLETAVLEIDETAFMVVMPASQVRGRGFSLQKDHKHYDE\n+DILIPM\n+>Streptococcus_suis|ORF2902 length 565 aa, 1698 bp, from 1998923..2000620 of Streptococcus_suis\n+FQCNSLKIQVLSSTIKLIDRNRGETMLTVSDVSLRFSDRKLFDDVNIKFTAGNTYGLIGA\n+NGAGKSTFLKILAGDIEPSTGHISLGPDERLSVLRQNHFDYEDERVIDVVIMGNEQLYSI\n+MKEKDAIYMKEDFSDEDGVRAAELEGEFAELGGWEAESEASQLLQNLNISEDLHYQNMSE\n+LTNGEKVKVLLAKALFGKPDVLLLDEPTNGLDIQSINWLEDFLIDFENTVIVVSHDRHFL\n+NKVCTHMADLDFGKIKIFVGNYDFWKQSSELAAKLQADRNAKAEEKIKELQEFVARFSAN\n+ASKSKQATSRKKMLDKIELEEIIPSSRKYPFINFKSEREIGNDLLTVENLKVVIDGETIL\n+DNISFILRPGDKTALIGQNDIQTTALIRALMGDIEYEGTVKWGVTTSQSYLPKDNTRDFD\n+TNESILDWLRQFASKEEDDNTFLRGFLGRMLFSGDEVNKPVNVLSGGEKVRVMLSKLMLL\n+KSNVLVLDDPTNHLDLESISSLNDGLKAFKESIIFASHDHEFIQTLANHIIVISKNGVID\n+RIDETYDEFLENAEVQAKVQELWKA\n+>Streptococcus_suis|ORF2903 length 115 aa, 348 bp, from complement(1999705..2000052) of Streptococcus_suis\n+PIRAVLSPGRRIKLILSRIVSPSITTFKFSTVKRSLPISRSDLKLINGYLRLEGMISSNS\n+ILSNIFLREVACLDLEALAEKRATNSCSSLIFSSAFALRSACSLAASSLDCFQKS\n+>Streptococcus_suis|ORF2904 length 110 aa, 333 bp, from 1999974..2000306 
of Streptococcus_suis\n+KLLLMVKRFLTISALSCAQVTRLLLLVKTTSKQLLSFVLLWAILNMKVLSSGVSLLVNPT\n+YQKTILVTLIQTNLSLIGSVNLPARKKMTIPSCAVSWDVCSSRVMRLTNL\n+>Streptococcus_suis|ORF2905 length 117 aa, 354 bp, from 2000502..2000855 of Streptococcus_suis\n+QTISSSFLKTVLSTESTKLMMNSWKMLKYKQKYKNFGKHNKKRLGLLPSLSSQSSCQHLS\n+AVVDCQICSCFTLQIWPLRLLRTKFALSPTSNCLPDSLSCAGVGVKQSGNRLFQLNN\n+>Streptococcus_suis|ORF2906 length 872 aa, 2619 bp, from 2000888..2003506 of Streptococcus_suis\n+PVKFFPTSFSFKSMKKIFTKTSIYYLLSFLIPLTIISIVLAFQGIWWGSDTTILASDGFH\n+QYVIFNQTLRNTLHGDGSLFYTFSSGLGLNFYALSSYYLGSFLSPIVFFFDLQSMPDAIY\n+LVTIVKFGLTGLSTYFSLKGIHKNLKEEWALLLATSFSLMSFSTSQLEINNWLDVFILLP\n+LVLLGLHRLLKKQGPILYYITLTCLFIQNYYFGYMVAIFLTLWTLVQLSWIDSQRIKRFI\n+NFTIVSILSALSSMFMLLPTYLDLKTHGETFTKIVNLKTEDSWYLDFFAKNLVGSFDTTK\n+FGSIPMISVGLVPLILALLFFTLKEIKPTVKLSYALFFTFIISSFYLQPLNLFWQGMHAP\n+NMFLYRYAWALSITVIYLAAETLVRLRQVSIKNFTLIVSFLLICFTSTFIFRDHYEFLTD\n+VNFLLTLEFLIAYFILFVAMIRYKSSLKWINIVLLFFTFLELGLHSHYQVQGISDEWHFP\n+SRSNYEEKLTDIDSIVKSTKTTTDSFYRIERLLPQTGNDSMKFNYNGISQFSSIRNRASS\n+SVLDKLGFRSDGTNLNLRYQNNTIIADSLFGVKYNLATTDPNKFGFTLNQSQSTINLYEN\n+SFNLGLALLTEGIYKDVNFTNLTLDNQTNFLNQLTGLSQKYYHTLSDVVSQNTVELSNRM\n+TVNKVDNEDAAKATFLVNIPANSQVYLNLPNLTFSNENQKKVVITVNNQSSEFTLDNAFS\n+FFNVGSFTTDVQVQVNVYFPENNQVSFDKPQFYRLDLLAFQQAISILQEKQVVTKTDGNK\n+VTVDFVTDKESSLLLTLPYDKGWNATIDGKPIKIQKAQDGFMKVDVSPGQTKLVLTFVPN\n+GFYLGLLISFGAVFVFFSYQFIGYYYSKNREY\n+>Streptococcus_suis|ORF2907 length 235 aa, 708 bp, from complement(2003907..2004614) of Streptococcus_suis\n+FHVKQGVKMNQKEYRVFEGLRIACSLTFISGYLNAFTFVTQGGRFAGVQSGNVISLAYFL\n+AKGDFAQVVNFSIPILFFVFGQFFTYLARRYFEKQTWSWHFGSSVMMLVLILLTIILSPI\n+MPASFTIASLAFVASIQVETFRRLRGAPYANVMMTGNVKNAAYLWFKGVIEKDSELRKTG\n+RNILLTIIGFMLGVIISTHLSFQFEEYALIGLILPVLYINYELWQEKRPTRGRSK\n+>Streptococcus_suis|ORF2908 length 180 aa, 543 bp, from complement(2004615..2005157) of 
Streptococcus_suis\n+PYPDFLKIFSVVCLWICYNYFMKIKLITVGKLKEKYLKEGIAEYSKRLGRFTKLDMIELP\n+DEKTPDKASQAENEQILKKEADRIMSKIGERDFVIALAIEGKQFPSEEFSQRISDIAVNG\n+YSDITFIIGGSLGLDSCIKKRANLLMSFGQLTLPHQLMKLVLIEQIYRAFMIQQGSPYHK\n+>Streptococcus_suis|ORF2909 length 413 aa, 1242 bp, from 2005223..2006464 of Streptococcus_suis\n+VIIKKEIVLLRKIKEMERIPYMKKYLKFAILFVIGFFGGLIGALSASFFQPQVQQANSAI\n+TSVSNVQYNNETSTTKAVEKVQNAVVSVINYQKSANNSLGVIFGNIESSDELAVAGEGSG\n+VIYKKYGQYAYIVTNTHVINNAEKIDILLASGEKISGELVGSDTYSDIAVIKISADKVTA\n+VAEFADSDTIKVGETAIAIGSPLGSVYANTVTQGIISSLSRTVTSQSKDGQTISTNAIQT\n+DTAINPGNSGGPLINTQGQVIGITSSKITSSSANSSGVAVEGLGFAIPANDAVAIINQLE\n+KTGQVSRPALGVHMVNLTTLSTSQLEKAGLSNTELTSGVVIVSTQSGLPADGKLETFDVI\n+TEIDGEAIQNKSDLQSALYKHQIGDTITVTYYRNNQKQTVDIKLTHSTEELSE\n+>Streptococcus_suis|ORF2910 length 256 aa, 771 bp, from 2006519..2007289 of Streptococcus_suis\n+GYMEELRTLNISEIHPNPYQPRIHFDEKELLELAQSIKENGLIQPIIVRKSSIIGYELLA\n+GERRLRASQLAGLTTIPAVVKELTDDDLLYQAIIENLQRSNLNPIEEAASYQKLISRGLT\n+HDEVAQIMGKSRPYISNLLRLLNLSSQTKQAVEEGKISQGHARQLVSFSEEKQAEWVQLI\n+LSKDLSVRTLEKLIAANKKKHTKLKQRDQFLKEQEDSLSKTLGTATKIIKKKNGSGEIRI\n+SFNDLDEFERIINNFK\n'
diff -r 000000000000 -r 3a807e5ea6c8 test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta Thu Mar 27 09:40:53 2014 -0400
b'@@ -0,0 +1,194 @@\n+>Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n+MNQEQLFWQRFIELAKVNFKPSIYDFYVADAKLLGINQQVANIFLNRPFKKDFWEKNFEE\n+LMIAASFESYGEPLTIQYQFTEDEQEIRNTTNTRSSIVHQVQTLEPATPQETFKPVHSDI\n+KSQYTFANFVQGDNNHWAKAAALAVSDNLGELYNPLFIFGGPGLGKTHILNAIGNKVLAD\n+NPQARIKYVSSETFINEFLEHLRLNDMESFKKTYRNLDLLLIDDIQSLRNKATTQEEFFH\n+TFNALHEKNKQIVLTSDRNPDHLDNLEERLVTRFKWGLTSEITPPDFETRIAILRNKCEN\n+LPYNFTNETLSYLAGQFDSNVRDLEGALKDIHLIATMRQLSEISVEVAAEAIRSRKQTNP\n+QNMVIPIEKIQTEVGNFYGVSLKELKGSKRVQHIVHARQVAMFLAREMTDNSLPKIGKEF\n+GNRDHTTVMHAYNKIKTLLLDDENLEIEITSIKNKLR\n+>Streptococcus_suis|ORF101 length 112 aa, 339 bp, from complement(72006..72344) of Streptococcus_suis\n+LQFNFYVYTTWKIQFHQSVNCFLSWVDDVDQTFVCTHFELLTRIFVLVSRTDDCVEATFC\n+WKWNWTCYLSTCTCCSFNDFCSCCIKCTVFVRFQADANFFVCHLFLLFVYLT\n+>Streptococcus_suis|ORF201 length 360 aa, 1083 bp, from complement(128035..129117) of Streptococcus_suis\n+SCHGGRRMTLFGKIKEVTELQSLPGFEGQVRNHIRQKITPHVDRIETDGLGGIFGIKDTA\n+VENAPRILVVAHMDEVGFMISQIKPDGTFRVVELGGWNPLVVSSQAFTLQLQDGRTIPAI\n+SGSVPPHLSRGANAPGMPAIADIIFDAGFANYDEAWAFGVRPGDVLVPKNETILTANGKN\n+VISKAWDNRFGVLMVTELLESLSGHALPNQLIAGANVQEEVGLRGAHASTTKFNPDIFLA\n+VDCSPAGDIYGDQGKIGDGTLLRFYDPGHIMLKNMKDFLLTTAEEAGVKFQYYCGKGGTD\n+AGAAHLKNHGVPSTTIGVCARYIHSHQTLYSMDDFLEAQAFLQTIVKKLDRSTVDLIKNY\n+>Streptococcus_suis|ORF301 length 105 aa, 318 bp, from complement(191714..192031) of Streptococcus_suis\n+NQGQTVRFFHIRGNFCQKFIIGNTCGSCQMQFIPNIVLDKLGNLNRRADAQLIFCYIQVS\n+LIDRHGLHQISIAMENFSNLSPNFLIFFIIARNENRLRTTLISLF\n+>Streptococcus_suis|ORF401 length 120 aa, 363 bp, from 265643..266005 of Streptococcus_suis\n+TTGTTSPIAPKWKASSKSLRVPTSEPTTLIPSSTVFTILRSMYSDGSPTATTYPPARTLS\n+IAWLKATLETAVTTVECTPPPVISLIYPGTSSTSSPLIVTSAPTSLASSNLSLLMSTAIT\n+>Streptococcus_suis|ORF501 length 104 aa, 315 bp, from 336857..337171 of Streptococcus_suis\n+KDLIRKSSLLVYKLFSPSRTMSKSIVAWRMIYSDEDPIRRRSKAFRPVAPSNIRSAPVSF\n+GISIASRVLGLPNSMRICTSVRPALRASCLYSSRRSSASLSKDS\n+>Streptococcus_suis|ORF601 length 665 aa, 1998 bp, from 
409896..411893 of Streptococcus_suis\n+VMIQIGKIFAGRYRIVRQIGRGGMADVYLARDLILDGEEVAVKVLRTNYQTDQIAIQRFQ\n+REARAMAELDHPNIVRISDIGEEDGQQYLAMEYVNGLDLKRYIKENAPLSNDVAVRIMGQ\n+ILLAMRMAHTRGIVHRDLKPQNVLLTSNGVAKVTDFGIAVAFAETSLTQTNSMLGSVHYL\n+SPEQARGSKATIQSDIYAMGIILFEMLTGRIPYDGDSAVTIALQHFQKPLPSVREENANV\n+PQALENVVLKATAKKLNERYKSVAEMYADLASALSMDRQNEPRVELEGNKVDTKTLPKLS\n+QANVETKVPHTNSSAQVSATDKGSGKKEVAKSGNKPVSKPRPGIRTRYKVLIGAILLTVI\n+AAGLMFFNTPRTVTVPDVSGQTVEKATEMIEVAGLEVGNITEEATATVDEGLVIRTSPAA\n+KTTRRQGSKIDIVVATAALASIPDVVDKESDTARQELEALGFQVTIKEEYSEKVAQGLVI\n+KTDPGANSSAEKGAKITLYVSKGVAPQVVPNVVGKSQENATQILQTAGFSIGTITQEYSS\n+SVTAGQVISTDPVANTELAKGSIINLVISKGKELIMPDLTSGNYTYSQARSQLQALGVNA\n+ESIEKQEDRSYYSTTSDIVIGQYPAAGATIDGTVTLYVSVASTRTSSDSSAGSSTSTSTS\n+TGSGQ\n+>Streptococcus_suis|ORF701 length 182 aa, 549 bp, from 489284..489832 of Streptococcus_suis\n+LGKSVALSSLSRVTIYRGENLGKLRFFVAYLTSRYIIDIQNEGRMNMNIKEVSDVTGLSA\n+DTIRYYERVGLIPKIARKSSGVRDFVENDVAVLEFVRCFRSAGMSIERLIEYMGLVQAGD\n+STVEARIDLLKEEREVLQSRLSKIQQALERLDYKIENYQTILRGAENQLFTDGSGSCKKD\n+RE\n+>Streptococcus_suis|ORF801 length 428 aa, 1287 bp, from 561960..563246 of Streptococcus_suis\n+KSSRDCESCLLLFVILKVMQADRRKTFGKMRIRINNLFFVAIAFMGIIISNSQVVLAIGK\n+ASVIQYLSYLVLILCIVNDLLKNNKHIVVYKLGYLFLIIFLFTIGICQQILPITTKIYLS\n+ISMMIISVLATLPISLIKDIDDFRRISNHLLFALFITSILGIMMGATMFTGAVEGIGFSQ\n+GFNGGLTHKNFFGITILMGFVLTYLAYKYGSYKRTDRFILGLELFLILISNTRSVYLILL\n+LFLFLVNLDKIKIEQRQWSTLKYISMLFCAIFLYYFFGFLITHSDSYAHRVNGLINFFEY\n+YRNDWFHLMFGAADLAYGDLTLDYAIRVRRVLGWNGTLEMPLLSIMLKNGFIGLVGYGIV\n+LYKLYRNVRILKTDNIKTIGKSVFIIVVLSATVENYIVNLSFVFMPICFCLLNSISTMES\n+TINKQLQT\n+>Streptococcus_suis|ORF901 length 241 aa, 726 bp, from 628396..629121 of Streptococcus_suis\n+NSPMRLDKCLEKAKVGSRKQVKKLFKAQQIKINGQAAQSLSQIVDPELQTIQVSGKKVAL\n+EGSAYYLLHKPAGVVSAVTDQEHQTVIDLISPQDSREGLYPVGRLDRDTEGLVLITNNGP\n+LGYRMLHPSHHVDKVYYVEVNGCLAEDASKFFASGVTFLDGTRCQPADLTVLEASLDHSR\n+ATIKLAEGKFHQVKKMFLAYGVKVTYLKRISFGGFELGDLERGTYRQLTPNEMEHLFTYF\n+D\n+>Streptococcus_suis|ORF1001 length 374 aa, 1125 bp, from 694014..695138 of 
Streptococcus_suis\n+HYLLFQGGILMKVFASPSRYIQGKHVLFQGAEAIGKLGTKPLILCDDLVY'..b'W\n+PSLLALTPKKAQQLLILAPNTELAGQIFEVCKTWSETIGLTAQLFLSGSSQKRQIERLKK\n+GPEILIGTPGRIFELIKLKKIKMMNINTIVLDEFDQLFSDSQYQFVEKIINYVPRDHQLI\n+YMSATAKFDRQKIAEDIESIDLSEQKLDNIQHCYMMVDKRERLETLRKFANIPDFRALAF\n+FNSLSDLGASEDKLLYNGVNAVSLVSDVNVKFRKVIIERFKNHELNLLLATDMVARGIDI\n+DNLECVLNFEVPFDQEAYTHRSGRTGRMGKEGLVITLVSSPSELKQLKKYASVQEVILKN\n+QELYKI\n+>Streptococcus_suis|ORF2201 length 272 aa, 819 bp, from complement(1531599..1532417) of Streptococcus_suis\n+DCSKIKIIDLAVGKLKLLSSKRKGAFMEIIRSKANHLVKQVKKLQQKKYRTSSYLIEGWH\n+LLEEAMEAGANIEHIFVVEEYFEKVAGLANVTVVSPEIMQELADSKTPQGVVAQLALPSQ\n+RLPETLDGKFLVLEDVQDPGNVGTMIRTADAAGFDGVFLSDKSADIYNMKVLRSMQGSHF\n+HLPVYRMPISSILTALKSNQIQILATTLSSQSVDYKEITPHSSFALVMGNEGQGISDLVA\n+DEADQLVHITMPGQAESLNVAIAAGILLFSFI\n+>Streptococcus_suis|ORF2301 length 102 aa, 309 bp, from 1597247..1597555 of Streptococcus_suis\n+IDVIKDASEIFHSIPIEFYIGVKMIENRLVVTRYNLTVINFGYQSCPGIFRLDGIPKVRS\n+CIFQMLYQVIPNCFSRIGILNSFSWSRTDNFIFHQETLLMLR\n+>Streptococcus_suis|ORF2401 length 141 aa, 426 bp, from 1658030..1658455 of Streptococcus_suis\n+ASITVPIARTVGSAFSSWISATKRTVSNNSSMFWLNLAEISTNSDSPPQAVEITPCSANS\n+PMTRSGFAPGLSILLIATMIGTLAAFEWLIASIVCGMTPSSAATTRMVKSVTDAPRARIE\n+VKAACPGVSKKVIFLPASSIW\n+>Streptococcus_suis|ORF2501 length 630 aa, 1893 bp, from complement(1722511..1724403) of 
Streptococcus_suis\n+GEMMMLQINHSLRQSEGVLEEICSSLGLDCQLELEVGGKESLRIEGSAGSYRIRAPKEHM\n+IYRGLLVLASQLAQGETDIQILERPAYRQLGFMEDCSRNAVLTVEASKILIRQLALIGYS\n+HFQLYMEDTYQLRDEPYFGYFRGAYTKEELQAIEEECHRYGMEFIPCIQTLAHLIAYLKW\n+NISSIQAIRDVDNILLIGDERTYALIDKMFEALSHLKTRTINIGMDEAHLVGLGQYLNQH\n+GYQKRSLIMCQHLERVLDIADKYGFHCSMWSDMFFQLLTASKDYTGQLEIDSEIQAYLDR\n+LKDRVTLIYWDYYQTSRESYGQKLASHQQLGDQIAFASGAWKWIGFTPDNDFSMRIAPKA\n+HAACQEYGVQEVTVTAWGDNGGECATFSILPSLHAWAELQYRGNLGCLAEHFYQLHQVSL\n+DDFLQLDLPNKTPSHPGPGHHGFNPSRYILYQDILCPLLQEHIDAEKDNAFYQELAPRLA\n+EIGSRAKGYAYLFDTQAKLCQVLATKAAISVGIRQAYQEGNRQVLAEKVDGLQQLRIDLE\n+SFYQALSYQWMVEKKVFGLDTVDIRLGGLDARIRRAIQRLQAYLNNEVPKLDELEVPILP\n+YDDFHQDKGFIATTANQWQIIATASTIYTT\n+>Streptococcus_suis|ORF2601 length 100 aa, 303 bp, from 1790150..1790452 of Streptococcus_suis\n+LKDGYQRLVVEGFADIAETFLQTETNLMTTVIFIARHDDDRPIAFPLGSLNQVNMTLVHG\n+SKGPKNNCYCLFHNLPFYCFLYFISYSFLKPKSRVFYIFL\n+>Streptococcus_suis|ORF2701 length 215 aa, 648 bp, from 1851361..1852008 of Streptococcus_suis\n+SLHPHEASHDNGKHDLHEVTHKGSKATDCLQVGFQHPGHQVERCKNRHVRDKHHQGLQDS\n+KTIADKEVEVQEFIGYFLELGILIFFLYKRLNHTNPTEVFLSNTVHLVHEGLEFPKTWSH\n+LPHDNCHNSHNQDHHENNHPPEFWHGPHCHDKGSDKQQWNPHQHCKEHHDKVLDLIDIVS\n+HPNNQVPCVKLFDISIREALDLPKGLLTNICRYPL\n+>Streptococcus_suis|ORF2801 length 1006 aa, 3021 bp, from complement(1921434..1924454) of 
Streptococcus_suis\n+TQTKEYEMIEFRKKAVQLASLMSVFFLCTYSFTDAMYIMAESLSTDGASTIRRTYIEDKK\n+EDKDRLNIELVESLSSPKTIGQKITIDKQSLATQNFNEKGIVVITQKGLELKKDDLEKGW\n+KLDESYNEKDLAITKSETEKRSLSNELDVLSKTVEELPVYGENYHSYRLLPTTELDYSAD\n+NVSLTLSFTKVSEVIKGELVAVVDAEHIAYFKAEPSVFKEYSQVNEKPSSTEDVNVVSPS\n+QDPPVSETKENVPDNPESQGSSTVPESEQAVDALVEQRGVICIKLTKSSSEQEEGIEDTE\n+NEAIEGATFEVRNVESENLVYTGQTDKDGLLTISNLPLGNYAVIQKSTIDGYEISATKEV\n+VELTVAQSRQTVSISNSPKNPLEGLMLNSILDSSLIPRSARVARSLLDTSLLDNPTVTGN\n+ANATTTTTVFGNKTTTITREESNIKYIFKPITISIPGVYQSYSQDGVLKKKEVVVDSNTN\n+TTKIIWEYTTTVGGVNSNITSIRNAFSTTTDSGLGEPKITSIMKDGVAITPNTTYYGNFD\n+NFKSATDNLPVGNGTYVYTIETPVVIPSDNYSLDYRSEVTVDAPKGSKLTYNGTSVTLTQ\n+KETRTLSTADTITLPAKNDGGPLGDLKVDTVNTSNTNRTIGKYRDNDDKVIEWTSSQLND\n+TSTTQSFTFDVALDSSQAAHEYKVYIYEPSNGTYTETKAEKVATPGNQITVDNVPAGAVA\n+LVKTVTNVKDEKVNHTISGAQLEALKGDIKIQKNWEADSDKVDVTFTVNGGSLTNRKETL\n+SANNTQITIANVDKFSGMRSTATKKRIYYDVTEAVPSGYILSSAQTDWENLYYVFTNKKD\n+NTTTPVFPPDTCGNYGVSSIDLVSINYVMYKSGSKIWGGFDGSMKMNLKIPAFARAGDSF\n+TLELPPELKLSHVANPNVAWSTVSANGKVIAKVYHEKDNLIRFVLTTEAYSVQEYNGWFE\n+IGVPTSNVIKINNRETTELYKTGVLPNLPEWYTTTTRNQTLIKRSR\n+>Streptococcus_suis|ORF2901 length 306 aa, 921 bp, from 1998013..1998933 of Streptococcus_suis\n+IRDNRNCFYNTNQGEYMKERIKDFISVTLGSVVMAIGFNSFFLENNIVSGGVGGLAIALN\n+ALLRWSPSDFVLYCNIPLLIICWFFLGKSVFIKTVYGAIIYPLCIKLTAGLPNLTENPLL\n+AAIFGGIILGFGLGLVFLGNSSTGGTGILIQFIHKYTPLSLGLTMAIIDGIIVGLGFVAF\n+DTDTVMYSIIALMTITYIVNRMMSGTQSSRNVMIISQKSEEIKDYITKVADRGVTELPII\n+GGFTGVDKRMLMTTISIPEMQKLETAVLEIDETAFMVVMPASQVRGRGFSLQKDHKHYDE\n+DILIPM\n'
diff -r 000000000000 -r 3a807e5ea6c8 tools/sample_seqs/README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/sample_seqs/README.rst Thu Mar 27 09:40:53 2014 -0400
@@ -0,0 +1,106 @@
+Galaxy tool to sub-sample sequence files
+========================================
+
+This tool is copyright 2014 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below (MIT licence).
+
+This tool is a short Python script (using Biopython library functions)
+to sub-sample sequence files (in a range of formats including FASTA, FASTQ,
+and SFF). This can be useful for preparing a small sample of data to test
+or time a new pipeline, or for reducing the read coverage in a de novo
+assembly.
+
+This tool is available from the Galaxy Tool Shed at:
+
+* http://toolshed.g2.bx.psu.edu/view/peterjc/sample_seqs
+
+
+Automated Installation
+======================
+
+This should be straightforward using the Galaxy Tool Shed, which should be
+able to automatically install the dependency on Biopython, and then install
+this tool and run its unit tests.
+
+
+Manual Installation
+===================
+
+There are just two files to install to use this tool from within Galaxy:
+
+* ``sample_seqs.py`` (the Python script)
+* ``sample_seqs.xml`` (the Galaxy tool definition)
+
+The suggested location is in a dedicated ``tools/sample_seqs`` folder.
+
+You will also need to modify the ``tool_conf.xml`` file to tell Galaxy to offer the
+tool. One suggested location is in the filters section. Simply add the line::
+
+    <tool file="sample_seqs/sample_seqs.xml" />
+
+You will also need to install Biopython 1.62 or later. If you want to run
+the unit tests, add the same line to ``tool_conf.xml.sample`` and place the
+sample sequence files (FASTA, FASTQ and SFF) under the ``test-data`` directory.
+Then::
+
+    ./run_functional_tests.sh -id sample_seqs
+
+That's it.
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1  - Initial version.
+======= ======================================================================
+
+
+Developers
+==========
+
+This script and related tools are being developed on this GitHub repository:
+https://github.com/peterjc/pico_galaxy/tree/master/tools/sample_seqs
+
+For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
+the following command from the Galaxy root folder::
+
+    $ tar -czf sample_seqs.tar.gz tools/sample_seqs/README.rst tools/sample_seqs/sample_seqs.py tools/sample_seqs/sample_seqs.xml tools/sample_seqs/tool_dependencies.xml test-data/ecoli.fastq test-data/ecoli.sample_N100.fastq test-data/get_orf_input.Suis_ORF.prot.fasta test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta test-data/MID4_GLZRM4E04_rnd30_frclip.sff test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
+
+Check this worked::
+
+    $ tar -tzf sample_seqs.tar.gz
+    tools/sample_seqs/README.rst
+    tools/sample_seqs/sample_seqs.py
+    tools/sample_seqs/sample_seqs.xml
+    tools/sample_seqs/tool_dependencies.xml
+    test-data/ecoli.fastq
+    test-data/ecoli.sample_N100.fastq
+    test-data/get_orf_input.Suis_ORF.prot.fasta
+    test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta
+    test-data/MID4_GLZRM4E04_rnd30_frclip.sff
+    test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff -r 000000000000 -r 3a807e5ea6c8 tools/sample_seqs/sample_seqs.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/sample_seqs/sample_seqs.py Thu Mar 27 09:40:53 2014 -0400
@@ -0,0 +1,183 @@
+#!/usr/bin/env python
+"""Sub-sample sequence from a FASTA, FASTQ or SFF file.
+
+This tool is a short Python script which requires Biopython 1.62 or later
+for SFF file support. If you use this tool in scientific work leading to a
+publication, please cite the Biopython application note:
+
+Cock et al 2009. Biopython: freely available Python tools for computational
+molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
+http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
+
+This script is copyright 2010-2013 by Peter Cock, The James Hutton Institute
+(formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved.
+See accompanying text file for licence details (MIT license).
+
+This is version 0.1.0 of the script, use -v or --version to get the version.
+"""
+import os
+import sys
+
+def stop_err(msg, err=1):
+    sys.stderr.write(msg.rstrip() + "\n")
+    sys.exit(err)
+
+if "-v" in sys.argv or "--version" in sys.argv:
+    print("v0.1.0")
+    sys.exit(0)
+
+#Parse Command Line
+if len(sys.argv) < 5:
+    stop_err("Requires at least four arguments: seq_format, in_file, out_file, mode, ...")
+seq_format, in_file, out_file, mode = sys.argv[1:5]
+if in_file != "/dev/stdin" and not os.path.isfile(in_file):
+    stop_err("Missing input file %r" % in_file)
+
+if mode == "everyNth":
+    if len(sys.argv) != 6:
+        stop_err("If using everyNth, just need argument N (integer, at least 2)")
+    try:
+        N = int(sys.argv[5])
+    except ValueError:
+        stop_err("Bad N argument %r" % sys.argv[5])
+    if N < 2:
+        stop_err("Bad N argument %r" % sys.argv[5])
+    #English ordinal suffix (11th, 12th and 13th are special cases)
+    if (N % 100) in (11, 12, 13):
+        suffix = "th"
+    else:
+        suffix = {1: "st", 2: "nd", 3: "rd"}.get(N % 10, "th")
+    sys.stderr.write("Sampling every %i%s sequence\n" % (N, suffix))
+    def sampler(iterator):
+        global N
+        count = 0
+        for record in iterator:
+            count += 1
+            if count % N == 1:
+                yield record
+elif mode == "percentage":
+    if len(sys.argv) != 6:
+        stop_err("If using percentage, just need percentage argument (float, range 0 to 100)")
+    try:
+        percent = float(sys.argv[5]) / 100.0
+    except ValueError:
+        stop_err("Bad percent argument %r" % sys.argv[5])
+    if percent <= 0.0 or 1.0 <= percent:
+        stop_err("Bad percent argument %r" % sys.argv[5])
+    sys.stderr.write("Sampling %0.3f%% of sequences\n" % (100.0 * percent))
+    def sampler(iterator):
+        global percent
+        count = 0
+        taken = 0
+        for record in iterator:
+            count += 1
+            if percent * count > taken:
+                taken += 1
+                yield record
+else:
+    stop_err("Unsupported mode %r" % mode)
+
+def raw_fasta_iterator(handle):
+    """Yields raw FASTA records as multi-line strings."""
+    while True:
+        line = handle.readline()
+        if line == "":
+            return # Premature end of file, or just empty?
+        if line[0] == ">":
+            break
+
+    no_id_warned = False
+    while True:
+        if line[0] != ">":
+            raise ValueError(
+                "Records in Fasta files should start with '>' character")
+        try:
+            id = line[1:].split(None, 1)[0]
+        except IndexError:
+            if not no_id_warned:
+                sys.stderr.write("WARNING - Malformed FASTA entry with no identifier\n")
+            no_id_warned = True
+            id = None
+        lines = [line]
+        line = handle.readline()
+        while True:
+            if not line:
+                break
+            if line[0] == ">":
+                break
+            lines.append(line)
+            line = handle.readline()
+        yield "".join(lines)
+        if not line:
+            return # StopIteration 
+
+def fasta_filter(in_file, out_file, iterator_filter):
+    count = 0
+    #Galaxy now requires Python 2.5+ so we can use with statements
+    with open(in_file) as in_handle:
+        with open(out_file, "w") as pos_handle:
+            for record in iterator_filter(raw_fasta_iterator(in_handle)):
+                count += 1
+                pos_handle.write(record)
+    return count
+
+try:
+    from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
+    def fastq_filter(in_file, out_file, iterator_filter):
+        count = 0
+        #from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
+        reader = fastqReader(open(in_file, "rU"))
+        writer = fastqWriter(open(out_file, "w"))
+        for record in iterator_filter(reader):
+            count += 1
+            writer.write(record)
+        writer.close()
+        reader.close()
+        return count
+except ImportError:
+    from Bio.SeqIO.QualityIO import FastqGeneralIterator
+    def fastq_filter(in_file, out_file, iterator_filter):
+        count = 0
+        with open(in_file) as in_handle:
+            with open(out_file, "w") as pos_handle:
+                for title, seq, qual in iterator_filter(FastqGeneralIterator(in_handle)):
+                    count += 1
+                    pos_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
+        return count
+
+def sff_filter(in_file, out_file, iterator_filter):
+    count = 0
+    try:
+        from Bio.SeqIO.SffIO import SffIterator, SffWriter
+    except ImportError:
+        stop_err("SFF filtering requires Biopython 1.54 or later")
+    try:
+        from Bio.SeqIO.SffIO import ReadRocheXmlManifest
+    except ImportError:
+        #Prior to Biopython 1.56 this was a private function
+        from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
+    with open(in_file, "rb") as in_handle:
+        try:
+            manifest = ReadRocheXmlManifest(in_handle)
+        except ValueError:
+            manifest = None
+        in_handle.seek(0) #start again after getting manifest
+        with open(out_file, "wb") as out_handle:
+            writer = SffWriter(out_handle, xml=manifest)
+            count = writer.write_file(iterator_filter(SffIterator(in_handle)))
+    return count
+
+if seq_format.lower()=="sff":
+    count = sff_filter(in_file, out_file, sampler)
+elif seq_format.lower()=="fasta":
+    count = fasta_filter(in_file, out_file, sampler)
+elif seq_format.lower().startswith("fastq"):
+    count = fastq_filter(in_file, out_file, sampler)
+else:
+    stop_err("Unsupported file type %r" % seq_format)
+
+sys.stderr.write("Sampled %i records\n" % count)
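The two sampling modes in `sample_seqs.py` can be tried on their own, away from the file handling. What follows is a minimal sketch rather than the script itself: the function names `every_nth` and `take_fraction` are illustrative, and the module-level globals used by the script are replaced with plain arguments. It shows that asking for 20% selects the same records as taking every 5th one.

```python
def every_nth(records, n):
    """Yield the 1st, (n+1)th, (2n+1)th, ... record, as in the everyNth mode."""
    if n < 2:
        raise ValueError("n must be at least 2")
    for count, record in enumerate(records, start=1):
        if count % n == 1:
            yield record


def take_fraction(records, fraction):
    """Yield records so that roughly `fraction` of those seen so far are kept."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    taken = 0
    for count, record in enumerate(records, start=1):
        # Keep a record whenever the running total falls behind the target.
        if fraction * count > taken:
            taken += 1
            yield record


# Hypothetical record names standing in for parsed sequence records.
reads = ["read_%i" % i for i in range(1, 11)]
print(list(every_nth(reads, 5)))
print(list(take_fraction(reads, 0.2)))
```

Both calls print `['read_1', 'read_6']` for these ten records, which is why the tool's tests can reuse one expected output file for both modes.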
diff -r 000000000000 -r 3a807e5ea6c8 tools/sample_seqs/sample_seqs.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/sample_seqs/sample_seqs.xml Thu Mar 27 09:40:53 2014 -0400
@@ -0,0 +1,119 @@
+<tool id="sample_seqs" name="Sub-sample sequence files" version="0.0.1">
+    <description>e.g. to reduce coverage</description>
+    <requirements>
+        <requirement type="package" version="1.63">biopython</requirement>
+        <requirement type="python-module">Bio</requirement>
+    </requirements>
+    <version_command interpreter="python">sample_seqs.py --version</version_command>
+    <command interpreter="python">
+#if str($sampling.type) == "everyNth":
+sample_seqs.py "$input_file.ext" "$input_file" "$output_file" "${sampling.type}" "${sampling.every_n}"
+#elif str($sampling.type) == "percentage":
+sample_seqs.py "$input_file.ext" "$input_file" "$output_file" "${sampling.type}" "${sampling.percent}"
+#else:
+##Should give an error about invalid sampling type:
+sample_seqs.py "$input_file.ext" "$input_file" "$output_file" "${sampling.type}"
+#end if
+    </command>
+    <stdio>
+        <!-- Anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
+    <inputs>
+        <param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file" help="FASTA, FASTQ, or SFF format." />
+        <conditional name="sampling">
+            <param name="type" type="select" label="Sub-sampling approach">
+                <option value="everyNth">Take every N-th sequence (e.g. every fifth sequence)</option>
+                <option value="percentage">Take some percentage of the sequences (e.g. 20% will take every fifth sequence)</option>
+                <!-- TODO - target coverage etc -->
+            </param>
+            <when value="everyNth">
+                <param name="every_n" value="5" type="integer" min="2" label="N" help="At least 2, e.g. 5 will take every 5th sequence (taking 20% of the sequences)" />
+            </when>
+            <when value="percentage">
+                <param name="percent" value="20.0" type="float" min="0" max="100" label="Percentage" help="Between 0 and 100, e.g. 20% will take every 5th sequence" />
+            </when>
+        </conditional>
+    </inputs>
+    <outputs>
+        <data name="output_file" format="input" metadata_source="input_file" label="${input_file.name} (sub-sampled)"/>
+    </outputs>
+    <tests>
+        <test>
+            <param name="input_file" value="get_orf_input.Suis_ORF.prot.fasta" />
+            <param name="type" value="everyNth" />
+            <param name="every_n" value="100" />
+            <output name="output_file" file="get_orf_input.Suis_ORF.prot.sample_N100.fasta" />
+        </test>
+        <test>
+            <param name="input_file" value="ecoli.fastq" />
+            <param name="type" value="everyNth" />
+            <param name="every_n" value="100" />
+            <output name="output_file" file="ecoli.sample_N100.fastq" />
+        </test>
+        <test>
+            <param name="input_file" value="MID4_GLZRM4E04_rnd30_frclip.sff" ftype="sff" />
+            <param name="type" value="everyNth" />
+            <param name="every_n" value="5" />
+            <output name="output_file" file="MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff" ftype="sff"/>
+        </test>
+        <test>
+            <param name="input_file" value="get_orf_input.Suis_ORF.prot.fasta" />
+            <param name="type" value="percentage" />
+            <param name="percent" value="1.0" />
+            <output name="output_file" file="get_orf_input.Suis_ORF.prot.sample_N100.fasta" />
+        </test>
+        <test>
+            <param name="input_file" value="ecoli.fastq" />
+            <param name="type" value="percentage" />
+            <param name="percent" value="1.0" />
+            <output name="output_file" file="ecoli.sample_N100.fastq" />
+        </test>
+        <test>
+            <param name="input_file" value="MID4_GLZRM4E04_rnd30_frclip.sff" ftype="sff" />
+            <param name="type" value="percentage" />
+            <param name="percent" value="20.0" />
+            <output name="output_file" file="MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff" ftype="sff"/>
+        </test>
+    </tests>
+    <help>
+**What it does**
+
+Takes an input file of sequences (typically FASTA or FASTQ, although
+Standard Flowgram Format (SFF) is also supported), and returns a new
+sequence file (in the same format) containing a sub-sample of the input.
+
+Several sampling modes are supported, all designed to be non-random. This
+makes the output reproducible, and means that running the tool with the
+same settings on each file of a pair keeps the reads paired. Also note
+that sampling uniformly through the file avoids any bias should reads
+in any part of the file be of lesser quality (e.g. one part of the slide).
+
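+The simplest mode is to take every N-th sequence, for example taking
+every 2nd sequence would sample half the file - while taking every 5th
+sequence would take 20% of the file.
+
+Alternatively you can give a percentage, and the tool will take that
+proportion of the sequences spread evenly through the file. For example,
+20% will take every 5th sequence.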
+
+
+**Example Usage**
+
+Suppose you have some Illumina paired end data as files ``R1.fastq`` and
+``R2.fastq`` which give an estimated x200 coverage, and you wish to do a
+*de novo* assembly with a tool like MIRA which recommends lower coverage.
+Taking every 3rd read would reduce the estimated coverage to about x66,
+and would preserve the pairing as well.
+
+
+**Citation**
+
+This tool uses Biopython, so if you use this Galaxy tool in work leading to a
+scientific publication please cite the following paper:
+
+Cock et al (2009). Biopython: freely available Python tools for computational
+molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
+http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
+
+This tool is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/sample_seqs
+    </help>
+</tool>
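The paired-end example in the help text relies on the sampling depending only on each record's position in the file. A minimal sketch of that idea, assuming two in-memory lists stand in for `R1.fastq` and `R2.fastq` (the read names and the helper `sample_every_nth` are made up for illustration and are not part of the tool):

```python
def sample_every_nth(records, n):
    """Keep records at positions 1, n+1, 2n+1, ... (1-based)."""
    return [rec for pos, rec in enumerate(records, start=1) if pos % n == 1]


# Hypothetical read names; real files would hold full FASTQ records.
r1 = ["frag_%i/1" % i for i in range(1, 13)]
r2 = ["frag_%i/2" % i for i in range(1, 13)]

s1 = sample_every_nth(r1, 3)
s2 = sample_every_nth(r2, 3)

# Pairing survives because the same positions are kept in both files.
for left, right in zip(s1, s2):
    assert left.rsplit("/", 1)[0] == right.rsplit("/", 1)[0]

# Coverage scales with the fraction of reads kept (x200 before sampling).
estimated_coverage = 200.0 * len(s1) / len(r1)
print(s1)
print("approx. x%.1f coverage" % estimated_coverage)
```

With twelve reads per file, every 3rd read keeps four pairs and the estimate drops from x200 to roughly x66.7, in line with the figure quoted in the help text.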
diff -r 000000000000 -r 3a807e5ea6c8 tools/sample_seqs/tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/sample_seqs/tool_dependencies.xml Thu Mar 27 09:40:53 2014 -0400
@@ -0,0 +1,6 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="biopython" version="1.63">
+        <repository changeset_revision="a5c49b83e983" name="package_biopython_1_63" owner="biopython" toolshed="http://toolshed.g2.bx.psu.edu" />
+    </package>
+</tool_dependency>