Commit message:
Uploaded v0.0.1
added:
test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
test-data/MID4_GLZRM4E04_rnd30_frclip.sff
test-data/ecoli.fastq
test-data/ecoli.sample_N100.fastq
test-data/get_orf_input.Suis_ORF.prot.fasta
test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta
tools/sample_seqs/README.rst
tools/sample_seqs/sample_seqs.py
tools/sample_seqs/sample_seqs.xml
tools/sample_seqs/tool_dependencies.xml
diff -r 000000000000 -r 3a807e5ea6c8 test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
Binary file test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff has changed
diff -r 000000000000 -r 3a807e5ea6c8 test-data/MID4_GLZRM4E04_rnd30_frclip.sff
Binary file test-data/MID4_GLZRM4E04_rnd30_frclip.sff has changed
diff -r 000000000000 -r 3a807e5ea6c8 test-data/ecoli.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/ecoli.fastq Thu Mar 27 09:40:53 2014 -0400
b"@@ -0,0 +1,20164 @@\n+@frag_1\n+AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTC\n++\n+##%')+.024JMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_1_a\n+GAGACATATTGCCCGTTGCAGTCAGAATGAAAAGCT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMJ420.+)'%##\n+@frag_2\n+AGAGACATATTGCCCGTTGCAGTCAGAATGAAAAGC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMJ420.+)'%#\n+@frag_3\n+CTTTTCATTCTGACTGCAACGGGCAATATGTCTCTG\n++\n+%')+.024JMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_4\n+ACAGAGACATATTGCCCGTTGCAGTCAGAATGAAAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMJ420.+)'\n+@frag_5\n+TTTCATTCTGACTGCAACGGGCAATATGTCTCTGTG\n++\n+)+.024JMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_6\n+ACACAGAGACATATTGCCCGTTGCAGTCAGAATGAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMJ420.+\n+@frag_7\n+TCATTCTGACTGCAACGGGCAATATGTCTCTGTGTG\n++\n+.024JMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_8\n+CCACACAGAGACATATTGCCCGTTGCAGTCAGAATG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMJ420\n+@frag_9\n+ATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGA\n++\n+24JMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_10\n+ATCCACACAGAGACATATTGCCCGTTGCAGTCAGAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMJ4\n+@frag_11\n+TCTGACTGCAACGGGCAATATGTCTCTGTGTGGATT\n++\n+JMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_12\n+TAATCCACACAGAGACATATTGCCCGTTGCAGTCAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_13\n+TGACTGCAACGGGCAATATGTCTCTGTGTGGATTAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_14\n+TTTAATCCACACAGAGACATATTGCCCGTTGCAGTC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_15\n+ACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_16\n+TTTTTAATCCACACAGAGACATATTGCCCGTTGCAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_17\n+TGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_18\n+TTTTTTTAATCCACACAGAGACATATTGCCCGTTGC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_19\n+CAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_20\n+TCTTTTTTTAATCCACACAGAGACATATTGCCCGTT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_21\n+ACGGGCAATATGTCTCTGTG
TGGATTAAAAAAAGAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_22\n+ACTCTTTTTTTAATCCACACAGAGACATATTGCCCG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_23\n+GGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_24\n+ACACTCTTTTTTTAATCCACACAGAGACATATTGCC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_25\n+GCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_26\n+AGACACTCTTTTTTTAATCCACACAGAGACATATTG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_27\n+AATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_28\n+TCAGACACTCTTTTTTTAATCCACACAGAGACATAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_29\n+TATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_30\n+TATCAGACACTCTTTTTTTAATCCACACAGAGACAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_31\n+TGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_32\n+GCTATCAGACACTCTTTTTTTAATCCACACAGAGAC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_33\n+TCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_34\n+CTGCTATCAGACACTCTTTTTTTAATCCACACAGAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_35\n+TCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_36\n+AGCTGCTATCAGACACTCTTTTTTTAATCCACACAG\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_37\n+TGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_38\n+GAAGCTGCTATCAGACACTCTTTTTTTAATCCACAC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_39\n+TGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_40\n+CAGAAGCTGCTATCAGACACTCTTTTTTTAATCCAC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_41\n+TGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_42\n+TTCAGAAGCTGCTATCAGACACTCTTTTTTTAATCC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMM
MMMMMMMMMMMM\n+@frag_43\n+GATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAAC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_44\n+AGTTCAGAAGCTGCTATCAGACACTCTTTTTTTAAT\n++\n+MMMMMMMMMMMMMMMMMMM"..b"4997\n+AATTGATGATGAATCATCAGTAAAATCTATTCATTA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_4998\n+ATAATGAATAGATTTTACTGATGATTCATCATCAAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_4999\n+TTGATGATGAATCATCAGTAAAATCTATTCATTATC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5000\n+AGATAATGAATAGATTTTACTGATGATTCATCATCA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5001\n+GATGATGAATCATCAGTAAAATCTATTCATTATCTC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMK\n+@frag_5002\n+TGAGATAATGAATAGATTTTACTGATGATTCATCAT\n++\n+KKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5003\n+TGATGAATCATCAGTAAAATCTATTCATTATCTCAA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMKKK\n+@frag_5004\n+ATTGAGATAATGAATAGATTTTACTGATGATTCATC\n++\n+KKKKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5005\n+ATGAATCATCAGTAAAATCTATTCATTATCTCAATA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMKKKK#\n+@frag_5006\n+CTATTGAGATAATGAATAGATTTTACTGATGATTCA\n++\n+##KKKKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5007\n+GAATCATCAGTAAAATCTATTCATTATCTCAATAGC\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMKKKK##%\n+@frag_5008\n+AGCTATTGAGATAATGAATAGATTTTACTGATGATT\n++\n+'%##KKKKMMMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5009\n+ATCATCAGTAAAATCTATTCATTATCTCAATAGCTT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMMMKKKK##%')\n+@frag_5010\n+AAAGCTATTGAGATAATGAATAGATTTTACTGATGA\n++\n++)'%##KKKKMMMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5011\n+CATCAGTAAAATCTATTCATTATCTCAATAGCTTTT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMMKKKK##%')+.\n+@frag_5012\n+GAAAAGCTATTGAGATAATGAATAGATTTTACTGAT\n++\n+0.+)'%##KKKKMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5013\n+TCAGTAAAATCTATTCATTATCTCAATAGCTTTTCA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMKKKK##%')+.02\n+@frag_5014\n+ATGAAAAGCTATTGAGATAATGAATAGATTTTACTG\n++\n+420.+)'%##KKKKMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5015\n+AGTAAAATCTATTCATTATCTCAATAGCTTTTCATT\n++\n+MMMMMMMMMMMMMMMMMMMMMKKKK##%')+.024J\n+@frag_5016\n+GA
ATGAAAAGCTATTGAGATAATGAATAGATTTTAC\n++\n+MJ420.+)'%##KKKKMMMMMMMMMMMMMMMMMMMM\n+@frag_5017\n+TAAAATCTATTCATTATCTCAATAGCTTTTCATTCT\n++\n+MMMMMMMMMMMMMMMMMMMKKKK##%')+.024JMM\n+@frag_5018\n+CAGAATGAAAAGCTATTGAGATAATGAATAGATTTT\n++\n+MMMJ420.+)'%##KKKKMMMMMMMMMMMMMMMMMM\n+@frag_5019\n+AAATCTATTCATTATCTCAATAGCTTTTCATTCTGA\n++\n+MMMMMMMMMMMMMMMMMKKKK##%')+.024JMMMM\n+@frag_5020\n+GTCAGAATGAAAAGCTATTGAGATAATGAATAGATT\n++\n+MMMMMJ420.+)'%##KKKKMMMMMMMMMMMMMMMM\n+@frag_5021\n+ATCTATTCATTATCTCAATAGCTTTTCATTCTGACT\n++\n+MMMMMMMMMMMMMMMKKKK##%')+.024JMMMMMM\n+@frag_5022\n+CAGTCAGAATGAAAAGCTATTGAGATAATGAATAGA\n++\n+MMMMMMMJ420.+)'%##KKKKMMMMMMMMMMMMMM\n+@frag_5023\n+CTATTCATTATCTCAATAGCTTTTCATTCTGACTGC\n++\n+MMMMMMMMMMMMMKKKK##%')+.024JMMMMMMMM\n+@frag_5024\n+TGCAGTCAGAATGAAAAGCTATTGAGATAATGAATA\n++\n+MMMMMMMMMJ420.+)'%##KKKKMMMMMMMMMMMM\n+@frag_5025\n+ATTCATTATCTCAATAGCTTTTCATTCTGACTGCAA\n++\n+MMMMMMMMMMMKKKK##%')+.024JMMMMMMMMMM\n+@frag_5026\n+GTTGCAGTCAGAATGAAAAGCTATTGAGATAATGAA\n++\n+MMMMMMMMMMMJ420.+)'%##KKKKMMMMMMMMMM\n+@frag_5027\n+TCATTATCTCAATAGCTTTTCATTCTGACTGCAACG\n++\n+MMMMMMMMMKKKK##%')+.024JMMMMMMMMMMMM\n+@frag_5028\n+CCGTTGCAGTCAGAATGAAAAGCTATTGAGATAATG\n++\n+MMMMMMMMMMMMMJ420.+)'%##KKKKMMMMMMMM\n+@frag_5029\n+ATTATCTCAATAGCTTTTCATTCTGACTGCAACGGG\n++\n+MMMMMMMKKKK##%')+.024JMMMMMMMMMMMMMM\n+@frag_5030\n+GCCCGTTGCAGTCAGAATGAAAAGCTATTGAGATAA\n++\n+MMMMMMMMMMMMMMMJ420.+)'%##KKKKMMMMMM\n+@frag_5031\n+TATCTCAATAGCTTTTCATTCTGACTGCAACGGGCA\n++\n+MMMMMKKKK##%')+.024JMMMMMMMMMMMMMMMM\n+@frag_5032\n+TTGCCCGTTGCAGTCAGAATGAAAAGCTATTGAGAT\n++\n+MMMMMMMMMMMMMMMMMJ420.+)'%##KKKKMMMM\n+@frag_5033\n+TCTCAATAGCTTTTCATTCTGACTGCAACGGGCAAT\n++\n+MMMKKKK##%')+.024JMMMMMMMMMMMMMMMMMM\n+@frag_5034\n+TATTGCCCGTTGCAGTCAGAATGAAAAGCTATTGAG\n++\n+MMMMMMMMMMMMMMMMMMMJ420.+)'%##KKKKMM\n+@frag_5035\n+TCAATAGCTTTTCATTCTGACTGCAACGGGCAATAT\n++\n+MKKKK##%')+.024JMMMMMMMMMMMMMMMMMMMM\n+@frag_5036\n+CATATTGCCCGTTGCAGTCAGAATGAAAAGCTATTG\n++\n+MMMMMMMMMMMMMMMMMMMMMJ420.+)'%##KKKK\n+@frag_5037\n+AATAGCT
TTTCATTCTGACTGCAACGGGCAATATGT\n++\n+KKK##%')+.024JMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5038\n+GACATATTGCCCGTTGCAGTCAGAATGAAAAGCTAT\n++\n+MMMMMMMMMMMMMMMMMMMMMMMJ420.+)'%##KK\n+@frag_5039\n+TAGCTTTTCATTCTGACTGCAACGGGCAATATGTCT\n++\n+K##%')+.024JMMMMMMMMMMMMMMMMMMMMMMMM\n+@frag_5039_a\n+AGACATATTGCCCGTTGCAGTCAGAATGAAAAGCTA\n++\n+MMMMMMMMMMMMMMMMMMMMMMMMJ420.+)'%##K\n" |
diff -r 000000000000 -r 3a807e5ea6c8 test-data/ecoli.sample_N100.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/ecoli.sample_N100.fastq Thu Mar 27 09:40:53 2014 -0400
@@ -0,0 +1,204 @@ +@frag_1 +AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTC ++ +##%')+.024JMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_100 +TAAAGTATTTAGTGACCTAAGTCAATAAAATTTTAA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_200 +TGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_300 +CATGGTTGTTACCTCGTTACCTTTGGTCGAAAAAAA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_400 +TGGCCACCTGCCCCTGCCTGGCATTGCTTTCCAGAA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_500 +TTCGGCATCGCTGATATTGGGTAAAGCATCCTGGCC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_600 +TTGGGCAAATTCCTGATCGACGAAAGTTTTCAATTG ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_700 +TAACGGCGATCGACATTTTCTCGCCACGGCAAATCA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_800 +ATATCGACGGTAGATTCGAGGTAATGCCCCACTGCC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_900 +CACCAGTTCGCCTTTTTCATTACCGGCGGTGAAACC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1000 +TATAGACCCCGTCAACGTCCGTCCAAATCTCGCAAC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1100 +GGGTGAAGAACTTTAGCGCCGAAGTAGGAAAGCTCC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1200 +ATCACGGCTGGCACCAATGAGCGTACCTGGTGCTTG ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1300 +GCGCCGCCATGCCGACCATCCCTTTCATCCCCGGAC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1400 +CAGTCGCTTTGTGGAACGCAGAAACTGATGCTGTAT ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1500 +CGAGATAATGGCCAGCCGTTCCGTCACTGCCAGCGG ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1600 +ATCCCTGAGCAATGGCGACAATGTTGATATTGGCGC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1700 +TCGATAACCTGATCGGTATTGAACAGCATCTGATGA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1800 +GACACGTAAGTCGATATGTTTATTCTTCAGCCAGCT ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_1900 +GATTAAACGGCTCTTTGGCTTGCGCCAGTTCTTCCT ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2000 +AAGTCGGCATATTGATCCGCCACTGCCTGGCTGGAA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2100 +CCGCGATTTTTCCGCCGCATAACGCAACTGATGGTA ++ 
+MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2200 +CGGAGAACTTCATCAATTCATCACCTGCATTGAGCA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2300 +TCGGTATAACCCATTTCCCGCGCCAGCGTGGTCGCC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2400 +TTCAATATCCGCCAGCTCCAGTTCACGTCCCGTTTC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2500 +CCACGCGCGCGGCAAAGAGATCGTCGAGTTGTGACA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2600 +GGATCATTACCATCCACTTCGGCAATCTTCACGCGG ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2700 +GTCATTGCCCGCACCATATCCGCGCAGTACCAACGG ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2800 +ATTGGCACTGGAAGCCGGGGCATAAACTTTAACCAT ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_2900 +GACTGAATGTCTCTGCCGCCTCAACCGTGACTACAT ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3000 +TGCTTACCCAGTTCCTGGCAAAAACGCTCCCAGCAC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3100 +TTCATTCATCGCCATCAGCGCCGCGACCACCGAACA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3200 +ACGGTGCCACGTTGTCGTAATGAATGCTGCCGGAGA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3300 +CCCGGATACGCCAGCACCCACAGCCACTCATCAAAC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3400 +GTGAATGAAGCCTGCCAGATGTCGCCCGTGCGCAAT ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3500 +GCGCCTGCCGGAAGCCTGGCAGTAACCGTTCACGGT ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3600 +ACGCGCTGGGCGGTTTCCGGCTTGTCACACAGAGCG ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3700 +CATTTAGTTTTCCAGTACTCGTGCGCCCGCCGTATC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3800 +GGTCGTGCGGAAAAAACAGCCCCTGATTTTTGCCCA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_3900 +GGGATTTCATCACCAATAAACGCCGAGAGGATCTTC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4000 +CCCGTGGAACAATTCCAGACAACCGACATCGCTTTC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4100 +CGGAGGTCGCGGTCAGAATGGTCACTGGCTTATCAC ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4200 +TTTTCTTGCAGTGGACTGATTTTGCCTCGTGGATAG ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4300 +TTCTTCATCATCAAACGCCTGCTTCACCAGCGCCTG 
++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4400 +GCGGCAGCTGCGCAACAGCTTCAAAGTAGTAGCAAA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4500 +CGTTTCACCGGCAGACCGAGTGACTTCGCCAGCAGA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4600 +CATCGCGTTGGATAACGTCGCCTGAGTCGCTTTGGG ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4700 +TTTCATCATCCACGGCTGCATAACCCAGCTCTTTCA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4800 +CCTGGATTCAACTGATCACGCAGCGCACGATAAGCT ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_4900 +TGCCAGCTCTTTTGGCAGATCCAACGTTTCACCGAG ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM +@frag_5000 +AGATAATGAATAGATTTTACTGATGATTCATCATCA ++ +MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM |
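The sampled file above appears to hold every 100th record of `test-data/ecoli.fastq`, starting with the first (`frag_1`, `frag_100`, `frag_200`, ...; `frag_100` is the 101st record because `frag_1_a` follows `frag_1`). Below is a minimal standard-library sketch of that selection, assuming plain four-line FASTQ records. The names `read_fastq` and `sample_every_nth` are illustrative only and are not taken from `sample_seqs.py`, which the README says is built on Biopython.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (title, sequence, quality) from a four-line-per-record FASTQ handle."""
    while True:
        title = handle.readline().rstrip("\n")
        if not title:
            return  # end of file (this sketch also stops at a blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the "+" separator line
        qual = handle.readline().rstrip("\n")
        yield title[1:], seq, qual  # drop the leading "@"


def sample_every_nth(records, n):
    """Keep records 1, n+1, 2n+1, ... (every n-th record, starting with the first)."""
    return islice(records, 0, None, n)


# Hypothetical demo data: five tiny records named r1..r5.
demo = io.StringIO("".join("@r%d\nACGT\n+\nMMMM\n" % i for i in range(1, 6)))
sampled_ids = [title for title, _, _ in sample_every_nth(read_fastq(demo), 2)]
print(sampled_ids)  # ['r1', 'r3', 'r5']
```

Applied with `n=100` to the 20,164-line input (5,041 records), this rule keeps 51 records, which matches the 204 lines reported in the hunk header.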
diff -r 000000000000 -r 3a807e5ea6c8 test-data/get_orf_input.Suis_ORF.prot.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.Suis_ORF.prot.fasta Thu Mar 27 09:40:53 2014 -0400
b'@@ -0,0 +1,16670 @@\n+>Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n+MNQEQLFWQRFIELAKVNFKPSIYDFYVADAKLLGINQQVANIFLNRPFKKDFWEKNFEE\n+LMIAASFESYGEPLTIQYQFTEDEQEIRNTTNTRSSIVHQVQTLEPATPQETFKPVHSDI\n+KSQYTFANFVQGDNNHWAKAAALAVSDNLGELYNPLFIFGGPGLGKTHILNAIGNKVLAD\n+NPQARIKYVSSETFINEFLEHLRLNDMESFKKTYRNLDLLLIDDIQSLRNKATTQEEFFH\n+TFNALHEKNKQIVLTSDRNPDHLDNLEERLVTRFKWGLTSEITPPDFETRIAILRNKCEN\n+LPYNFTNETLSYLAGQFDSNVRDLEGALKDIHLIATMRQLSEISVEVAAEAIRSRKQTNP\n+QNMVIPIEKIQTEVGNFYGVSLKELKGSKRVQHIVHARQVAMFLAREMTDNSLPKIGKEF\n+GNRDHTTVMHAYNKIKTLLLDDENLEIEITSIKNKLR\n+>Streptococcus_suis|ORF2 length 385 aa, 1158 bp, from 1507..2664 of Streptococcus_suis\n+IINKGESMIQFSINKNIFLQALSITKRAISTKNAIPILSTVKITVTSEGITLTGSNGQIS\n+IEHFISIQDENAGLLISSPGSILLEAGFFINVVSSMPDLVLDFNEIEQKQIVLTSGKSEI\n+TLKGKEAEQYPRLQEVPTSKPLVLETKVLKQTINETAFAASTQESRPILTGVHFVLTENK\n+NLKTVATDSHRMSQRKLVLDTSGDDFNVVIPSRSLREFTAVFTDDIETVEVFFSNNQILF\n+RSEHISFYTRLLEGTYPDTDRLIPTEFKTTAIFDTANLRHSMERARLLSNATQNGTVKLE\n+IANNVVSAHVNSPEVGRVNEELDTVEVSGEDLVISFNPTYLIEALKATTSEQVKISFISS\n+VRPFTLIPNNEGEDFIQLVTPVRTN\n+>Streptococcus_suis|ORF3 length 104 aa, 315 bp, from complement(1707..2021) of Streptococcus_suis\n+TPVRIGRLSCVEAANAVSLIVCFNTLVSNTNGFEVGTSCKRGYCSASFPFNVISDLPLVK\n+TICFCSISLKSRTKSGILDTTLIKKPASKRMEPGELIKSPAFSS\n+>Streptococcus_suis|ORF4 length 293 aa, 882 bp, from 2756..3637 of Streptococcus_suis\n+MTLYILANPNAGSHTAEHIIFKIKESYPQLAVNIFMTVGPEDEKSQIEAILKEFVSSEDQ\n+LMILGGDGTLSKALRFWPASLPFAYYPTGSGNDFAKAMNITSLYRSVDAILERKTSRIYV\n+LNSSYGTVVNSMDFGFAAQVINGSTNSILKKILNKVKLGKLTYLFFGIKTLFSKQAINLE\n+LTLDEKSYQLDNLFFISVANSLYFGGGIMIWPTASAKKKEVDIVYFKNGNFYQRLQSLLA\n+LLTKRHESSHTIQHLTGVDVVLKSKEKLLLQIDGETCTANEVTLTYQERSMYL\n+>Streptococcus_suis|ORF5 length 126 aa, 381 bp, from 3933..4313 of Streptococcus_suis\n+KKEEEMIMKQLAQQIRVLRTAKNLSQDELAEKLYISRQAVSKWENGEATPDIDKLVQLAE\n+IFGVSLDYLVLGKEPEKEIVVEQRGKMNGWEFLNEESKRPLTRGDVVLLIFLAVMLLGGL\n+FIKHYF\n+>Streptococcus_suis|ORF6 length 377 aa, 1134 bp, from 4381..5514 of 
Streptococcus_suis\n+LESKKNMSLTAGIVGLPNVGKSTLFNAITKAGAEAANYPFATIDPNVGMVEVPDERLQKL\n+TELIIPKKTVPTTFEFTDIAGIVKGASKGEGLGNKFLANIREVDAIVHVVRAFDDENVMR\n+EQGREDAFVDPIADIDTINLELILADLESINKRYARVEKMARTQKDKDSVAEFAVLEKIK\n+PVLEDGKSARTVEFTDEEQKIVKQLFLLTTKPVLYVANVDEDKVADPEAISYVQQIRDFA\n+ATENAEVVVISARAEEEISELDDEDKGEFLEALGLTESGVDKLTRAAYHLLGLGTYFTAG\n+EKEVRAWTFKRGMKAPQCAGIIHSDFEKGFIRAVTMSYDDLMTYGSEKAVKEAGRLREEG\n+KEYVVQDGDIMEFRFNV\n+>Streptococcus_suis|ORF7 length 115 aa, 348 bp, from complement(4450..4797) of Streptococcus_suis\n+VNGINISDWIHKGIFTALFTHDIFIVKGTHNVDNRINFADIGQEFISKSFTFRSTFYDTS\n+NISKFKSRWHCLFRDDEFGQLLQTLIGHFYHADVWINSCERVVCSFCSCLGNCVK\n+>Streptococcus_suis|ORF8 length 115 aa, 348 bp, from complement(4491..4838) of Streptococcus_suis\n+RLLMLSRSAKINSRLMVSISAIGSTKASSRPCSRMTFSSSKARTTWTIASTSRILAKNLF\n+PSPSPLEAPFTIPAISVNSKVVGTVFLGMMSSVNFCRRSSGTSTMPTFGSIVAKG\n+>Streptococcus_suis|ORF9 length 192 aa, 579 bp, from 5663..6241 of Streptococcus_suis\n+GEKMTRLIIGLGNPGDRYFETKHNVGFMLLDKIAKRENVTFNHDKIFQADIATTFIDGEK\n+IYLVKPTTFMNESGKAVHALMTYYGLDATDILVAYDDLDMAVGKIRFRQKGSAGGHNGIK\n+SIVKHIGTQEFDRIKIGIGRPKGKMSVVNHVLSGFDIEDRIEIDLALDKLDKAVNVYLEE\n+DDFDTVMRKFNG\n+>Streptococcus_suis|ORF10 length 1166 aa, 3501 bp, from 6235..9735 of 
Streptococcus_suis\n+RIMNILDLLHKNKQINQWQSGLNQSTRQLLLGLSGTSKSLIMATAYDCLAEKIMIVTATQ\n+NDAEKLVADLTAIIGSENVYNFFTDDSPIAEFVFASKERTQSRIDSLNFLTDSTSSGILV\n+ASIVACRVLLPSPETYKGSKIQLEVGQEIEVDKLVKNLVNIGYKKVSRVLTQGEFSQRGD\n+ILDIFDMQSETPYRIEFFGDEIDGIRIFDVDSQKSLENLDEISISPASDIILSSEDYSRA\n+SQYIQTAIEQSTLEEQQSYLREVLADMQTEYRHPDLRKFLSCIYEQSWTLLDYLPKSSPL\n+FLDDFHKIADKQAQFEKEIADLLTDDLQKGKTVSSLKYFASTYAELRKYKPATFFSSFQK\n+GLGNVKFDALYQFTQHPMQEFFHQIPLLKDELTRYAKSNNTVVIQASSDVSLQTLQKNLQ\n+EYDIHLPVHAADKLVEGQQQVTIGQLASGFHLMDEKLVFITEKEIFNKKMKRKTRRTNIS\n+NAERIKDYSELAVGDYVVHHVHGIGQYLGIETIEISGIHRDYLTVQYQNSDRISIPVEQI\n+DLLSKYLASDGKAPKVNKLNDGRFQRTKQKVQKQVEDIADDLIKLYAERSQLKGFAFSPD\n+DENQVEFDNYFTHVETDDQLRSIDEIKKDMEKDSPMDRLLVGDVGFGKTEVAMRAAFKAV\n+NDGKQVAILVPTTVLAQQHYANFQERFAEFPVNVDVMSRFKTKAEQEKTLEKLKKGQVDI\n+LIGTHRLLSKDVVFADLGLLVIDEEQRFGVKHKERLKELKKKIDVLTLTATPIPRTLQMS\n+MLGIRDLSVIETPPTNRYP'..b'\n+DTDTVMYSIIALMTITYIVNRMMSGTQSSRNVMIISQKSEEIKDYITKVADRGVTELPII\n+GGFTGVDKRMLMTTISIPEMQKLETAVLEIDETAFMVVMPASQVRGRGFSLQKDHKHYDE\n+DILIPM\n+>Streptococcus_suis|ORF2902 length 565 aa, 1698 bp, from 1998923..2000620 of Streptococcus_suis\n+FQCNSLKIQVLSSTIKLIDRNRGETMLTVSDVSLRFSDRKLFDDVNIKFTAGNTYGLIGA\n+NGAGKSTFLKILAGDIEPSTGHISLGPDERLSVLRQNHFDYEDERVIDVVIMGNEQLYSI\n+MKEKDAIYMKEDFSDEDGVRAAELEGEFAELGGWEAESEASQLLQNLNISEDLHYQNMSE\n+LTNGEKVKVLLAKALFGKPDVLLLDEPTNGLDIQSINWLEDFLIDFENTVIVVSHDRHFL\n+NKVCTHMADLDFGKIKIFVGNYDFWKQSSELAAKLQADRNAKAEEKIKELQEFVARFSAN\n+ASKSKQATSRKKMLDKIELEEIIPSSRKYPFINFKSEREIGNDLLTVENLKVVIDGETIL\n+DNISFILRPGDKTALIGQNDIQTTALIRALMGDIEYEGTVKWGVTTSQSYLPKDNTRDFD\n+TNESILDWLRQFASKEEDDNTFLRGFLGRMLFSGDEVNKPVNVLSGGEKVRVMLSKLMLL\n+KSNVLVLDDPTNHLDLESISSLNDGLKAFKESIIFASHDHEFIQTLANHIIVISKNGVID\n+RIDETYDEFLENAEVQAKVQELWKA\n+>Streptococcus_suis|ORF2903 length 115 aa, 348 bp, from complement(1999705..2000052) of Streptococcus_suis\n+PIRAVLSPGRRIKLILSRIVSPSITTFKFSTVKRSLPISRSDLKLINGYLRLEGMISSNS\n+ILSNIFLREVACLDLEALAEKRATNSCSSLIFSSAFALRSACSLAASSLDCFQKS\n+>Streptococcus_suis|ORF2904 length 110 aa, 333 bp, from 1999974..2000306 
of Streptococcus_suis\n+KLLLMVKRFLTISALSCAQVTRLLLLVKTTSKQLLSFVLLWAILNMKVLSSGVSLLVNPT\n+YQKTILVTLIQTNLSLIGSVNLPARKKMTIPSCAVSWDVCSSRVMRLTNL\n+>Streptococcus_suis|ORF2905 length 117 aa, 354 bp, from 2000502..2000855 of Streptococcus_suis\n+QTISSSFLKTVLSTESTKLMMNSWKMLKYKQKYKNFGKHNKKRLGLLPSLSSQSSCQHLS\n+AVVDCQICSCFTLQIWPLRLLRTKFALSPTSNCLPDSLSCAGVGVKQSGNRLFQLNN\n+>Streptococcus_suis|ORF2906 length 872 aa, 2619 bp, from 2000888..2003506 of Streptococcus_suis\n+PVKFFPTSFSFKSMKKIFTKTSIYYLLSFLIPLTIISIVLAFQGIWWGSDTTILASDGFH\n+QYVIFNQTLRNTLHGDGSLFYTFSSGLGLNFYALSSYYLGSFLSPIVFFFDLQSMPDAIY\n+LVTIVKFGLTGLSTYFSLKGIHKNLKEEWALLLATSFSLMSFSTSQLEINNWLDVFILLP\n+LVLLGLHRLLKKQGPILYYITLTCLFIQNYYFGYMVAIFLTLWTLVQLSWIDSQRIKRFI\n+NFTIVSILSALSSMFMLLPTYLDLKTHGETFTKIVNLKTEDSWYLDFFAKNLVGSFDTTK\n+FGSIPMISVGLVPLILALLFFTLKEIKPTVKLSYALFFTFIISSFYLQPLNLFWQGMHAP\n+NMFLYRYAWALSITVIYLAAETLVRLRQVSIKNFTLIVSFLLICFTSTFIFRDHYEFLTD\n+VNFLLTLEFLIAYFILFVAMIRYKSSLKWINIVLLFFTFLELGLHSHYQVQGISDEWHFP\n+SRSNYEEKLTDIDSIVKSTKTTTDSFYRIERLLPQTGNDSMKFNYNGISQFSSIRNRASS\n+SVLDKLGFRSDGTNLNLRYQNNTIIADSLFGVKYNLATTDPNKFGFTLNQSQSTINLYEN\n+SFNLGLALLTEGIYKDVNFTNLTLDNQTNFLNQLTGLSQKYYHTLSDVVSQNTVELSNRM\n+TVNKVDNEDAAKATFLVNIPANSQVYLNLPNLTFSNENQKKVVITVNNQSSEFTLDNAFS\n+FFNVGSFTTDVQVQVNVYFPENNQVSFDKPQFYRLDLLAFQQAISILQEKQVVTKTDGNK\n+VTVDFVTDKESSLLLTLPYDKGWNATIDGKPIKIQKAQDGFMKVDVSPGQTKLVLTFVPN\n+GFYLGLLISFGAVFVFFSYQFIGYYYSKNREY\n+>Streptococcus_suis|ORF2907 length 235 aa, 708 bp, from complement(2003907..2004614) of Streptococcus_suis\n+FHVKQGVKMNQKEYRVFEGLRIACSLTFISGYLNAFTFVTQGGRFAGVQSGNVISLAYFL\n+AKGDFAQVVNFSIPILFFVFGQFFTYLARRYFEKQTWSWHFGSSVMMLVLILLTIILSPI\n+MPASFTIASLAFVASIQVETFRRLRGAPYANVMMTGNVKNAAYLWFKGVIEKDSELRKTG\n+RNILLTIIGFMLGVIISTHLSFQFEEYALIGLILPVLYINYELWQEKRPTRGRSK\n+>Streptococcus_suis|ORF2908 length 180 aa, 543 bp, from complement(2004615..2005157) of 
Streptococcus_suis\n+PYPDFLKIFSVVCLWICYNYFMKIKLITVGKLKEKYLKEGIAEYSKRLGRFTKLDMIELP\n+DEKTPDKASQAENEQILKKEADRIMSKIGERDFVIALAIEGKQFPSEEFSQRISDIAVNG\n+YSDITFIIGGSLGLDSCIKKRANLLMSFGQLTLPHQLMKLVLIEQIYRAFMIQQGSPYHK\n+>Streptococcus_suis|ORF2909 length 413 aa, 1242 bp, from 2005223..2006464 of Streptococcus_suis\n+VIIKKEIVLLRKIKEMERIPYMKKYLKFAILFVIGFFGGLIGALSASFFQPQVQQANSAI\n+TSVSNVQYNNETSTTKAVEKVQNAVVSVINYQKSANNSLGVIFGNIESSDELAVAGEGSG\n+VIYKKYGQYAYIVTNTHVINNAEKIDILLASGEKISGELVGSDTYSDIAVIKISADKVTA\n+VAEFADSDTIKVGETAIAIGSPLGSVYANTVTQGIISSLSRTVTSQSKDGQTISTNAIQT\n+DTAINPGNSGGPLINTQGQVIGITSSKITSSSANSSGVAVEGLGFAIPANDAVAIINQLE\n+KTGQVSRPALGVHMVNLTTLSTSQLEKAGLSNTELTSGVVIVSTQSGLPADGKLETFDVI\n+TEIDGEAIQNKSDLQSALYKHQIGDTITVTYYRNNQKQTVDIKLTHSTEELSE\n+>Streptococcus_suis|ORF2910 length 256 aa, 771 bp, from 2006519..2007289 of Streptococcus_suis\n+GYMEELRTLNISEIHPNPYQPRIHFDEKELLELAQSIKENGLIQPIIVRKSSIIGYELLA\n+GERRLRASQLAGLTTIPAVVKELTDDDLLYQAIIENLQRSNLNPIEEAASYQKLISRGLT\n+HDEVAQIMGKSRPYISNLLRLLNLSSQTKQAVEEGKISQGHARQLVSFSEEKQAEWVQLI\n+LSKDLSVRTLEKLIAANKKKHTKLKQRDQFLKEQEDSLSKTLGTATKIIKKKNGSGEIRI\n+SFNDLDEFERIINNFK\n' |
diff -r 000000000000 -r 3a807e5ea6c8 test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta Thu Mar 27 09:40:53 2014 -0400
b'@@ -0,0 +1,194 @@\n+>Streptococcus_suis|ORF1 length 457 aa, 1374 bp, from 1..1374 of Streptococcus_suis\n+MNQEQLFWQRFIELAKVNFKPSIYDFYVADAKLLGINQQVANIFLNRPFKKDFWEKNFEE\n+LMIAASFESYGEPLTIQYQFTEDEQEIRNTTNTRSSIVHQVQTLEPATPQETFKPVHSDI\n+KSQYTFANFVQGDNNHWAKAAALAVSDNLGELYNPLFIFGGPGLGKTHILNAIGNKVLAD\n+NPQARIKYVSSETFINEFLEHLRLNDMESFKKTYRNLDLLLIDDIQSLRNKATTQEEFFH\n+TFNALHEKNKQIVLTSDRNPDHLDNLEERLVTRFKWGLTSEITPPDFETRIAILRNKCEN\n+LPYNFTNETLSYLAGQFDSNVRDLEGALKDIHLIATMRQLSEISVEVAAEAIRSRKQTNP\n+QNMVIPIEKIQTEVGNFYGVSLKELKGSKRVQHIVHARQVAMFLAREMTDNSLPKIGKEF\n+GNRDHTTVMHAYNKIKTLLLDDENLEIEITSIKNKLR\n+>Streptococcus_suis|ORF101 length 112 aa, 339 bp, from complement(72006..72344) of Streptococcus_suis\n+LQFNFYVYTTWKIQFHQSVNCFLSWVDDVDQTFVCTHFELLTRIFVLVSRTDDCVEATFC\n+WKWNWTCYLSTCTCCSFNDFCSCCIKCTVFVRFQADANFFVCHLFLLFVYLT\n+>Streptococcus_suis|ORF201 length 360 aa, 1083 bp, from complement(128035..129117) of Streptococcus_suis\n+SCHGGRRMTLFGKIKEVTELQSLPGFEGQVRNHIRQKITPHVDRIETDGLGGIFGIKDTA\n+VENAPRILVVAHMDEVGFMISQIKPDGTFRVVELGGWNPLVVSSQAFTLQLQDGRTIPAI\n+SGSVPPHLSRGANAPGMPAIADIIFDAGFANYDEAWAFGVRPGDVLVPKNETILTANGKN\n+VISKAWDNRFGVLMVTELLESLSGHALPNQLIAGANVQEEVGLRGAHASTTKFNPDIFLA\n+VDCSPAGDIYGDQGKIGDGTLLRFYDPGHIMLKNMKDFLLTTAEEAGVKFQYYCGKGGTD\n+AGAAHLKNHGVPSTTIGVCARYIHSHQTLYSMDDFLEAQAFLQTIVKKLDRSTVDLIKNY\n+>Streptococcus_suis|ORF301 length 105 aa, 318 bp, from complement(191714..192031) of Streptococcus_suis\n+NQGQTVRFFHIRGNFCQKFIIGNTCGSCQMQFIPNIVLDKLGNLNRRADAQLIFCYIQVS\n+LIDRHGLHQISIAMENFSNLSPNFLIFFIIARNENRLRTTLISLF\n+>Streptococcus_suis|ORF401 length 120 aa, 363 bp, from 265643..266005 of Streptococcus_suis\n+TTGTTSPIAPKWKASSKSLRVPTSEPTTLIPSSTVFTILRSMYSDGSPTATTYPPARTLS\n+IAWLKATLETAVTTVECTPPPVISLIYPGTSSTSSPLIVTSAPTSLASSNLSLLMSTAIT\n+>Streptococcus_suis|ORF501 length 104 aa, 315 bp, from 336857..337171 of Streptococcus_suis\n+KDLIRKSSLLVYKLFSPSRTMSKSIVAWRMIYSDEDPIRRRSKAFRPVAPSNIRSAPVSF\n+GISIASRVLGLPNSMRICTSVRPALRASCLYSSRRSSASLSKDS\n+>Streptococcus_suis|ORF601 length 665 aa, 1998 bp, from 
409896..411893 of Streptococcus_suis\n+VMIQIGKIFAGRYRIVRQIGRGGMADVYLARDLILDGEEVAVKVLRTNYQTDQIAIQRFQ\n+REARAMAELDHPNIVRISDIGEEDGQQYLAMEYVNGLDLKRYIKENAPLSNDVAVRIMGQ\n+ILLAMRMAHTRGIVHRDLKPQNVLLTSNGVAKVTDFGIAVAFAETSLTQTNSMLGSVHYL\n+SPEQARGSKATIQSDIYAMGIILFEMLTGRIPYDGDSAVTIALQHFQKPLPSVREENANV\n+PQALENVVLKATAKKLNERYKSVAEMYADLASALSMDRQNEPRVELEGNKVDTKTLPKLS\n+QANVETKVPHTNSSAQVSATDKGSGKKEVAKSGNKPVSKPRPGIRTRYKVLIGAILLTVI\n+AAGLMFFNTPRTVTVPDVSGQTVEKATEMIEVAGLEVGNITEEATATVDEGLVIRTSPAA\n+KTTRRQGSKIDIVVATAALASIPDVVDKESDTARQELEALGFQVTIKEEYSEKVAQGLVI\n+KTDPGANSSAEKGAKITLYVSKGVAPQVVPNVVGKSQENATQILQTAGFSIGTITQEYSS\n+SVTAGQVISTDPVANTELAKGSIINLVISKGKELIMPDLTSGNYTYSQARSQLQALGVNA\n+ESIEKQEDRSYYSTTSDIVIGQYPAAGATIDGTVTLYVSVASTRTSSDSSAGSSTSTSTS\n+TGSGQ\n+>Streptococcus_suis|ORF701 length 182 aa, 549 bp, from 489284..489832 of Streptococcus_suis\n+LGKSVALSSLSRVTIYRGENLGKLRFFVAYLTSRYIIDIQNEGRMNMNIKEVSDVTGLSA\n+DTIRYYERVGLIPKIARKSSGVRDFVENDVAVLEFVRCFRSAGMSIERLIEYMGLVQAGD\n+STVEARIDLLKEEREVLQSRLSKIQQALERLDYKIENYQTILRGAENQLFTDGSGSCKKD\n+RE\n+>Streptococcus_suis|ORF801 length 428 aa, 1287 bp, from 561960..563246 of Streptococcus_suis\n+KSSRDCESCLLLFVILKVMQADRRKTFGKMRIRINNLFFVAIAFMGIIISNSQVVLAIGK\n+ASVIQYLSYLVLILCIVNDLLKNNKHIVVYKLGYLFLIIFLFTIGICQQILPITTKIYLS\n+ISMMIISVLATLPISLIKDIDDFRRISNHLLFALFITSILGIMMGATMFTGAVEGIGFSQ\n+GFNGGLTHKNFFGITILMGFVLTYLAYKYGSYKRTDRFILGLELFLILISNTRSVYLILL\n+LFLFLVNLDKIKIEQRQWSTLKYISMLFCAIFLYYFFGFLITHSDSYAHRVNGLINFFEY\n+YRNDWFHLMFGAADLAYGDLTLDYAIRVRRVLGWNGTLEMPLLSIMLKNGFIGLVGYGIV\n+LYKLYRNVRILKTDNIKTIGKSVFIIVVLSATVENYIVNLSFVFMPICFCLLNSISTMES\n+TINKQLQT\n+>Streptococcus_suis|ORF901 length 241 aa, 726 bp, from 628396..629121 of Streptococcus_suis\n+NSPMRLDKCLEKAKVGSRKQVKKLFKAQQIKINGQAAQSLSQIVDPELQTIQVSGKKVAL\n+EGSAYYLLHKPAGVVSAVTDQEHQTVIDLISPQDSREGLYPVGRLDRDTEGLVLITNNGP\n+LGYRMLHPSHHVDKVYYVEVNGCLAEDASKFFASGVTFLDGTRCQPADLTVLEASLDHSR\n+ATIKLAEGKFHQVKKMFLAYGVKVTYLKRISFGGFELGDLERGTYRQLTPNEMEHLFTYF\n+D\n+>Streptococcus_suis|ORF1001 length 374 aa, 1125 bp, from 694014..695138 of 
Streptococcus_suis\n+HYLLFQGGILMKVFASPSRYIQGKHVLFQGAEAIGKLGTKPLILCDDLVY'..b'W\n+PSLLALTPKKAQQLLILAPNTELAGQIFEVCKTWSETIGLTAQLFLSGSSQKRQIERLKK\n+GPEILIGTPGRIFELIKLKKIKMMNINTIVLDEFDQLFSDSQYQFVEKIINYVPRDHQLI\n+YMSATAKFDRQKIAEDIESIDLSEQKLDNIQHCYMMVDKRERLETLRKFANIPDFRALAF\n+FNSLSDLGASEDKLLYNGVNAVSLVSDVNVKFRKVIIERFKNHELNLLLATDMVARGIDI\n+DNLECVLNFEVPFDQEAYTHRSGRTGRMGKEGLVITLVSSPSELKQLKKYASVQEVILKN\n+QELYKI\n+>Streptococcus_suis|ORF2201 length 272 aa, 819 bp, from complement(1531599..1532417) of Streptococcus_suis\n+DCSKIKIIDLAVGKLKLLSSKRKGAFMEIIRSKANHLVKQVKKLQQKKYRTSSYLIEGWH\n+LLEEAMEAGANIEHIFVVEEYFEKVAGLANVTVVSPEIMQELADSKTPQGVVAQLALPSQ\n+RLPETLDGKFLVLEDVQDPGNVGTMIRTADAAGFDGVFLSDKSADIYNMKVLRSMQGSHF\n+HLPVYRMPISSILTALKSNQIQILATTLSSQSVDYKEITPHSSFALVMGNEGQGISDLVA\n+DEADQLVHITMPGQAESLNVAIAAGILLFSFI\n+>Streptococcus_suis|ORF2301 length 102 aa, 309 bp, from 1597247..1597555 of Streptococcus_suis\n+IDVIKDASEIFHSIPIEFYIGVKMIENRLVVTRYNLTVINFGYQSCPGIFRLDGIPKVRS\n+CIFQMLYQVIPNCFSRIGILNSFSWSRTDNFIFHQETLLMLR\n+>Streptococcus_suis|ORF2401 length 141 aa, 426 bp, from 1658030..1658455 of Streptococcus_suis\n+ASITVPIARTVGSAFSSWISATKRTVSNNSSMFWLNLAEISTNSDSPPQAVEITPCSANS\n+PMTRSGFAPGLSILLIATMIGTLAAFEWLIASIVCGMTPSSAATTRMVKSVTDAPRARIE\n+VKAACPGVSKKVIFLPASSIW\n+>Streptococcus_suis|ORF2501 length 630 aa, 1893 bp, from complement(1722511..1724403) of 
Streptococcus_suis\n+GEMMMLQINHSLRQSEGVLEEICSSLGLDCQLELEVGGKESLRIEGSAGSYRIRAPKEHM\n+IYRGLLVLASQLAQGETDIQILERPAYRQLGFMEDCSRNAVLTVEASKILIRQLALIGYS\n+HFQLYMEDTYQLRDEPYFGYFRGAYTKEELQAIEEECHRYGMEFIPCIQTLAHLIAYLKW\n+NISSIQAIRDVDNILLIGDERTYALIDKMFEALSHLKTRTINIGMDEAHLVGLGQYLNQH\n+GYQKRSLIMCQHLERVLDIADKYGFHCSMWSDMFFQLLTASKDYTGQLEIDSEIQAYLDR\n+LKDRVTLIYWDYYQTSRESYGQKLASHQQLGDQIAFASGAWKWIGFTPDNDFSMRIAPKA\n+HAACQEYGVQEVTVTAWGDNGGECATFSILPSLHAWAELQYRGNLGCLAEHFYQLHQVSL\n+DDFLQLDLPNKTPSHPGPGHHGFNPSRYILYQDILCPLLQEHIDAEKDNAFYQELAPRLA\n+EIGSRAKGYAYLFDTQAKLCQVLATKAAISVGIRQAYQEGNRQVLAEKVDGLQQLRIDLE\n+SFYQALSYQWMVEKKVFGLDTVDIRLGGLDARIRRAIQRLQAYLNNEVPKLDELEVPILP\n+YDDFHQDKGFIATTANQWQIIATASTIYTT\n+>Streptococcus_suis|ORF2601 length 100 aa, 303 bp, from 1790150..1790452 of Streptococcus_suis\n+LKDGYQRLVVEGFADIAETFLQTETNLMTTVIFIARHDDDRPIAFPLGSLNQVNMTLVHG\n+SKGPKNNCYCLFHNLPFYCFLYFISYSFLKPKSRVFYIFL\n+>Streptococcus_suis|ORF2701 length 215 aa, 648 bp, from 1851361..1852008 of Streptococcus_suis\n+SLHPHEASHDNGKHDLHEVTHKGSKATDCLQVGFQHPGHQVERCKNRHVRDKHHQGLQDS\n+KTIADKEVEVQEFIGYFLELGILIFFLYKRLNHTNPTEVFLSNTVHLVHEGLEFPKTWSH\n+LPHDNCHNSHNQDHHENNHPPEFWHGPHCHDKGSDKQQWNPHQHCKEHHDKVLDLIDIVS\n+HPNNQVPCVKLFDISIREALDLPKGLLTNICRYPL\n+>Streptococcus_suis|ORF2801 length 1006 aa, 3021 bp, from complement(1921434..1924454) of 
Streptococcus_suis\n+TQTKEYEMIEFRKKAVQLASLMSVFFLCTYSFTDAMYIMAESLSTDGASTIRRTYIEDKK\n+EDKDRLNIELVESLSSPKTIGQKITIDKQSLATQNFNEKGIVVITQKGLELKKDDLEKGW\n+KLDESYNEKDLAITKSETEKRSLSNELDVLSKTVEELPVYGENYHSYRLLPTTELDYSAD\n+NVSLTLSFTKVSEVIKGELVAVVDAEHIAYFKAEPSVFKEYSQVNEKPSSTEDVNVVSPS\n+QDPPVSETKENVPDNPESQGSSTVPESEQAVDALVEQRGVICIKLTKSSSEQEEGIEDTE\n+NEAIEGATFEVRNVESENLVYTGQTDKDGLLTISNLPLGNYAVIQKSTIDGYEISATKEV\n+VELTVAQSRQTVSISNSPKNPLEGLMLNSILDSSLIPRSARVARSLLDTSLLDNPTVTGN\n+ANATTTTTVFGNKTTTITREESNIKYIFKPITISIPGVYQSYSQDGVLKKKEVVVDSNTN\n+TTKIIWEYTTTVGGVNSNITSIRNAFSTTTDSGLGEPKITSIMKDGVAITPNTTYYGNFD\n+NFKSATDNLPVGNGTYVYTIETPVVIPSDNYSLDYRSEVTVDAPKGSKLTYNGTSVTLTQ\n+KETRTLSTADTITLPAKNDGGPLGDLKVDTVNTSNTNRTIGKYRDNDDKVIEWTSSQLND\n+TSTTQSFTFDVALDSSQAAHEYKVYIYEPSNGTYTETKAEKVATPGNQITVDNVPAGAVA\n+LVKTVTNVKDEKVNHTISGAQLEALKGDIKIQKNWEADSDKVDVTFTVNGGSLTNRKETL\n+SANNTQITIANVDKFSGMRSTATKKRIYYDVTEAVPSGYILSSAQTDWENLYYVFTNKKD\n+NTTTPVFPPDTCGNYGVSSIDLVSINYVMYKSGSKIWGGFDGSMKMNLKIPAFARAGDSF\n+TLELPPELKLSHVANPNVAWSTVSANGKVIAKVYHEKDNLIRFVLTTEAYSVQEYNGWFE\n+IGVPTSNVIKINNRETTELYKTGVLPNLPEWYTTTTRNQTLIKRSR\n+>Streptococcus_suis|ORF2901 length 306 aa, 921 bp, from 1998013..1998933 of Streptococcus_suis\n+IRDNRNCFYNTNQGEYMKERIKDFISVTLGSVVMAIGFNSFFLENNIVSGGVGGLAIALN\n+ALLRWSPSDFVLYCNIPLLIICWFFLGKSVFIKTVYGAIIYPLCIKLTAGLPNLTENPLL\n+AAIFGGIILGFGLGLVFLGNSSTGGTGILIQFIHKYTPLSLGLTMAIIDGIIVGLGFVAF\n+DTDTVMYSIIALMTITYIVNRMMSGTQSSRNVMIISQKSEEIKDYITKVADRGVTELPII\n+GGFTGVDKRMLMTTISIPEMQKLETAVLEIDETAFMVVMPASQVRGRGFSLQKDHKHYDE\n+DILIPM\n' |
diff -r 000000000000 -r 3a807e5ea6c8 tools/sample_seqs/README.rst --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/sample_seqs/README.rst Thu Mar 27 09:40:53 2014 -0400 |
b |
@@ -0,0 +1,106 @@
+Galaxy tool to sub-sample sequence files
+========================================
+
+This tool is copyright 2014 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below (MIT licence).
+
+This tool is a short Python script (using Biopython library functions)
+to sub-sample sequence files (in a range of formats including FASTA, FASTQ,
+and SFF). This can be useful for preparing a small sample of data to test
+or time a new pipeline, or for reducing the read coverage in a de novo
+assembly.
+
+This tool is available from the Galaxy Tool Shed at:
+
+* http://toolshed.g2.bx.psu.edu/view/peterjc/sample_seqs
+
+
+Automated Installation
+======================
+
+This should be straightforward using the Galaxy Tool Shed, which should be
+able to automatically install the dependency on Biopython, and then install
+this tool and run its unit tests.
+
+
+Manual Installation
+===================
+
+There are just two files to install to use this tool from within Galaxy:
+
+* ``sample_seqs.py`` (the Python script)
+* ``sample_seqs.xml`` (the Galaxy tool definition)
+
+The suggested location is in a dedicated ``tools/sample_seqs`` folder.
+
+You will also need to modify the ``tool_conf.xml`` file to tell Galaxy to offer the
+tool. One suggested location is in the filters section. Simply add the line::
+
+    <tool file="sample_seqs/sample_seqs.xml" />
+
+You will also need to install Biopython 1.62 or later. If you want to run
+the unit tests, include this line in ``tool_conf.xml.sample`` and put the sample
+FASTA, FASTQ and SFF files under the ``test-data`` directory. Then::
+
+    ./run_functional_tests.sh -id sample_seqs
+
+That's it.
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1  - Initial version.
+======= ======================================================================
+
+
+Developers
+==========
+
+This script and related tools are being developed on this GitHub repository:
+https://github.com/peterjc/pico_galaxy/tree/master/tools/sample_seqs
+
+For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
+the following command from the Galaxy root folder::
+
+    $ tar -czf sample_seqs.tar.gz tools/sample_seqs/README.rst tools/sample_seqs/sample_seqs.py tools/sample_seqs/sample_seqs.xml tools/sample_seqs/tool_dependencies.xml test-data/ecoli.fastq test-data/ecoli.sample_N100.fastq test-data/get_orf_input.Suis_ORF.prot.fasta test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta test-data/MID4_GLZRM4E04_rnd30_frclip.sff test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
+
+Check this worked::
+
+    $ tar -tzf sample_seqs.tar.gz
+    tools/sample_seqs/README.rst
+    tools/sample_seqs/sample_seqs.py
+    tools/sample_seqs/sample_seqs.xml
+    tools/sample_seqs/tool_dependencies.xml
+    test-data/ecoli.fastq
+    test-data/ecoli.sample_N100.fastq
+    test-data/get_orf_input.Suis_ORF.prot.fasta
+    test-data/get_orf_input.Suis_ORF.prot.sample_N100.fasta
+    test-data/MID4_GLZRM4E04_rnd30_frclip.sff
+    test-data/MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
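Note on the README above: as a rough illustration of what sub-sampling a FASTA file means, here is a minimal standard-library sketch. It is not the packaged script. The helper names ``split_fasta_records`` and ``take_every_nth`` are hypothetical, and the sketch assumes the whole input fits in memory, which the real tool does not require.

```python
def split_fasta_records(text):
    """Split FASTA text into raw records, each starting with '>'.

    Hypothetical helper for illustration only; any lines before the
    first '>' are ignored.
    """
    records = []
    current = []
    for line in text.splitlines(True):  # True keeps the line endings
        if line.startswith(">") and current:
            # A new header closes the record collected so far.
            records.append("".join(current))
            current = []
        if line.startswith(">") or current:
            current.append(line)
    if current:
        records.append("".join(current))
    return records


def take_every_nth(records, n):
    """Keep the 1st, (n+1)th, (2n+1)th, ... record (hypothetical helper)."""
    return records[::n]


example = ">a\nACGT\n>b\nGG\nTT\n>c\nAAAA\n>d\nCCCC\n>e\nTTTT\n"
subset = take_every_nth(split_fasta_records(example), 2)
print("".join(subset), end="")
```

Records are kept as raw multi-line strings, so the sequence line wrapping of the input is preserved in the output.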
diff -r 000000000000 -r 3a807e5ea6c8 tools/sample_seqs/sample_seqs.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/sample_seqs/sample_seqs.py Thu Mar 27 09:40:53 2014 -0400
@@ -0,0 +1,183 @@
+#!/usr/bin/env python
+"""Sub-sample sequences from a FASTA, FASTQ or SFF file.
+
+This tool is a short Python script which requires Biopython 1.62 or later
+for SFF file support. If you use this tool in scientific work leading to a
+publication, please cite the Biopython application note:
+
+Cock et al 2009. Biopython: freely available Python tools for computational
+molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
+http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
+
+This script is copyright 2010-2013 by Peter Cock, The James Hutton Institute
+(formerly the Scottish Crop Research Institute, SCRI), UK. All rights reserved.
+See accompanying text file for licence details (MIT license).
+
+This is version 0.0.1 of the script, use -v or --version to get the version.
+"""
+import os
+import sys
+
+def stop_err(msg, err=1):
+    sys.stderr.write(msg.rstrip() + "\n")
+    sys.exit(err)
+
+if "-v" in sys.argv or "--version" in sys.argv:
+    print("v0.0.1")
+    sys.exit(0)
+
+#Parse Command Line
+if len(sys.argv) < 5:
+    stop_err("Requires at least four arguments: seq_format, in_file, out_file, mode, ...")
+seq_format, in_file, out_file, mode = sys.argv[1:5]
+if in_file != "/dev/stdin" and not os.path.isfile(in_file):
+    stop_err("Missing input file %r" % in_file)
+
+if mode == "everyNth":
+    if len(sys.argv) != 6:
+        stop_err("If using everyNth, just need argument N (integer, at least 2)")
+    try:
+        N = int(sys.argv[5])
+    except ValueError:
+        stop_err("Bad N argument %r" % sys.argv[5])
+    if N < 2:
+        stop_err("Bad N argument %r" % sys.argv[5])
+    if (N % 10) == 1 and (N % 100) != 11:
+        sys.stderr.write("Sampling every %ist sequence\n" % N)
+    elif (N % 10) == 2 and (N % 100) != 12:
+        sys.stderr.write("Sampling every %ind sequence\n" % N)
+    elif (N % 10) == 3 and (N % 100) != 13:
+        sys.stderr.write("Sampling every %ird sequence\n" % N)
+    else:
+        sys.stderr.write("Sampling every %ith sequence\n" % N)
+    def sampler(iterator):
+        global N
+        count = 0
+        for record in iterator:
+            count += 1
+            if count % N == 1:
+                yield record
+elif mode == "percentage":
+    if len(sys.argv) != 6:
+        stop_err("If using percentage, just need percentage argument (float, range 0 to 100)")
+    try:
+        percent = float(sys.argv[5]) / 100.0
+    except ValueError:
+        stop_err("Bad percent argument %r" % sys.argv[5])
+    if percent <= 0.0 or 1.0 <= percent:
+        stop_err("Bad percent argument %r" % sys.argv[5])
+    sys.stderr.write("Sampling %0.3f%% of sequences\n" % (100.0 * percent))
+    def sampler(iterator):
+        global percent
+        count = 0
+        taken = 0
+        for record in iterator:
+            count += 1
+            if percent * count > taken:
+                taken += 1
+                yield record
+else:
+    stop_err("Unsupported mode %r" % mode)
+
+def raw_fasta_iterator(handle):
+    """Yields raw FASTA records as multi-line strings."""
+    while True:
+        line = handle.readline()
+        if line == "":
+            return # Premature end of file, or just empty?
+        if line[0] == ">":
+            break
+
+    no_id_warned = False
+    while True:
+        if line[0] != ">":
+            raise ValueError(
+                "Records in Fasta files should start with '>' character")
+        try:
+            id = line[1:].split(None, 1)[0]
+        except IndexError:
+            if not no_id_warned:
+                sys.stderr.write("WARNING - Malformed FASTA entry with no identifier\n")
+                no_id_warned = True
+            id = None
+        lines = [line]
+        line = handle.readline()
+        while True:
+            if not line:
+                break
+            if line[0] == ">":
+                break
+            lines.append(line)
+            line = handle.readline()
+        yield "".join(lines)
+        if not line:
+            return # StopIteration
+
+def fasta_filter(in_file, out_file, iterator_filter):
+    count = 0
+    #Galaxy now requires Python 2.5+ so can use with statements,
+    with open(in_file) as in_handle:
+        with open(out_file, "w") as pos_handle:
+            for record in iterator_filter(raw_fasta_iterator(in_handle)):
+                count += 1
+                pos_handle.write(record)
+    return count
+
+try:
+    from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
+    def fastq_filter(in_file, out_file, iterator_filter):
+        count = 0
+        #from galaxy_utils.sequence.fastq import fastqReader, fastqWriter
+        reader = fastqReader(open(in_file, "rU"))
+        writer = fastqWriter(open(out_file, "w"))
+        for record in iterator_filter(reader):
+            count += 1
+            writer.write(record)
+        writer.close()
+        reader.close()
+        return count
+except ImportError:
+    from Bio.SeqIO.QualityIO import FastqGeneralIterator
+    def fastq_filter(in_file, out_file, iterator_filter):
+        count = 0
+        with open(in_file) as in_handle:
+            with open(out_file, "w") as pos_handle:
+                for title, seq, qual in iterator_filter(FastqGeneralIterator(in_handle)):
+                    count += 1
+                    pos_handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
+        return count
+
+def sff_filter(in_file, out_file, iterator_filter):
+    count = 0
+    try:
+        from Bio.SeqIO.SffIO import SffIterator, SffWriter
+    except ImportError:
+        stop_err("SFF filtering requires Biopython 1.54 or later")
+    try:
+        from Bio.SeqIO.SffIO import ReadRocheXmlManifest
+    except ImportError:
+        #Prior to Biopython 1.56 this was a private function
+        from Bio.SeqIO.SffIO import _sff_read_roche_index_xml as ReadRocheXmlManifest
+    with open(in_file, "rb") as in_handle:
+        try:
+            manifest = ReadRocheXmlManifest(in_handle)
+        except ValueError:
+            manifest = None
+        in_handle.seek(0)
+        with open(out_file, "wb") as out_handle:
+            writer = SffWriter(out_handle, xml=manifest)
+            in_handle.seek(0) #start again after getting manifest
+            count = writer.write_file(iterator_filter(SffIterator(in_handle)))
+            #count = writer.write_file(SffIterator(in_handle))
+    return count
+
+if seq_format.lower()=="sff":
+    count = sff_filter(in_file, out_file, sampler)
+elif seq_format.lower()=="fasta":
+    count = fasta_filter(in_file, out_file, sampler)
+elif seq_format.lower().startswith("fastq"):
+    count = fastq_filter(in_file, out_file, sampler)
+else:
+    stop_err("Unsupported file type %r" % seq_format)
+
+sys.stderr.write("Sampled %i records\n" % count)
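Note on the script above: both sampling modes are deterministic generators, and the unit tests rely on "every 5th" and "20%" selecting the same records. The sketch below restates the two samplers as stand-alone functions to show this. The function names ``every_nth`` and ``percentage`` and the explicit parameters are assumptions made for illustration, since the script itself uses module-level globals.

```python
def every_nth(iterator, n):
    """Yield records 1, n+1, 2n+1, ... (n must be at least 2, as in the script)."""
    count = 0
    for record in iterator:
        count += 1
        if count % n == 1:
            yield record


def percentage(iterator, percent):
    """Yield a record whenever the running quota exceeds the number taken so far."""
    fraction = percent / 100.0
    count = 0
    taken = 0
    for record in iterator:
        count += 1
        if fraction * count > taken:
            taken += 1
            yield record


# Hypothetical read names, used only to make the selection visible.
reads = ["read_%i" % i for i in range(1, 12)]
by_n = list(every_nth(reads, 5))
by_percent = list(percentage(reads, 20.0))
print(by_n)
print(by_percent)
```

Because the percentage mode compares a floating point product against an integer count, agreement with the every-Nth mode is only checked here for this small input. Percentages that are not exactly representable may differ by a record at some positions.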
diff -r 000000000000 -r 3a807e5ea6c8 tools/sample_seqs/sample_seqs.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/sample_seqs/sample_seqs.xml Thu Mar 27 09:40:53 2014 -0400
@@ -0,0 +1,119 @@
+<tool id="sample_seqs" name="Sub-sample sequences files" version="0.0.1">
+    <description>e.g. to reduce coverage</description>
+    <requirements>
+        <requirement type="package" version="1.63">biopython</requirement>
+        <requirement type="python-module">Bio</requirement>
+    </requirements>
+    <version_command interpreter="python">sample_seqs.py --version</version_command>
+    <command interpreter="python">
+#if str($sampling.type) == "everyNth":
+sample_seqs.py "$input_file.ext" "$input_file" "$output_file" "${sampling.type}" "${sampling.every_n}"
+#elif str($sampling.type) == "percentage":
+sample_seqs.py "$input_file.ext" "$input_file" "$output_file" "${sampling.type}" "${sampling.percent}"
+#else:
+##Should give an error about invalid sampling type:
+sample_seqs.py "$input_file.ext" "$input_file" "$output_file" "${sampling.type}"
+#end if
+    </command>
+    <stdio>
+        <!-- Anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
+    <inputs>
+        <param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file" help="FASTA, FASTQ, or SFF format." />
+        <conditional name="sampling">
+            <param name="type" type="select" label="Sub-sampling approach">
+                <option value="everyNth">Take every N-th sequence (e.g. every fifth sequence)</option>
+                <option value="percentage">Take some percentage of the sequences (e.g. 20% will take every fifth sequence)</option>
+                <!-- TODO - target coverage etc -->
+            </param>
+            <when value="everyNth">
+                <param name="every_n" value="5" type="integer" min="2" label="N" help="At least 2, e.g. 5 will take every 5th sequence (taking 20% of the sequences)" />
+            </when>
+            <when value="percentage">
+                <param name="percent" value="20.0" type="float" min="0" max="100" label="Percentage" help="Between 0 and 100, e.g. 20% will take every 5th sequence" />
+            </when>
+        </conditional>
+    </inputs>
+    <outputs>
+        <data name="output_file" format="input" metadata_source="input_file" label="${input_file.name} (sub-sampled)"/>
+    </outputs>
+    <tests>
+        <test>
+            <param name="input_file" value="get_orf_input.Suis_ORF.prot.fasta" />
+            <param name="type" value="everyNth" />
+            <param name="every_n" value="100" />
+            <output name="output_file" file="get_orf_input.Suis_ORF.prot.sample_N100.fasta" />
+        </test>
+        <test>
+            <param name="input_file" value="ecoli.fastq" />
+            <param name="type" value="everyNth" />
+            <param name="every_n" value="100" />
+            <output name="output_file" file="ecoli.sample_N100.fastq" />
+        </test>
+        <test>
+            <param name="input_file" value="MID4_GLZRM4E04_rnd30_frclip.sff" ftype="sff" />
+            <param name="type" value="everyNth" />
+            <param name="every_n" value="5" />
+            <output name="output_file" file="MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff" ftype="sff"/>
+        </test>
+        <test>
+            <param name="input_file" value="get_orf_input.Suis_ORF.prot.fasta" />
+            <param name="type" value="percentage" />
+            <param name="percent" value="1.0" />
+            <output name="output_file" file="get_orf_input.Suis_ORF.prot.sample_N100.fasta" />
+        </test>
+        <test>
+            <param name="input_file" value="ecoli.fastq" />
+            <param name="type" value="percentage" />
+            <param name="percent" value="1.0" />
+            <output name="output_file" file="ecoli.sample_N100.fastq" />
+        </test>
+        <test>
+            <param name="input_file" value="MID4_GLZRM4E04_rnd30_frclip.sff" ftype="sff" />
+            <param name="type" value="percentage" />
+            <param name="percent" value="20.0" />
+            <output name="output_file" file="MID4_GLZRM4E04_rnd30_frclip.sample_N5.sff" ftype="sff"/>
+        </test>
+    </tests>
+    <help>
+**What it does**
+
+Takes an input file of sequences (typically FASTA or FASTQ, but
+Standard Flowgram Format (SFF) is also supported), and returns a new sequence
+file sub-sampling from this (in the same format).
+
+Several sampling modes are supported, all designed to be non-random. This
+allows reproducibility, and also works on paired sequence files. Also
+note that by sampling uniformly through the file, this avoids any bias
+should reads in any part of the file be of lesser quality (e.g. one part
+of the slide).
+
+The simplest mode is to take every N-th sequence, for example taking
+every 2nd sequence would sample half the file - while taking every 5th
+sequence would take 20% of the file.
+
+
+**Example Usage**
+
+Suppose you have some Illumina paired end data as files ``R1.fastq`` and
+``R2.fastq`` which give an estimated x200 coverage, and you wish to do a
+*de novo* assembly with a tool like MIRA which recommends lower coverage.
+Taking every 3rd read would reduce the estimated coverage to about x66,
+and would preserve the pairing as well.
+
+
+**Citation**
+
+This tool uses Biopython, so if you use this Galaxy tool in work leading to a
+scientific publication please cite the following paper:
+
+Cock et al (2009). Biopython: freely available Python tools for computational
+molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
+http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
+
+This tool is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/sample_seqs
+    </help>
+</tool>
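Note on the help text above: the claim that non-random sampling preserves pairing follows from applying the same position-based rule to both files. The sketch below illustrates this with in-memory lists standing in for ``R1.fastq`` and ``R2.fastq``. The read names with ``/1`` and ``/2`` suffixes and the helper ``every_nth`` are assumptions for illustration, and the coverage figure is simple division using the x200 estimate from the example.

```python
def every_nth(records, n):
    """Yield the records at positions 1, n+1, 2n+1, ... (hypothetical helper)."""
    for index, record in enumerate(records):
        if index % n == 0:
            yield record


# Stand-ins for two paired files holding mates in the same order.
r1 = ["frag_%i/1" % i for i in range(1, 10)]
r2 = ["frag_%i/2" % i for i in range(1, 10)]

n = 3
sampled_r1 = list(every_nth(r1, n))
sampled_r2 = list(every_nth(r2, n))

# Mates stay aligned because selection depends only on position in the file.
still_paired = all(a[:-2] == b[:-2] for a, b in zip(sampled_r1, sampled_r2))

# Taking every 3rd read from an estimated x200 data set leaves roughly x66.
estimated_coverage = 200.0 / n

print(sampled_r1)
print(sampled_r2)
print(still_paired, round(estimated_coverage, 1))
```

This only holds when both files list the mates in the same order and contain the same number of reads. A random sampler run independently on each file would not give that guarantee.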
diff -r 000000000000 -r 3a807e5ea6c8 tools/sample_seqs/tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/sample_seqs/tool_dependencies.xml Thu Mar 27 09:40:53 2014 -0400
@@ -0,0 +1,6 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="biopython" version="1.63">
+        <repository changeset_revision="a5c49b83e983" name="package_biopython_1_63" owner="biopython" toolshed="http://toolshed.g2.bx.psu.edu" />
+    </package>
+</tool_dependency>