What it does
This is the main program that makes gene predictions based on an interpolated context model (ICM).
The ICM can be built from coding sequences (CDS) extracted from related organisms, using the ICM builder. If you cannot generate an ICM, you can use the non-knowledge-based Glimmer, which performs a de novo prediction.
Example
Input:
- interpolated context model (ICM): Use the 'Glimmer ICM builder' tool to create one
- Genome Sequence in FASTA format

    >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7
    GATCCTTGTAGATTTTGAATTTGAAGTTTTTTCTCATTCCAAAACTCTGT
    GATCTGAAATAAAATGTCTCAAAAAAATAGAAGAAAACATTGCTTTATAT
    TTATCAGTTATGGTTTTCAAAATTTTCTGACATACCGTTTTGCTTCTTTT
    TTTCTCATCTTCTTCAAATATCAATTGTGATAATCTGACTCCTAACAATC
    GAATTTCTTTTCCTTTTTCTTTTTCCAACAACTCCAGTGAGAACTTTTGA
    ATATCTTCAAGTGACTTCACCACATCAGAAGGTGTCAACGATCTTGTGAG
    AACATCGAATGAAGATAATTTTAATTTTAGAGTTACAGTTTTTCCTCCGA
    CAATTCCTGATTTACGAACATCTTCTTCAAGCATTCTACAGATTTCTTGA
    TGCTCTTCTAGGAGGATGTTGAAATCCGAAGTTGGAGAAAAAGTTCTCTC
    AACTGAAATGCTTTTTCTTCGTGGATCCGATTCAGATGGACGACCTGGCA
    GTCCGAGAGCCGTTCGAAGGAAAGATTCTTGTGAGAGAGGCGTGAAACAC
    AAAGGGTATAGGTTCTTCTTCAGATTCATATCACCAACAGTTTGAATATC
    CATTGCTTTCAGTTGAGCTTCGCATACACGACCAATTCCTCCAACCTAAA
    AAATTATCTAGGTAAAACTAGAAGGTTATGCTTTAATAGTCTCACCTTAC
    GAATCGGTAAATCCTTCAAAAACTCCATAATCGCGTTTTTATCATTTTCT
    .....
Output:
- FASTA file with predicted proteins
- Glimmer prediction file (optional)

    >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7.
    orf00001    40137       52  +2     8.68
    orf00004      603       34  -1     2.91
    orf00006     1289     1095  -3     3.16
    orf00007     1555     1391  -2     2.33
    orf00008     1809     1576  -1     1.02
    orf00010     1953     2066  +3     3.09
    orf00011     2182     2304  +1     0.89
    orf00013     2390     2521  +2     0.60
    orf00018     2570     3073  +2     2.54
    orf00020     3196     3747  +1     2.91
    orf00022     3758     4000  +2     0.83
    orf00023     4399     4157  -2     1.31
    orf00025     4463     4759  +2     2.92
    orf00026     4878     5111  +3     0.78
    orf00027     5468     5166  -3     1.64
    orf00029     5590     5832  +1     0.29
    orf00032     6023     6226  +2     6.02
    orf00033     6217     6336  +1     3.09
    ........

- Glimmer detailed report (optional)

    >CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7.
    Sequence length = 40222

               ----- Start -----            --- Length ----  ------------- Scores -------------
    ID   Frame   of Orf  of Gene     Stop   of Orf  of Gene      Raw InFrm  F1  F2  F3  R1  R2  R3  NC
    0001    +2    40137    40137       52      135      135     9.26    96   -  96   -   -   3   -   0
    0002    +1       58       64      180      120      114     5.01    69  69   -   -  30   -   -   0
            +3      300      309      422      120      111    -0.68    20   -   -  20  38   -   -  41
            +3      423      432      545      120      111     1.29    21   -  51  21  13   -   8   5
    0003    +2      401      416      595      192      177     2.51    93   -  93   -   5   -   -   1
    0004    -1      645      552       34      609      516     2.33    99   -   -   -  99   -   -   0
            +1      562      592      762      198      168    -2.54     1   1   -   -   -   -   -  98
            +1      763      772      915      150      141    -1.34     1   1   -   -   -   -  86  11
            +3      837      846     1007      168      159     1.35    28   -  50  28   -   -  17   3
    0005    -3     1073      977      654      417      321     0.52    84   -   -   -   -   -  84  15
    0006    -3     1373     1319     1095      276      222     3.80    99   -   -   -   -   -  99   0
    0007    -2     1585     1555     1391      192      162     2.70    98   -   -   -   -  98   -   1
    0008    -1     1812     1809     1576      234      231     1.26    94   -   -   -  94   -   -   5
    0009    +2     1721     1730     1945      222      213     0.68    80   -  80   -   -   -   -  19
    .....
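
For downstream processing, the optional prediction file can be read with a few lines of Python. The sketch below is not part of Glimmer or of this tool. It assumes that each record line holds five whitespace-separated fields (ORF identifier, start, end, signed reading frame, raw score), which is how the example above reads. The names `Prediction` and `parse_predictions`, and the score cutoff of 3.0, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Prediction(NamedTuple):
    orf_id: str
    start: int    # start > end on reverse-strand genes (assumed from the example)
    end: int
    frame: int    # signed frame: +1..+3 forward, -1..-3 reverse
    score: float  # raw score as printed in the last column


def parse_predictions(
    lines: Iterable[str],
) -> Iterator[Tuple[Optional[str], Prediction]]:
    """Yield (sequence_header, Prediction) pairs from prediction-file lines."""
    header = None
    for line in lines:
        line = line.strip()
        # Skip blank lines and dot-only lines used to mark truncated examples.
        if not line or set(line) == {"."}:
            continue
        if line.startswith(">"):
            header = line[1:]
            continue
        orf_id, start, end, frame, score = line.split()
        yield header, Prediction(orf_id, int(start), int(end), int(frame), float(score))


# A few record lines copied from the example output above.
sample = """\
>CELF22B7 C.aenorhabditis elegans (Bristol N2) cosmid F22B7.
orf00001 40137 52 +2 8.68
orf00004 603 34 -1 2.91
orf00010 1953 2066 +3 3.09
"""

records = [pred for _, pred in parse_predictions(sample.splitlines())]

# Illustrative filter: keep ORFs whose raw score reaches an arbitrary cutoff.
strong = [pred.orf_id for pred in records if pred.score >= 3.0]
print(strong)
```

Because the parser splits on any whitespace, it should work whether the columns are aligned or separated by single spaces. Note that a record such as orf00001 (start 40137, end 52, forward frame) has a start greater than its end even on the forward strand, which suggests a gene wrapping around the end of the sequence, so coordinate arithmetic should not assume start < end.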
References
A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics (Advance online version) (2007).