comparison tools/protein_analysis/tmhmm2.xml @ 11:99b82a2b1272 draft

Uploaded v0.2.0 which added PSORTb wrapper (written with Konrad Paszkiewicz)
author peterjc
date Wed, 03 Apr 2013 10:49:10 -0400
parents e52220a9ddad
children dc958c2a963a
10:09ff180d1615 11:99b82a2b1272
1 <tool id="tmhmm2" name="TMHMM 2.0" version="0.0.9"> 1 <tool id="tmhmm2" name="TMHMM 2.0" version="0.0.10">
2 <description>Find transmembrane domains in protein sequences</description> 2 <description>Find transmembrane domains in protein sequences</description>
3 <!-- If job splitting is enabled, break up the query file into parts --> 3 <!-- If job splitting is enabled, break up the query file into parts -->
4 <!-- Using 2000 chunks meaning 4 threads doing 500 each is ideal --> 4 <!-- Using 2000 chunks meaning 4 threads doing 500 each is ideal -->
5 <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism> 5 <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism>
6 <command interpreter="python"> 6 <command interpreter="python">
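The XML comment's chunking arithmetic can be made concrete. This is a minimal Python sketch. It assumes that `split_mode="to_size"` with `split_size="2000"` means at most 2000 FASTA records per chunk. The "4 threads doing 500 each" figure comes from the comment itself. `split_to_size` is a hypothetical helper, not Galaxy's own splitting code.

```python
from typing import List, Sequence


def split_to_size(records: Sequence[str], size: int) -> List[List[str]]:
    """Split records into consecutive chunks of at most `size` items.

    Assumption: this mirrors split_mode="to_size" for FASTA input (a fixed
    number of sequences per chunk); it is an illustration, not Galaxy code.
    """
    if size < 1:
        raise ValueError("size must be positive")
    return [list(records[i:i + size]) for i in range(0, len(records), size)]


# Hypothetical query of 4500 sequences, split with split_size=2000.
ids = [f"seq{n}" for n in range(4500)]
chunks = split_to_size(ids, 2000)
print([len(c) for c in chunks])  # [2000, 2000, 500]

# Inside one full chunk, 4 worker threads would each handle 500 sequences.
per_thread = split_to_size(chunks[0], 2000 // 4)
print([len(p) for p in per_thread])  # [500, 500, 500, 500]
```

The last chunk just holds the remainder, so under this reading a 4500-sequence query becomes three parallel jobs.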
45 45
46 This calls the TMHMM v2.0 tool for prediction of transmembrane (TM) helices in proteins using a hidden Markov model (HMM). 46 This calls the TMHMM v2.0 tool for prediction of transmembrane (TM) helices in proteins using a hidden Markov model (HMM).
47 47
48 The input is a FASTA file of protein sequences, and the output is tabular with six columns (one row per protein): 48 The input is a FASTA file of protein sequences, and the output is tabular with six columns (one row per protein):
49 49
50 1. Sequence identifier 50 ====== =====================================================================================
51 2. Sequence length 51 Column Description
52 3. Expected number of amino acids in TM helices (ExpAA). If this number is larger than 18 it is very likely to be a transmembrane protein (OR have a signal peptide). 52 ------ -------------------------------------------------------------------------------------
53 4. Expected number of amino acids in TM helices in the first 60 amino acids of the protein (Exp60). If this number more than a few, be aware that a predicted transmembrane helix in the N-term could be a signal peptide. 53 1 Sequence identifier
54 5. Number of transmembrane helices predicted by N-best. 54 2 Sequence length
55 6. Topology predicted by N-best (encoded as a strip using o for output and i for inside) 55 3 Expected number of amino acids in TM helices (ExpAA). If this number is larger than
56 18 it is very likely to be a transmembrane protein (OR have a signal peptide).
57 4 Expected number of amino acids in TM helices in the first 60 amino acids of the
58 protein (Exp60). If this number more than a few, be aware that a predicted
59 transmembrane helix in the N-term could be a signal peptide.
60 5 Number of transmembrane helices predicted by N-best.
61 6 Topology predicted by N-best (encoded as a strip using o for output and i for inside)
62 ====== =====================================================================================
56 63
57 Predicted TM segments in the n-terminal region sometimes turn out to be signal peptides. 64 Predicted TM segments in the n-terminal region sometimes turn out to be signal peptides.
58 65
59 One of the most common mistakes by the program is to reverse the direction of proteins with one TM segment (i.e. mixing up which end of the protein is outside and inside the membrane). 66 One of the most common mistakes by the program is to reverse the direction of proteins with one TM segment (i.e. mixing up which end of the protein is outside and inside the membrane).
60 67
61 Do not use the program to predict whether a non-membrane protein is cytoplasmic or not. 68 Do not use the program to predict whether a non-membrane protein is cytoplasmic or not.
69
62 70
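The help text's rules of thumb for ExpAA and Exp60 can be applied mechanically to each output row. This is a minimal Python sketch. It assumes each row holds the six bare, tab-separated values shown in the example table in the Notes. The names `TmhmmRow`, `parse_row` and `interpret` are hypothetical and are not part of the wrapper. So is the `exp60_warn` default of 10, because the help only says to be wary when Exp60 is "more than a few".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TmhmmRow:
    identifier: str
    length: int
    exp_aa: float    # column 3, ExpAA
    exp60: float     # column 4, Exp60
    n_helices: int   # column 5, helix count from N-best
    topology: str    # column 6, N-best topology string


def parse_row(line: str) -> TmhmmRow:
    """Parse one tab-separated row of the six-column output."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated columns, got {len(fields)}")
    ident, length, exp_aa, exp60, n_helices, topology = fields
    return TmhmmRow(ident, int(length), float(exp_aa), float(exp60),
                    int(n_helices), topology)


def interpret(row: TmhmmRow, exp60_warn: float = 10.0) -> List[str]:
    """Apply the help text's cautions to one row.

    ExpAA > 18 is the cut-off stated in the help; exp60_warn is a
    hypothetical, tunable threshold, not a documented TMHMM value.
    """
    notes = []
    if row.exp_aa > 18:
        notes.append("likely transmembrane (or has a signal peptide)")
    if row.exp60 > exp60_warn:
        notes.append("N-terminal helix may really be a signal peptide")
    return notes


# Rows taken from the example output in the help text.
tm = parse_row("gi|3298468|dbj|BAA31520.1|\t107\t59.37\t31.17\t3\to23-45i52-74o89-106i")
soluble = parse_row("gi|671626|emb|CAA85685.1|\t473\t0.19\t0.00\t0\to")
print(interpret(tm))       # both cautions apply
print(interpret(soluble))  # []
```

Keeping the Exp60 threshold as a parameter avoids presenting a guess as part of TMHMM itself. A caller can tighten or loosen it for their own data.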
63 **Notes** 71 **Notes**
64 72
65 The short format output from TMHMM v2.0 looks like this (six columns tab separated, shown here as a table): 73 The short format output from TMHMM v2.0 looks like this (six columns tab separated, shown here as a table):
66 74
79 gi|4959044|gb|AAD34209.1|AF069992_1 600 0.00 0.00 0 o 87 gi|4959044|gb|AAD34209.1|AF069992_1 600 0.00 0.00 0 o
80 gi|671626|emb|CAA85685.1| 473 0.19 0.00 0 o 88 gi|671626|emb|CAA85685.1| 473 0.19 0.00 0 o
81 gi|3298468|dbj|BAA31520.1| 107 59.37 31.17 3 o23-45i52-74o89-106i 89 gi|3298468|dbj|BAA31520.1| 107 59.37 31.17 3 o23-45i52-74o89-106i
82 =================================== === ===== ======= ======= ==================== 90 =================================== === ===== ======= ======= ====================
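In the sixth column, each `i` or `o` marks a non-membrane stretch on the inside or outside of the membrane. Each `start-end` pair between two letters is a predicted TM helix, given by the residue positions as written. The following small Python sketch decodes the string. It assumes side letters and `start-end` pairs always alternate as in the example rows. `parse_topology` is a hypothetical helper, not part of the Galaxy wrapper.

```python
import re
from typing import List, Tuple

# One side letter, then zero or more "start-end" helices each followed by a side letter.
_TOPOLOGY = re.compile(r"[io](?:\d+-\d+[io])*")
_HELIX = re.compile(r"(\d+)-(\d+)")


def parse_topology(topology: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Split an N-best topology string into loop sides and helix spans.

    sides[k] is 'i' (inside) or 'o' (outside) for the k-th non-membrane
    stretch; helices[k] is the (start, end) pair of the k-th predicted helix.
    """
    if not _TOPOLOGY.fullmatch(topology):
        raise ValueError(f"unexpected topology string: {topology!r}")
    sides = re.findall(r"[io]", topology)
    helices = [(int(a), int(b)) for a, b in _HELIX.findall(topology)]
    return sides, helices


sides, helices = parse_topology("o23-45i52-74o89-106i")
print(sides)    # ['o', 'i', 'o', 'i']
print(helices)  # [(23, 45), (52, 74), (89, 106)]
print(parse_topology("o"))  # (['o'], [])
```

A useful consistency check is that `len(helices)` equals column 5. For the third example row, both are 3.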
83 91
92
84 **References** 93 **References**
85 94
86 Krogh, Larsson, von Heijne, and Sonnhammer. 95 Krogh, Larsson, von Heijne, and Sonnhammer.
87 Predicting Transmembrane Protein Topology with a Hidden Markov Model: Application to Complete Genomes. 96 Predicting Transmembrane Protein Topology with a Hidden Markov Model: Application to Complete Genomes.
88 J. Mol. Biol. 305:567-580, 2001. 97 J. Mol. Biol. 305:567-580, 2001.