comparison tools/protein_analysis/rxlr_motifs.xml @ 6:a290c6d4e658

Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 18:07:09 -0400
parents
children 9b45a8743100
5:0f1c61998b22 6:a290c6d4e658
1 <tool id="rxlr_motifs" name="RXLR Motifs" version="0.0.5">
2 <description>Find RXLR Effectors of Plant Pathogenic Oomycetes</description>
3 <command interpreter="python">
4 rxlr_motifs.py $fasta_file 8 $model $tabular_file
5 ##I want the number of threads to be a Galaxy config option...
6 </command>
7 <inputs>
8 <param name="fasta_file" type="data" format="fasta" label="FASTA file of protein sequences" />
9 <param name="model" type="select" label="Which RXLR model?">
10 <option value="Bhattacharjee2006">Bhattacharjee et al. (2006) RXLR</option>
11 <option value="Win2007">Win et al. (2007) RXLR</option>
12 <option value="Whisson2007" selected="True">Whisson et al. (2007) RXLR-EER with HMM</option>
13 </param>
14 </inputs>
15 <outputs>
16 <data name="tabular_file" format="tabular" label="$model.value_label" />
17 </outputs>
18 <requirements>
19 <!-- Need SignalP for all the models -->
20 <requirement type="binary">signalp</requirement>
21 <!-- Need HMMER for Whisson et al. (2007) -->
22 <requirement type="binary">hmmsearch</requirement>
23 </requirements>
24 <tests>
25 <test>
26 <param name="fasta_file" value="rxlr_win_et_al_2007.fasta" ftype="fasta" />
27 <param name="model" value="Win2007" />
28 <output name="tabular_file" file="rxlr_win_et_al_2007.tabular" ftype="tabular" />
29 </test>
30 </tests>
31 <help>
32
33 **Background**
34
35 Many effector proteins that oomycete plant pathogens use to manipulate the
36 host have been found to contain a signal peptide followed by a conserved RXLR
37 motif (Arg, any amino acid, Leu, Arg), and then sometimes EER (Glu, Glu, Arg).
38 There are striking parallels with the malarial host-targeting signal
39 (Plasmodium export element, or "Pexel" for short).
40
41 -----
42
43 **What it does**
44
45 Takes a protein sequence FASTA file as input, and produces a simple tabular
46 file as output with one line per protein, and two columns giving the sequence
47 ID and the predicted class. This is typically just whether or not it had the
48 selected RXLR motif (Y or N).
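
As a rough illustration of working with the two-column output described above, the sketch below tallies the predicted classes. The function name ``count_classes`` is hypothetical, and skipping lines that start with ``#`` is an assumption about a possible header line, not something stated here.

```python
import csv
from collections import Counter


def count_classes(lines):
    """Count how often each predicted class appears in the tabular output.

    lines is any iterable of text lines (an open file handle works).
    Assumption: column 1 is the sequence ID and column 2 is the class.
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and (assumed) comment or header lines.
        if not row or row[0].startswith("#"):
            continue
        counts[row[1]] += 1
    return counts
```

Passing an open file handle for the tabular file should give a mapping such as class ``Y`` to its count.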
49
50 -----
51
52 **Bhattacharjee et al. (2006) RXLR Model**
53
54 Looks for the oomycete motif RXLR as described in Bhattacharjee et al. (2006).
55
56 Matches must have a SignalP Hidden Markov Model (HMM) score of at least 0.9,
57 a SignalP Neural Network (NN) predicted cleavage site giving a signal peptide
58 length between 10 and 40 amino acids inclusive, and the RXLR pattern must be
59 after but within 100 amino acids of the cleavage site.
60 SignalP is run truncating the sequences to the first 70 amino acids, which was
61 the default on the SignalP webservice used in Bhattacharjee et al. (2006).
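
The criteria above can be sketched as a small filter over values already parsed from SignalP output. This is only an illustration: the function name and arguments are hypothetical, running SignalP is not shown, and reading "within 100 amino acids of the cleavage site" as "the whole motif lies in the 100 residues after the signal peptide" is an assumption.

```python
import re

RXLR = re.compile(r"R.LR")


def passes_bhattacharjee2006(sequence, hmm_score, signal_peptide_length):
    """Apply the three criteria described above to pre-computed SignalP values."""
    # SignalP HMM score of at least 0.9.
    if not hmm_score >= 0.9:
        return False
    # Signal peptide length between 10 and 40 amino acids inclusive.
    if signal_peptide_length not in range(10, 41):
        return False
    # Assumption: search only the 100 residues following the signal peptide.
    start = signal_peptide_length
    window = sequence[start:start + 100].upper()
    return RXLR.search(window) is not None
```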
62
63
64 **Win et al. (2007) RXLR Model**
65
66 Looks for the protein motif RXLR as described in Win et al. (2007).
67
68 Matches must have a SignalP Hidden Markov Model (HMM) score of at least 0.9,
69 a SignalP Neural Network (NN) predicted cleavage site giving a signal peptide
70 length between 10 and 40 amino acids inclusive, and the RXLR pattern must be
71 after the cleavage site and start between amino acids 30 and 60.
72 SignalP is run truncating the sequences to the first 70 amino acids, to match
73 the methodology of Torto et al. (2003) followed in Win et al. (2007).
74
75
76 **Whisson et al. (2007) RXLR-EER with HMM**
77
78 Looks for the protein motif RXLR-EER using the heuristic regular expression
79 methodology, which was an extension of the Bhattacharjee et al. (2006) model,
80 and a HMM as described in Whisson et al. (2007).
81
82 All the requirements described above for Bhattacharjee et al. (2006) apply,
83 but rather than just looking for RXLR with the regular expression R.LR the
84 more complicated regular expression R.LR.{,40}[ED][ED][KR] is used. This means
85 RXLR (Arg, any amino acid, Leu, Arg), then a stretch of up to forty amino
86 acids before Glu/Asp, Glu/Asp, Lys/Arg. The EER part of the name is perhaps
87 misleading as it also allows for DDR, EEK, and so on.
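
To make the regular expression above concrete, here is a minimal sketch using Python's ``re`` module, where ``{,40}`` means zero to forty repeats. The helper name ``has_rxlr_eer`` is hypothetical, and the SignalP and positional requirements are deliberately left out, so this tests the pattern alone.

```python
import re

# Pattern as quoted in the help text above.
RXLR_EER = re.compile(r"R.LR.{,40}[ED][ED][KR]")


def has_rxlr_eer(sequence):
    """Return True if the heuristic RXLR-EER pattern occurs anywhere in sequence."""
    return RXLR_EER.search(sequence.upper()) is not None
```

For example, a sequence containing RSLR followed a few residues later by DDK matches, which is why the EER label is looser than it sounds.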
88
89 Unlike Bhattacharjee et al. (2006), who used the SignalP webservice with its
90 default of truncating the sequences at 70 amino acids, Whisson et al. (2007)
91 used the SignalP 3.0 command line tool with its default of not truncating the
92 sequences. This does alter some of the scores, and also takes a little longer.
93
94 Additionally, HMMER 2.3.2 is run to look for a cross-validated HMM for the
95 RXLR-EER domain based on known positive examples. There are no restrictions
96 on where within the protein the HMM match must be found.
97
98 The output of this model has four classes:
99 * Y = Yes, both the heuristic motif and HMM were found.
100 * re = Only the heuristic SignalP with regular expression motif was found.
101 * hmm = Only the HMM was found.
102 * neither = Neither the heuristic motif nor HMM was found.
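
The four classes listed above amount to combining two yes/no results. A minimal sketch, with the hypothetical function ``classify`` taking two booleans that are assumed to have been computed elsewhere:

```python
def classify(regex_hit, hmm_hit):
    """Map the heuristic (SignalP plus regex) and HMM results onto the four classes."""
    if regex_hit and hmm_hit:
        return "Y"
    if regex_hit:
        return "re"
    if hmm_hit:
        return "hmm"
    return "neither"
```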
103
104 -----
105
106 **Note**
107
108 Both Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0, which
109 is no longer available. The current release is SignalP v3.0 (Mar 5, 2007), so
110 this is used instead. SignalP is called with the Eukaryote model and the short
111 output (one line per protein). Any sequence truncation (e.g. to 70 amino acids)
112 is handled via the intermediate sequence files.
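
The truncation step mentioned above could look roughly like the following. This is a sketch under assumptions, not the wrapper's actual code: the helper names are hypothetical, and records are taken to be simple (identifier, sequence) pairs that have already been read from the input FASTA file.

```python
def truncate_records(records, max_length=70):
    """Yield (identifier, sequence) pairs with each sequence cut to max_length residues."""
    for identifier, sequence in records:
        yield identifier, sequence[:max_length]


def format_fasta(records):
    """Return FASTA text, one header line and one sequence line per record."""
    return "".join(
        ">%s\n%s\n" % (identifier, sequence) for identifier, sequence in records
    )
```

The resulting text would be written to an intermediate file and that file given to SignalP, so the tool itself never sees more than the first 70 residues.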
113
114 -----
115
116 **References**
117
118 Stephen C. Whisson, Petra C. Boevink, Lucy Moleleki, Anna O. Avrova, Juan G. Morales, Eleanor M. Gilroy, Miles R. Armstrong, Severine Grouffaud, Pieter van West, Sean Chapman, Ingo Hein, Ian K. Toth, Leighton Pritchard and Paul R. J. Birch.
119 A translocation signal for delivery of oomycete effector proteins into host plant cells.
120 Nature 450:115-118, 2007.
121 http://dx.doi.org/10.1038/nature06203
122
123 Joe Win, William Morgan, Jorunn Bos, Ksenia V. Krasileva, Liliana M. Cano, Angela Chaparro-Garcia, Randa Ammar, Brian J. Staskawicz and Sophien Kamoun.
124 Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic oomycetes.
125 The Plant Cell 19:2349-2369, 2007.
126 http://dx.doi.org/10.1105/tpc.107.051037
127
128 Souvik Bhattacharjee, N. Luisa Hiller, Konstantinos Liolios, Joe Win, Thirumala-Devi Kanneganti, Carolyn Young, Sophien Kamoun and Kasturi Haldar.
129 The malarial host-targeting signal is conserved in the Irish potato famine pathogen.
130 PLoS Pathogens, 2(5):e50, 2006.
131 http://dx.doi.org/10.1371/journal.ppat.0020050
132
133 Trudy A. Torto, Shuang Li, Allison Styer, Edgar Huitema, Antonino Testa, Neil A.R. Gow, Pieter van West and Sophien Kamoun.
134 EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen *Phytophthora*.
135 Genome Research, 13:1675-1685, 2003.
136 http://dx.doi.org/10.1101/gr.910003
137
138 Sean R. Eddy.
139 Profile hidden Markov models.
140 Bioinformatics, 14(9):755–763, 1998.
141 http://dx.doi.org/10.1093/bioinformatics/14.9.755
142
143 Nielsen, Engelbrecht, Brunak and von Heijne.
144 Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.
145 Protein Engineering, 10:1-6, 1997.
146 http://dx.doi.org/10.1093/protein/10.1.1
147
148 </help>
149 </tool>