comparison GEMBASSY-1.0.3/doc/text/gphx.txt @ 2:8947fca5f715 draft default tip

Uploaded
author ktnyt
date Fri, 26 Jun 2015 05:21:44 -0400
parents 84a17b3fad1f
children
comparison
1:84a17b3fad1f 2:8947fca5f715
1 gphx
2 Function
3
4 Identify predicted highly expressed gene
5
6 Description
7
8 gphx calculates codon usage differences between gene classes to identify
9 Predicted Highly eXpressed (PHX) and Putative Alien (PA) genes. A gene is
10 identified as PHX if BgC/BgH >= 1, where BgC and BgH are each < 1 by their
11 nature. PHX genes generally have favorable codon usage, strong
12 Shine-Dalgarno (SD) sequences, and probably stronger conservation of
13 promoter sequences. A gene is identified as PA if both BgC and BgH are
14 greater than the median BgC of all genes of similar length.
15
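The PHX criterion above can be sketched in a few lines of Python. This is only an illustration of the ratio test as stated in this description, not the code gphx itself runs; the function names are hypothetical, and the values are taken from the sample output further down. The PA test is left out because the description does not say how "similar length" is defined.

```python
def expression_measure(bgc: float, bgh: float) -> float:
    """Return E_g = BgC / BgH, the ratio reported in the E_g column."""
    if bgh <= 0:
        raise ValueError("BgH must be positive")
    return bgc / bgh


def is_phx(bgc: float, bgh: float, threshold: float = 1.0) -> bool:
    """Apply the PHX rule from the description: BgC / BgH >= 1."""
    return expression_measure(bgc, bgh) >= threshold


# (gene, BgC, BgH) triples copied from the sample output of this manual.
samples = [("thrL", 0.8070, 0.8977), ("talB", 0.4149, 0.3071)]
for gene, bgc, bgh in samples:
    print(gene, f"{expression_measure(bgc, bgh):.4f}", int(is_phx(bgc, bgh)))
```

Because the printed BgC and BgH values are already rounded to four decimals, a ratio recomputed this way may differ from the reported E_g in the last digit.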
16 The G-language SOAP service is provided by the
17 Institute for Advanced Biosciences, Keio University.
18 The original web service is located at the following URL:
19
20 http://www.g-language.org/wiki/soap
21
22 The WSDL (RPC/Encoded) file is located at:
23
24 http://soap.g-language.org/g-language.wsdl
25
26 Documentation on G-language Genome Analysis Environment methods is
27 provided at the Document Center:
28
29 http://ws.g-language.org/gdoc/
30
31 Usage
32
33 Here is a sample session with gphx
34
35 % gphx refseqn:NC_000913
36 Identify predicted highly expressed gene
37 Codon usage output file [nc_000913.gphx]:
38
41
42 Command line arguments
43
44 Standard (Mandatory) qualifiers:
45 [-sequence] seqall Nucleotide sequence(s) filename and optional
46 format, or reference (input USA)
47 [-outfile] outfile [*.gphx] Codon usage output file
48
49 Additional (Optional) qualifiers: (none)
50 Advanced (Unprompted) qualifiers:
51 -translate boolean [N] Include when translating using standard
52 codon table
53 -delkey string [[^ACDEFGHIKLMNPQRSTVWYacgtU]] Regular
54 expression to delete key (Any string)
55 -[no]accid boolean [Y] Include to use sequence accession ID as
56 query
57
58 Associated qualifiers:
59
60 "-sequence" associated qualifiers
61 -sbegin1 integer Start of each sequence to be used
62 -send1 integer End of each sequence to be used
63 -sreverse1 boolean Reverse (if DNA)
64 -sask1 boolean Ask for begin/end/reverse
65 -snucleotide1 boolean Sequence is nucleotide
66 -sprotein1 boolean Sequence is protein
67 -slower1 boolean Make lower case
68 -supper1 boolean Make upper case
69 -scircular1 boolean Sequence is circular
70 -sformat1 string Input sequence format
71 -iquery1 string Input query fields or ID list
72 -ioffset1 integer Input start position offset
73 -sdbname1 string Database name
74 -sid1 string Entryname
75 -ufo1 string UFO features
76 -fformat1 string Features format
77 -fopenfile1 string Features file name
78
79 "-outfile" associated qualifiers
80 -odirectory2 string Output directory
81
82 General qualifiers:
83 -auto boolean Turn off prompts
84 -stdout boolean Write first file to standard output
85 -filter boolean Read first file from standard input, write
86 first file to standard output
87 -options boolean Prompt for standard and additional values
88 -debug boolean Write debug output to program.dbg
89 -verbose boolean Report some/full command line options
90 -help boolean Report command line options and exit. More
91 information on associated and general
92 qualifiers can be found with -help -verbose
93 -warning boolean Report warnings
94 -error boolean Report errors
95 -fatal boolean Report fatal errors
96 -die boolean Report dying program messages
97 -version boolean Report version number and exit
98
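For scripted use, the qualifiers listed above can be assembled into a non-interactive command line. The sketch below is a minimal Python wrapper, assuming gphx is installed and on the PATH; the helper names build_gphx_command and run_gphx are hypothetical. It uses only qualifiers that appear in this manual (-sequence, -outfile, -translate, -noaccid, -auto). Since the Exit status section states that gphx always exits with 0, the wrapper checks that the output file exists and is not empty instead of relying on the return code.

```python
import subprocess
from pathlib import Path


def build_gphx_command(sequence, outfile, translate=False, accid=True):
    """Build an argument list for gphx; -auto turns off interactive prompts."""
    cmd = ["gphx", "-sequence", sequence, "-outfile", outfile, "-auto"]
    if translate:
        cmd.append("-translate")  # default is N, so only add when requested
    if not accid:
        cmd.append("-noaccid")    # default is Y; the "no" prefix disables it
    return cmd


def run_gphx(sequence, outfile, **options):
    """Run gphx and verify that it produced a non-empty output file."""
    subprocess.run(build_gphx_command(sequence, outfile, **options), check=True)
    out = Path(outfile)
    if not out.is_file() or out.stat().st_size == 0:
        raise RuntimeError(f"gphx produced no output in {outfile}")
    return out


# Show the command equivalent to the sample session in this manual.
print(" ".join(build_gphx_command("refseqn:NC_000913", "nc_000913.gphx")))
```

Calling run_gphx("refseqn:NC_000913", "nc_000913.gphx") would then execute the same analysis as the sample session, provided the refseqn database is defined in the local EMBOSS configuration.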
99 Input file format
100
101 The database definitions for the following commands are available at
102 http://soap.g-language.org/kbws/embossrc
103
104 gphx reads one or more nucleotide sequences.
105
106 Output file format
107
108 The output from gphx is to a plain text file.
109
110 File: nc_000913.gphx
111
112 Sequence: NC_000913
113 BgC,BgH,E_g,phx,pa,gene
114 0.8070,0.8977,0.8990,0,1,thrL
115 0.1857,0.5958,0.3116,0,0,thrA
116 0.2323,0.5964,0.3896,0,0,thrB
117 0.2353,0.6064,0.3881,0,0,thrC
118 0.4353,0.6020,0.7231,0,1,yaaX
119 0.2961,0.6790,0.4361,0,0,yaaA
120 0.2233,0.7009,0.3186,0,0,yaaJ
121 0.4149,0.3071,1.3511,1,0,talB
122
123 [Part of this file has been deleted for brevity]
124
125 0.3255,0.7038,0.4625,0,0,yjjX
126 0.3531,0.5906,0.5979,0,0,ytjC
127 0.2257,0.5235,0.4311,0,0,rob
128 0.3584,0.6809,0.5264,0,0,creA
129 0.3455,0.7950,0.4346,0,0,creB
130 0.2298,0.7154,0.3212,0,0,creC
131 0.3299,0.7916,0.4167,0,0,creD
132 0.3543,0.3786,0.9357,0,0,arcA
133 0.7295,0.8286,0.8804,0,1,yjjY
134 0.4028,0.8401,0.4795,0,0,yjtD
135
136
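The output shown above is a "Sequence:" line followed by a comma-separated table, so it can be read with the Python standard library. The following is a minimal sketch based only on the sample file in this manual; the function name parse_gphx and the dictionary layout are assumptions, and the viewer's line numbers are not part of the real file.

```python
import csv
import io


def parse_gphx(handle):
    """Parse gphx output into a list of dicts, one per gene."""
    records = []
    sequence = None
    header = None
    for raw in handle:
        line = raw.strip()
        # Skip blank lines and bracketed notes such as "[Part of this file ...]".
        if not line or line.startswith("["):
            continue
        if line.startswith("Sequence:"):
            sequence = line.split(":", 1)[1].strip()
            header = None  # each sequence block brings its own header row
            continue
        fields = next(csv.reader([line]))
        if header is None:
            header = fields
            continue
        row = dict(zip(header, fields))
        records.append({
            "sequence": sequence,
            "gene": row["gene"],
            "BgC": float(row["BgC"]),
            "BgH": float(row["BgH"]),
            "E_g": float(row["E_g"]),
            "phx": row["phx"] == "1",
            "pa": row["pa"] == "1",
        })
    return records


# A few rows copied from the sample output in this manual.
SAMPLE = """Sequence: NC_000913
BgC,BgH,E_g,phx,pa,gene
0.8070,0.8977,0.8990,0,1,thrL
0.1857,0.5958,0.3116,0,0,thrA
0.4149,0.3071,1.3511,1,0,talB
"""

records = parse_gphx(io.StringIO(SAMPLE))
phx_genes = [r["gene"] for r in records if r["phx"]]
pa_genes = [r["gene"] for r in records if r["pa"]]
print("PHX:", phx_genes)
print("PA:", pa_genes)
```

For a real run, the same function can be given an open file, for example `with open("nc_000913.gphx") as fh: records = parse_gphx(fh)`.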
137 Data files
138
139 None.
140
141 Notes
142
143 None.
144
145 References
146
147 CMBL - PHX/PA user guide: http://www.cmbl.uga.edu/software/PHX-PA-guide.htm
148
149 Henry I., Sharp PM. (2007) Predicting gene expression level from codon
150 usage bias, Mol. Biol. Evol., 24(1):10-2.
151
152 Karlin S., and Mrazek J. (2000) Predicted highly expressed genes of diverse
153 prokaryotic genomes, J. Bacteriol., 182(18):5238-5250.
154
155 Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
156 Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
157 for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
158
159 Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
160 large-scale analysis of high-throughput omics data, J. Pest Sci.,
161 31, 7.
162
163 Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome
164 Analysis Environment with REST and SOAP Web Service Interfaces,
165 Nucleic Acids Res., 38, W700-W705.
166
167 Warnings
168
169 None.
170
171 Diagnostic Error Messages
172
173 None.
174
175 Exit status
176
177 It always exits with a status of 0.
178
179 Known bugs
180
181 None.
182
183 See also
184
185 gcai Calculate codon adaptation index for each gene
186 gp2 Calculate the P2 index of each gene
187
188 Author(s)
189
190 Hidetoshi Itaya (celery@g-language.org)
191 Institute for Advanced Biosciences, Keio University
192 252-0882 Japan
193
194 Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
195 Institute for Advanced Biosciences, Keio University
196 252-0882 Japan
197
198 History
199
200 2012 - Written by Hidetoshi Itaya
201 2013 - Fixed by Hidetoshi Itaya
202
203 Target users
204
205 This program is intended to be used by everyone and everything, from
206 naive users to embedded scripts.
207
208 Comments
209
210 None.
211