comparison GEMBASSY-1.0.3/doc/text/gbaseinformationcontent.txt @ 0:8300eb051bea draft

Initial upload
author ktnyt
date Fri, 26 Jun 2015 05:19:29 -0400
1 gbaseinformationcontent
2 Function
3
4 Calculates and graphs the sequence conservation using information content
5
6 Description
7
8 This function calculates and graphs the sequence conservation in regions
9 around the start/stop codons using information content. Values are obtained
10 by subtracting the entropy for each position from the maximum possible
11 value (which is 2 bits in the case of nucleotide sequences). Information
12 content is highest when the frequencies at a position are most biased
13 toward a single letter of the alphabet.
14
15 Information content I is obtained by subtracting the entropy H from the
16 maximum uncertainty log2(|M|), where |M| is the alphabet size:
17 I(P(i)) = log2(|M|) - (-sum_j(P(i,j) * log2(P(i,j))))
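
As a minimal sketch of the formula above (not the program's actual
implementation), the following Python computes the information content of
each column of a set of aligned windows. The function name
information_content, the alphabet_size parameter and the four example
windows are assumptions made for illustration only.

```python
import math
from collections import Counter

def information_content(column, alphabet_size=4):
    """Return log2(|M|) - H for one alignment column (a string of letters)."""
    counts = Counter(column)
    total = sum(counts.values())
    # Shannon entropy H of the observed letter frequencies, in bits
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy

# Hypothetical aligned windows, one per gene (not real data)
windows = ["ATGA", "ATGC", "ATGG", "ATGT"]
columns = ["".join(w[i] for w in windows) for i in range(len(windows[0]))]
profile = [information_content(c) for c in columns]

for i, value in enumerate(profile):
    print(i, round(value, 5))
```

The three fully conserved columns reach the 2-bit maximum, while the last
column, where all four nucleotides are equally frequent, carries no
information.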
18
19 The G-language SOAP service is provided by the
20 Institute for Advanced Biosciences, Keio University.
21 The original web service is located at the following URL:
22
23 http://www.g-language.org/wiki/soap
24
25 The WSDL (RPC/Encoded) file is located at:
26
27 http://soap.g-language.org/g-language.wsdl
28
29 Documentation on G-language Genome Analysis Environment methods is
30 provided at the Document Center:
31
32 http://ws.g-language.org/gdoc/
33
34 Usage
35
36 Here is a sample session with gbaseinformationcontent
37
38 % gbaseinformationcontent refseqn:NC_000913
39 Calculates and graphs the sequence conservation using information content
40 Program compseq output file (optional) [nc_000913.gbaseinformationcontent]:
41
44
45 Example 2
46
47 % gbaseinformationcontent refseqn:NC_000913 -plot -graph png
48 Calculates and graphs the sequence conservation using information content
49 Created gbaseinformationcontent.1.png
50
53
54 Command line arguments
55
56    Standard (Mandatory) qualifiers (* if not always prompted):
57   [-sequence]          seqall     Nucleotide sequence(s) filename and optional
58                                   format, or reference (input USA)
59 *  -graph              xygraph    [$EMBOSS_GRAPHICS value, or x11] Graph type
60                                   (ps, hpgl, hp7470, hp7580, meta, cps, x11,
61                                   tek, tekt, none, data, xterm, png, gif, svg)
62 *  -outfile            outfile    [*.gbaseinformationcontent] Program compseq
63                                   output file (optional)
64
65    Additional (Optional) qualifiers: (none)
66    Advanced (Unprompted) qualifiers:
67    -position           selection  [start] Either 'start' (around start codon)
68                                   or 'end' (around stop codon) to create the
69                                   PWM
70    -upstream           integer    [30] Length upstream of specified position
71                                   to create PWM (Any integer value)
72    -downstream         integer    [30] Length downstream of specified position
73                                   to create PWM (Any integer value)
74    -patlen             integer    [3] Length of oligomer to count (Any integer
75                                   value)
76    -[no]accid          boolean    [Y] Include to use sequence accession ID as
77                                   query
78    -plot               toggle     [N] Include to plot result
79
80    Associated qualifiers:
81
82    "-sequence" associated qualifiers
83    -sbegin1            integer    Start of each sequence to be used
84    -send1              integer    End of each sequence to be used
85    -sreverse1          boolean    Reverse (if DNA)
86    -sask1              boolean    Ask for begin/end/reverse
87    -snucleotide1       boolean    Sequence is nucleotide
88    -sprotein1          boolean    Sequence is protein
89    -slower1            boolean    Make lower case
90    -supper1            boolean    Make upper case
91    -scircular1         boolean    Sequence is circular
92    -sformat1           string     Input sequence format
93    -iquery1            string     Input query fields or ID list
94    -ioffset1           integer    Input start position offset
95    -sdbname1           string     Database name
96    -sid1               string     Entryname
97    -ufo1               string     UFO features
98    -fformat1           string     Features format
99    -fopenfile1         string     Features file name
100
101    "-graph" associated qualifiers
102    -gprompt            boolean    Graph prompting
103    -gdesc              string     Graph description
104    -gtitle             string     Graph title
105    -gsubtitle          string     Graph subtitle
106    -gxtitle            string     Graph x axis title
107    -gytitle            string     Graph y axis title
108    -goutfile           string     Output file for non interactive displays
109    -gdirectory         string     Output directory
110
111    "-outfile" associated qualifiers
112    -odirectory         string     Output directory
113
114    General qualifiers:
115    -auto               boolean    Turn off prompts
116    -stdout             boolean    Write first file to standard output
117    -filter             boolean    Read first file from standard input, write
118                                   first file to standard output
119    -options            boolean    Prompt for standard and additional values
120    -debug              boolean    Write debug output to program.dbg
121    -verbose            boolean    Report some/full command line options
122    -help               boolean    Report command line options and exit. More
123                                   information on associated and general
124                                   qualifiers can be found with -help -verbose
125    -warning            boolean    Report warnings
126    -error              boolean    Report errors
127    -fatal              boolean    Report fatal errors
128    -die                boolean    Report dying program messages
129    -version            boolean    Report version number and exit
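
The -position, -upstream and -downstream qualifiers listed above define the
window from which the PWM is built. The sketch below shows one way such
windows could be cut and counted. It assumes a linear, forward-strand
sequence, 0-based anchor indices and a window that includes the anchor
itself. The names extract_windows and position_counts and the toy sequence
are hypothetical, and the sketch does not claim to match how the program
handles strands or circular genomes.

```python
from collections import Counter

def extract_windows(genome, anchors, upstream=30, downstream=30):
    """Cut genome[a - upstream : a + downstream + 1] around each anchor a."""
    windows = []
    for a in anchors:
        lo, hi = a - upstream, a + downstream + 1
        if lo >= 0 and hi <= len(genome):  # drop windows that run off either end
            windows.append(genome[lo:hi])
    return windows

def position_counts(windows):
    """Count letters column by column across equally long windows."""
    return [Counter(column) for column in zip(*windows)]

# Toy sequence with two ATG anchors (index of the A); not real data
genome = "CCATGAAACCATGTTT"
windows = extract_windows(genome, anchors=[2, 10], upstream=2, downstream=3)
counts = position_counts(windows)
offsets = list(range(-2, 4))  # offset labels matching the columns of counts

for offset, c in zip(offsets, counts):
    print(offset, dict(c))
```

Dividing each count by the number of windows gives the per-position
frequencies that an entropy or information content calculation would use.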
130
131 Input file format
132
133 The database definitions for the following commands are available at
134 http://soap.g-language.org/kbws/embossrc
135
136 gbaseinformationcontent reads one or more nucleotide sequences.
137
138 Output file format
139
140 gbaseinformationcontent writes its output to a plain text file or to the
141 EMBOSS graphics device.
142
143 File: nc_000913.gbaseinformationcontent
144
145 Sequence: NC_000913
146 -30,2.42457
147 -29,2.42811
148 -28,2.43235
149 -27,2.43116
150 -26,2.44278
151 -25,2.44236
152 -24,2.44502
153 -23,2.46097
154 -22,2.46588
155
156 [Part of this file has been deleted for brevity]
157
158 21,2.27547
159 22,2.46974
160 23,2.46342
161 24,2.32686
162 25,2.46245
163 26,2.46061
164 27,2.27664
165 28,2.45650
166 29,2.48206
167 30,2.29140
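
The report shown above is simple to read back into a script. The sketch
below assumes the layout is exactly as in the example: a "Sequence:" header
followed by "offset,value" lines. parse_profile is a hypothetical helper,
not part of the package, and the sample text reuses three values from the
example file.

```python
def parse_profile(text):
    """Return {sequence_name: [(offset, value), ...]} from report text."""
    profiles = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Sequence:"):
            # Start a new profile for each sequence header
            current = line.split(":", 1)[1].strip()
            profiles[current] = []
        elif current is not None and "," in line:
            offset, value = line.split(",", 1)
            profiles[current].append((int(offset), float(value)))
    return profiles

sample = """Sequence: NC_000913
-30,2.42457
-29,2.42811
30,2.29140
"""

profiles = parse_profile(sample)
for offset, value in profiles["NC_000913"]:
    print(offset, value)
```

Lines without a comma are skipped, and a malformed data line raises
ValueError rather than being silently ignored.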
168
169
170 Data files
171
172 None.
173
174 Notes
175
176 None.
177
178 References
179
180 Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
181 Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
182 for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
183
184 Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
185 large-scale analysis of high-throughput omics data, J. Pestic. Sci.,
186 31, 7.
187
188 Arakawa, K., Kido, N., Oshita, K., and Tomita, M. (2010) G-language Genome
189 Analysis Environment with REST and SOAP Web Service Interfaces,
190 Nucleic Acids Res., 38, W700-W705.
191
192 Warnings
193
194 None.
195
196 Diagnostic Error Messages
197
198 None.
199
200 Exit status
201
202 It always exits with a status of 0.
203
204 Known bugs
205
206 None.
207
208 See also
209
210 gbaseentropy Calculates and graphs the sequence conservation
211 using Shannon uncertainty (entropy)
212 gbaserelativeentropy Calculates and graphs the sequence conservation
213 using Kullback-Leibler divergence (relative
214 entropy)
215
216 Author(s)
217
218 Hidetoshi Itaya (celery@g-language.org)
219 Institute for Advanced Biosciences, Keio University
220 252-0882 Japan
221
222 Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
223 Institute for Advanced Biosciences, Keio University
224 252-0882 Japan
225
226 History
227
228 2012 - Written by Hidetoshi Itaya
229 2013 - Fixed by Hidetoshi Itaya
230
231 Target users
232
233 This program is intended to be used by everyone and everything, from
234 naive users to embedded scripts.
235
236 Comments
237
238 None.
239