comparison GEMBASSY-1.0.3/doc/text/ggcsi.txt @ 0:8300eb051bea draft

Initial upload
author ktnyt
date Fri, 26 Jun 2015 05:19:29 -0400
1 ggcsi
2 Function
3
4 GC Skew Index: an index for strand-specific mutational bias
5
6 Description
7
8 ggcsi calculates the GC Skew Index (GCSI) of the given circular bacterial
9 genome. GCSI quantifies the degree of GC Skew. In other words, this index
10 represents the degree of strand-specific mutational bias in bacterial
11 genomes, caused by replicational selection.
12 GCSI is calculated by the following formula:
13
14 GCSI = sqrt((SA/6000) * (dist/600))
15
16 where SA is the spectral amplitude of the Fourier power spectrum at 1 Hz,
17 and dist is the normalized Euclidean distance between the vertices of
18 cumulative GC skew.
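
As a quick check of the formula above, the following Python sketch recomputes GCSI from SA and dist. The function name gcsi_v2 is hypothetical and is not part of ggcsi; the input values are the SA and DIST figures reported for NC_000913 in the sample output of this document.

```python
import math


def gcsi_v2(sa: float, dist: float) -> float:
    """Geometric mean of the normalized SA and dist terms, per the formula above."""
    return math.sqrt((sa / 6000.0) * (dist / 600.0))


# SA and DIST as reported for NC_000913 in the sample output file.
value = gcsi_v2(487.218569030757, 69.037726)
print(value)
```

The printed value should be approximately 0.0967, in line with the GCSI reported in the same output line.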
19
20 GCSI ranges from 0 (no observable skew) to 1 (strong skew). Archaeal
21 genomes with multiple replication origins, and therefore no observable
22 skew, mostly have GCSI below 0.05. The Escherichia coli genome has a
23 value of around 0.10.
24
25 Version 1 of GCSI required a fixed number of windows (4096), but the new GCSI
26 version 2 (also known as generalized GCSI: gGCSI) is invariant to the number
27 of windows. GCSI version 1 is calculated as an arithmetic mean (as opposed
28 to the geometric mean of gGCSI) of SR (spectral ratio, the signal-to-noise
29 ratio of the 1 Hz power spectrum) and dist.
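
To make the notions of windows and cumulative GC skew concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the ggcsi implementation: the sign convention (G - C) / (G + C), the handling of leftover bases, and the function names are all assumptions, and the normalization that ggcsi applies to obtain SA and dist is not reproduced here.

```python
from itertools import accumulate


def windowed_gc_skew(seq, windows):
    """Return the GC skew of each of `windows` equal-sized, non-overlapping windows.

    Assumptions: skew is (G - C) / (G + C); bases left over after dividing the
    sequence into equal windows are ignored; a window without G or C scores 0.
    """
    seq = seq.upper()
    size = len(seq) // windows
    if size == 0:
        raise ValueError("sequence is shorter than the number of windows")
    skews = []
    for i in range(windows):
        chunk = seq[i * size:(i + 1) * size]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews):
    """Running sum of the per-window skew values."""
    return list(accumulate(skews))


# Toy sequence: a G-rich half followed by a C-rich half, split into 2 windows.
skews = windowed_gc_skew("GGGGCCCC", 2)
cumulative = cumulative_skew(skews)
print(skews, cumulative)
```

In a genome with strong skew, the cumulative curve rises along one replication arm and falls along the other, which is the shape that the distance between its vertices is meant to capture.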
30
31 The G-language SOAP service is provided by the
32 Institute for Advanced Biosciences, Keio University.
33 The original web service is located at the following URL:
34
35 http://www.g-language.org/wiki/soap
36
37 The WSDL (RPC/Encoded) file is located at:
38
39 http://soap.g-language.org/g-language.wsdl
40
41 Documentation on G-language Genome Analysis Environment methods is
42 provided at the Document Center:
43
44 http://ws.g-language.org/gdoc/
45
46 Usage
47
48 Here is a sample session with ggcsi
49
50 % ggcsi refseqn:NC_000913
51 GC Skew Index: an index for strand-specific mutational bias
52 Program compseq output file [nc_000913.ggcsi]:
53
56
57 Command line arguments
58
59 Standard (Mandatory) qualifiers:
60 [-sequence] seqall Nucleotide sequence(s) filename and optional
61 format, or reference (input USA)
62 [-outfile] outfile [*.ggcsi] Program compseq output file
63
64 Additional (Optional) qualifiers: (none)
65 Advanced (Unprompted) qualifiers:
66 -gcsi selection [2] GCSI version to use
67 -window integer [4096] Number of windows. Must be a power of
68 2 (Any integer value)
69 -purine boolean [N] Use purine skew for calculation
70 -keto boolean [N] Use keto skew for calculation
71 -at boolean [N] Use AT skew for calculation
72 -pval boolean [N] Calculate p-value when GCSI version 2 is
73 selected
74 -[no]accid boolean [Y] Include to use sequence accession ID as
75 query
76
77 Associated qualifiers:
78
79 "-sequence" associated qualifiers
80 -sbegin1 integer Start of each sequence to be used
81 -send1 integer End of each sequence to be used
82 -sreverse1 boolean Reverse (if DNA)
83 -sask1 boolean Ask for begin/end/reverse
84 -snucleotide1 boolean Sequence is nucleotide
85 -sprotein1 boolean Sequence is protein
86 -slower1 boolean Make lower case
87 -supper1 boolean Make upper case
88 -scircular1 boolean Sequence is circular
89 -sformat1 string Input sequence format
90 -iquery1 string Input query fields or ID list
91 -ioffset1 integer Input start position offset
92 -sdbname1 string Database name
93 -sid1 string Entryname
94 -ufo1 string UFO features
95 -fformat1 string Features format
96 -fopenfile1 string Features file name
97
98 "-outfile" associated qualifiers
99 -odirectory2 string Output directory
100
101 General qualifiers:
102 -auto boolean Turn off prompts
103 -stdout boolean Write first file to standard output
104 -filter boolean Read first file from standard input, write
105 first file to standard output
106 -options boolean Prompt for standard and additional values
107 -debug boolean Write debug output to program.dbg
108 -verbose boolean Report some/full command line options
109 -help boolean Report command line options and exit. More
110 information on associated and general
111 qualifiers can be found with -help -verbose
112 -warning boolean Report warnings
113 -error boolean Report errors
114 -fatal boolean Report fatal errors
115 -die boolean Report dying program messages
116 -version boolean Report version number and exit
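
For scripted use, the qualifiers listed above can be assembled into a command line programmatically. The Python sketch below is a hypothetical helper (build_ggcsi_command is not part of GEMBASSY); it uses only qualifiers documented in this section and assumes that -gcsi accepts the values 1 and 2, as described under Description.

```python
def build_ggcsi_command(sequence, outfile=None, gcsi=2, window=4096, pval=False):
    """Build an argument list for ggcsi from the qualifiers documented above."""
    # The -window help text states that the value must be a power of 2.
    if window < 1 or window & (window - 1):
        raise ValueError("window must be a power of 2")
    # Assumption: only GCSI versions 1 and 2 are selectable.
    if gcsi not in (1, 2):
        raise ValueError("gcsi must be 1 or 2")
    cmd = ["ggcsi", "-sequence", sequence,
           "-gcsi", str(gcsi), "-window", str(window)]
    if outfile is not None:
        cmd += ["-outfile", outfile]
    if pval:
        cmd.append("-pval")
    cmd.append("-auto")  # turn off prompts for non-interactive runs
    return cmd


command = build_ggcsi_command("refseqn:NC_000913", outfile="nc_000913.ggcsi")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where ggcsi is installed.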
117
118 Input file format
119
120 The database definitions for the following commands are available at
121 http://soap.g-language.org/kbws/embossrc
122
123 ggcsi reads one or more nucleotide sequences.
124
125 Output file format
126
127 ggcsi writes its output to a plain text file.
128
129 File: nc_000913.ggcsi
130
131 Sequence: NC_000913 GCSI: 0.0966615833014818 SA: 487.218569030757 DIST: 69.037726
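
A result line of this form can be read back into a script with a few lines of Python. This is a sketch that assumes the whitespace-separated "Key: value" layout seen in the single sample line above; the function name parse_ggcsi_line is hypothetical, and output produced with other qualifiers (for example -pval) may contain fields not shown here.

```python
def parse_ggcsi_line(line):
    """Parse one 'Sequence: ... GCSI: ... SA: ... DIST: ...' line into a dict."""
    tokens = line.split()
    record = {}
    # Tokens alternate between 'Key:' labels and their values.
    for key, value in zip(tokens[0::2], tokens[1::2]):
        name = key.rstrip(":")
        record[name] = value if name == "Sequence" else float(value)
    return record


sample = ("Sequence: NC_000913 GCSI: 0.0966615833014818 "
          "SA: 487.218569030757 DIST: 69.037726")
record = parse_ggcsi_line(sample)
print(record["Sequence"], record["GCSI"])
```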
132
133
134 Data files
135
136 None.
137
138 Notes
139
140 None.
141
142 References
143
144 Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
145 Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
146 for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
147
148 Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
149 large-scale analysis of high-throughput omics data, J. Pest Sci.,
150 31, 7.
151
152 Arakawa, K., Kido, N., Oshita, K., and Tomita, M. (2010) G-language Genome
153 Analysis Environment with REST and SOAP Web Service Interfaces,
154 Nucleic Acids Res., 38, W700-W705.
155
156 Warnings
157
158 None.
159
160 Diagnostic Error Messages
161
162 None.
163
164 Exit status
165
166 It always exits with a status of 0.
167
168 Known bugs
169
170 None.
171
172 See also
173
174 gb1 Calculate strand bias of bacterial genome using B1 index
175 gb2 Calculate strand bias of bacterial genome using B2 index
176 gdeltagcskew Calculate strand bias of bacterial genome using delta GC skew
177 index
178 gldabias Calculate strand bias of bacterial genome using linear
179 discriminant analysis (LDA)
180
181 Author(s)
182
183 Hidetoshi Itaya (celery@g-language.org)
184 Institute for Advanced Biosciences, Keio University
185 252-0882 Japan
186
187 Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
188 Institute for Advanced Biosciences, Keio University
189 252-0882 Japan
190
191 History
192
193 2012 - Written by Hidetoshi Itaya
194 2013 - Fixed by Hidetoshi Itaya
195
196 Target users
197
198 This program is intended to be used by everyone and everything, from
199 naive users to embedded scripts.
200
201 Comments
202
203 None.
204