ggcsi

 

Function

GC Skew Index: an index for strand-specific mutational bias

Description

ggcsi calculates the GC Skew Index (GCSI) of the given circular bacterial
genome. GCSI quantifies the degree of GC Skew. In other words, this index
represents the degree of strand-specific mutational bias in bacterial
genomes, caused by replicational selection.
GCSI is calculated by the following formula:

GCSI = sqrt((SA/6000) * (dist/600))

where SA is the spectral amplitude of the Fourier power spectrum at 1 Hz,
and dist is the normalized Euclidean distance between the vertices of
the cumulative GC skew.
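
As a quick check of the formula, the following Python sketch evaluates GCSI
from given SA and dist values. The function name gcsi_v2 is hypothetical and
not part of ggcsi; the input values are the SA and DIST reported for
NC_000913 in the sample output on this page.

```python
import math


def gcsi_v2(sa, dist):
    """Geometric mean of the scaled spectral amplitude and scaled distance."""
    return math.sqrt((sa / 6000) * (dist / 600))


# SA and DIST as reported for NC_000913 in the sample output of this page.
value = gcsi_v2(487.218569030757, 69.037726)
print(round(value, 4))  # 0.0967, agreeing with the reported GCSI of 0.0966615833014818
```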

GCSI ranges from 0 (no observable skew) to 1 (strong skew). Archaeal
genomes that have multiple replication origins, and therefore no
observable skew, mostly have a GCSI below 0.05. The Escherichia coli
genome has a GCSI of about 0.10.

Version 1 of GCSI required a fixed number of windows (4096), but the new
GCSI version 2 (also known as generalized GCSI: gGCSI) is invariant to the
number of windows. GCSI version 1 is calculated as the arithmetic mean (as
opposed to the geometric mean used by gGCSI) of SR (spectral ratio, the
signal-to-noise ratio of the 1 Hz power spectrum) and dist.
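
The quantities behind SA and dist can be illustrated with a small,
pure-Python sketch. It is not the ggcsi implementation: the function names
are hypothetical, the skew is assumed to be (G - C) / (G + C) per window,
the 1 Hz component is assumed to mean one full cycle over the genome, the
distance is assumed to be measured in (window index, cumulative skew)
coordinates, and the scaling that ggcsi applies to obtain its SA and
normalized dist is not reproduced.

```python
import cmath
import math


def gc_skew_windows(seq, n_windows):
    """Return (G - C) / (G + C) for n_windows equal-sized windows."""
    seq = seq.upper()
    size = len(seq) // n_windows  # bases beyond n_windows * size are ignored
    skews = []
    for i in range(n_windows):
        window = seq[i * size:(i + 1) * size]
        g, c = window.count("G"), window.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def amplitude_at_one_cycle(values):
    """Magnitude of the DFT component that completes one cycle over all windows."""
    n = len(values)
    return abs(sum(v * cmath.exp(-2j * math.pi * k / n)
                   for k, v in enumerate(values)))


def cumulative_extremes_distance(values):
    """Unnormalized Euclidean distance between the max and min of the cumulative curve."""
    total, cumulative = 0.0, []
    for v in values:
        total += v
        cumulative.append(total)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    return math.hypot(i_max - i_min, cumulative[i_max] - cumulative[i_min])


# Toy input: 64 G followed by 64 C, split into 8 windows.
skews = gc_skew_windows("G" * 64 + "C" * 64, 8)
print(skews)
print(amplitude_at_one_cycle(skews))
print(cumulative_extremes_distance(skews))
```

For this toy sequence the skew is +1 in the first half and -1 in the second,
so the cumulative curve rises to a single peak and returns to zero. Both the
amplitude and the distance are unchanged if the opposite sign convention,
(C - G) / (C + G), is used.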

The G-language SOAP service is provided by the
Institute for Advanced Biosciences, Keio University.
The original web service is located at the following URL:

http://www.g-language.org/wiki/soap

The WSDL (RPC/Encoded) file is located at:

http://soap.g-language.org/g-language.wsdl

Documentation on G-language Genome Analysis Environment methods is
provided at the Document Center:

http://ws.g-language.org/gdoc/

Usage

Here is a sample session with ggcsi

% ggcsi refseqn:NC_000913
GC Skew Index: an index for strand-specific mutational bias
Program compseq output file [nc_000913.ggcsi]: 

Command line arguments

Qualifier | Type | Description | Allowed values | Default

Standard (Mandatory) qualifiers
[-sequence] (Parameter 1) | seqall | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required
[-outfile] (Parameter 2) | outfile | Program compseq output file | Output file | <*>.ggcsi

Additional (Optional) qualifiers
(none)

Advanced (Unprompted) qualifiers
-gcsi | selection | GCSI version to use | Choose from selection list of values | 2
-window | integer | Number of windows. Must be a power of 2 | Any integer value | 4096
-purine | boolean | Use purine skew for calculation | Boolean value Yes/No | No
-keto | boolean | Use keto skew for calculation | Boolean value Yes/No | No
-at | boolean | Use AT skew for calculation | Boolean value Yes/No | No
-pval | boolean | Calculate p-value when GCSI version 2 is selected | Boolean value Yes/No | No
-[no]accid | boolean | Include to use sequence accession ID as query | Boolean value Yes/No | Yes
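
The qualifiers above can also be assembled programmatically. The following
Python sketch builds an argument list using only qualifiers listed in this
section. The helper name build_ggcsi_command is hypothetical, and the sketch
assumes the usual EMBOSS conventions that a boolean is switched on by naming
it (-pval) and switched off with the "no" prefix (-noaccid). It also checks
the power-of-2 requirement on -window before anything is run.

```python
def build_ggcsi_command(sequence, outfile, gcsi=2, window=4096,
                        pval=False, accid=True):
    """Return an argument list for ggcsi (hypothetical helper, not part of ggcsi)."""
    # A positive integer is a power of 2 exactly when it has a single bit set.
    if window < 1 or window & (window - 1):
        raise ValueError("window must be a power of 2")
    cmd = ["ggcsi", "-sequence", sequence, "-outfile", outfile,
           "-gcsi", str(gcsi), "-window", str(window)]
    if pval:
        cmd.append("-pval")
    if not accid:
        cmd.append("-noaccid")
    return cmd


print(" ".join(build_ggcsi_command("refseqn:NC_000913", "nc_000913.ggcsi",
                                   window=8192, pval=True)))
```

If ggcsi is installed, the returned list can be passed to subprocess.run;
whether the service accepts a given combination of values is not verified
by this sketch.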

Input file format

The database definitions for the following commands are available at
http://soap.g-language.org/kbws/embossrc

ggcsi reads one or more nucleotide sequences.

Output file format

ggcsi writes its output to a plain text file.

File: nc_000913.ggcsi

Sequence: NC_000913 GCSI: 0.0966615833014818 SA: 487.218569030757 DIST: 69.037726
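
For downstream processing, a report line of this shape can be split into
named fields. The following Python sketch is an assumption-laden example
rather than an official parser: the function name parse_ggcsi_line is
hypothetical, and it handles only the "Key: value" layout shown above, which
may differ when other options (such as -pval or GCSI version 1) are used.

```python
import re


def parse_ggcsi_line(line):
    """Split a 'Key: value' report line into a dict; numeric fields become floats."""
    record = {}
    for key, value in re.findall(r"(\w+):\s*(\S+)", line):
        # The sequence identifier stays a string; other fields are numeric.
        record[key] = value if key == "Sequence" else float(value)
    return record


line = ("Sequence: NC_000913 GCSI: 0.0966615833014818 "
        "SA: 487.218569030757 DIST: 69.037726")
print(parse_ggcsi_line(line))
```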

Data files

None.

Notes

None.

References

   Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
      Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
      for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.

   Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
      large-scale analysis of high-throughput omics data, J. Pest Sci.,
      31, 7.

   Arakawa, K., Kido, N., Oshita, K., and Tomita, M. (2010) G-language Genome
      Analysis Environment with REST and SOAP Web Service Interfaces,
      Nucleic Acids Res., 38, W700-W705.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

Program name    Description
gb1             Calculate strand bias of bacterial genome using B1 index
gb2             Calculate strand bias of bacterial genome using B2 index
gdeltagcskew    Calculate strand bias of bacterial genome using delta GC skew
gldabias        Calculate strand bias of bacterial genome using linear discriminant analysis (LDA)

Author(s)

Hidetoshi Itaya (celery@g-language.org)
  Institute for Advanced Biosciences, Keio University
  252-0882 Japan

Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
  Institute for Advanced Biosciences, Keio University
  252-0882 Japan

History

2012 - Written by Hidetoshi Itaya

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.