gldabias

 

Function

Calculate strand bias of bacterial genome using linear discriminant
analysis (LDA)

Description

gldabias calculates the strand bias of a bacterial genome using linear
discriminant analysis (LDA), as proposed in Reference 1. The basic idea is
to train an LDA classifier on the composition data of genes and use it to
predict whether each gene resides on the leading or the lagging strand. For
computational efficiency, this method trains and predicts the strands at
the putative replication origin as reported by the rep_ori_ter() method.
This usually results in maximum predictability of LDA within bacterial
genomes. The data to use for LDA can be chosen from "base", "codonbase",
"codon", and "amino" with the -variable option.
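The idea of training a two-class discriminant on gene composition can be sketched in a few lines of Python. This is only an illustration and not the G-language implementation: the toy genes, the base frequencies, the helper names (base_composition, fit_lda, accuracy) and the use of training accuracy as the score are all assumptions made for the example. It corresponds loosely to the "base" choice of -variable.

```python
import random

def base_composition(seq):
    """Fractions of A, C and G (T is implied because the four sum to 1)."""
    seq = seq.upper()
    return [seq.count(b) / len(seq) for b in "ACG"]

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter(rows, mu):
    """Within-class scatter matrix: sum of outer products of deviations."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_lda(x0, x1):
    """Fisher's two-class LDA: w = Sw^-1 (mu1 - mu0), midpoint threshold."""
    mu0, mu1 = mean(x0), mean(x1)
    s0, s1 = scatter(x0, mu0), scatter(x1, mu1)
    d = len(mu0)
    # A tiny ridge on the diagonal keeps the pooled scatter invertible.
    sw = [[s0[i][j] + s1[i][j] + (1e-9 if i == j else 0.0)
           for j in range(d)] for i in range(d)]
    w = solve(sw, [a - b for a, b in zip(mu1, mu0)])
    threshold = sum(wi * (a + b) / 2 for wi, a, b in zip(w, mu0, mu1))
    return w, threshold

def accuracy(w, threshold, x0, x1):
    """Fraction of genes assigned to the class they were labelled with."""
    def score(r):
        return sum(wi * xi for wi, xi in zip(w, r))
    correct = sum(score(r) <= threshold for r in x0)
    correct += sum(score(r) > threshold for r in x1)
    return correct / (len(x0) + len(x1))

# Toy data (assumption): leading-strand genes slightly G-rich,
# lagging-strand genes slightly C-rich.
rng = random.Random(0)

def toy_gene(weights, length=300):
    return "".join(rng.choices("ACGT", weights=weights, k=length))

leading = [base_composition(toy_gene([25, 20, 30, 25])) for _ in range(40)]
lagging = [base_composition(toy_gene([25, 30, 20, 25])) for _ in range(40)]

w, threshold = fit_lda(lagging, leading)
acc = accuracy(w, threshold, lagging, leading)
print("LDA coefficients:", [round(c, 3) for c in w])
print("training accuracy:", round(acc, 3))
```

The sketch scores the classifier on the same genes it was trained on. Whether gldabias derives its reported value in exactly this way is not stated here, so treat the number it prints as illustrative only.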

The G-language SOAP service is provided by the
Institute for Advanced Biosciences, Keio University.
The original web service is located at the following URL:

http://www.g-language.org/wiki/soap

The WSDL (RPC/Encoded) file is located at:

http://soap.g-language.org/g-language.wsdl

Documentation on G-language Genome Analysis Environment methods is
provided at the Document Center:

http://ws.g-language.org/gdoc/

Usage

Here is a sample session with gldabias

% gldabias refseqn:NC_000913
Calculate strand bias of bacterial genome using linear discriminant
analysis (LDA)
Program compseq output file [nc_000913.gldabias]: 


Command line arguments

Qualifier | Type | Description | Allowed values | Default

Standard (Mandatory) qualifiers
[-sequence] (Parameter 1) | seqall | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required
[-outfile] (Parameter 2) | outfile | Program compseq output file | Output file | <*>.gldabias

Additional (Optional) qualifiers
(none)

Advanced (Unprompted) qualifiers
-coefficients | integer | Show LDA coefficients | Any integer value | 0
-variable | selection | Data to use for LDA. Either 'base', 'codonbase', 'codon', or 'amino' | Choose from selection list of values | codon
-[no]accid | boolean | Include to use sequence accession ID as query | Boolean value Yes/No | Yes
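
For scripted use, the qualifiers listed above can be assembled into a command line programmatically. The following Python sketch builds the argument list and, optionally, runs the program. The helper names build_gldabias_command and run_gldabias are hypothetical, and the sketch assumes that gldabias is installed and on the PATH and that the boolean qualifier is negated as -noaccid, as the -[no]accid notation suggests.

```python
import subprocess

# Values accepted by -variable, as listed in the qualifier table.
ALLOWED_VARIABLES = ("base", "codonbase", "codon", "amino")

def build_gldabias_command(sequence, outfile, variable="codon",
                           coefficients=0, accid=True):
    """Return the argument list for one gldabias run (hypothetical helper)."""
    if variable not in ALLOWED_VARIABLES:
        raise ValueError("variable must be one of %s" % (ALLOWED_VARIABLES,))
    cmd = [
        "gldabias",
        "-sequence", sequence,
        "-outfile", outfile,
        "-variable", variable,
        "-coefficients", str(int(coefficients)),
    ]
    if not accid:
        # accid defaults to Yes, so only the negated form is added.
        cmd.append("-noaccid")
    return cmd

def run_gldabias(sequence, outfile, **options):
    """Run gldabias and return the output file name.

    Requires the gldabias executable; raises CalledProcessError on failure.
    """
    subprocess.run(build_gldabias_command(sequence, outfile, **options),
                   check=True)
    return outfile

# Build (but do not run) a command that uses amino acid composition.
print(" ".join(build_gldabias_command("refseqn:NC_000913",
                                      "nc_000913.gldabias",
                                      variable="amino")))
```

Passing the arguments as a list avoids shell quoting problems with sequence references such as refseqn:NC_000913.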

Input file format

The database definitions for the following commands are available at
http://soap.g-language.org/kbws/embossrc

gldabias reads one or more nucleotide sequences.

Output file format

gldabias writes its output to a plain text file.

File: nc_000913.gldabias

Sequence: NC_000913 LDA-BIAS: 0.742533
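
Downstream scripts may want the bias value as a number. The Python sketch below parses records of the form shown above. The function name parse_gldabias is hypothetical, and the sketch assumes every record follows the "Sequence: ... LDA-BIAS: ..." layout of this example. Any additional output produced with -coefficients is not handled.

```python
import re

# One record per sequence: "Sequence: <id> LDA-BIAS: <number>"
RECORD = re.compile(
    r"Sequence:\s*(\S+)\s+LDA-BIAS:\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)

def parse_gldabias(text):
    """Map each sequence identifier to its LDA-BIAS value (as a float)."""
    return {m.group(1): float(m.group(2)) for m in RECORD.finditer(text)}

sample = "Sequence: NC_000913 LDA-BIAS: 0.742533\n"
print(parse_gldabias(sample))  # {'NC_000913': 0.742533}
```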

Data files

None.

Notes

None.

References

   1. Rocha EPC et al. (1999) "Universal replication biases in bacteria",
      Molecular Microbiology, 32(1):11-16

   2. Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
      Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
      for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.

   3. Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
      large-scale analysis of high-throughput omics data, J. Pest Sci.,
      31, 7.

   4. Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome
      Analysis Environment with REST and SOAP Web Service Interfaces,
      Nucleic Acids Res., 38, W700-W705.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

Program name Description
gb1 Calculate strand bias of bacterial genome using B1 index
gb2 Calculate strand bias of bacterial genome using B2 index
gdeltagcskew Calculate strand bias of bacterial genome using delta GC skew
ggcsi GC Skew Index: an index for strand-specific mutational bias

Author(s)

Hidetoshi Itaya (celery@g-language.org)
  Institute for Advanced Biosciences, Keio University
  252-0882 Japan

Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
  Institute for Advanced Biosciences, Keio University
  252-0882 Japan

History

2012 - Written by Hidetoshi Itaya

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.