gbaseinformationcontent

 

Function

Calculates and graphs the sequence conservation using information content

Description

This function calculates and graphs the sequence conservation in regions
around the start/stop codons using information content. Values are obtained
by subtracting the entropy for each position from the maximum possible value
(2 bits in the case of nucleotide sequences). Information content is highest
when the frequency distribution at a position is most biased toward a single
letter.
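
As an illustration of the calculation described above, the following Python
sketch computes, for each column of a set of aligned sequences, log2(4) minus
the Shannon entropy of that column. It is not the G-language implementation:
the function names are hypothetical, and it does not model the -patlen
qualifier or any correction the service may apply, so its numbers need not
match the program's output.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """Return log2(alphabet_size) minus the Shannon entropy of one column.

    `column` is an iterable of symbols observed at a single position.
    The default alphabet size of 4 (A, C, G, T) is an assumption that
    fits nucleotide sequences.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def profile(sequences):
    """Information content at each position of equal-length aligned sequences."""
    return [information_content(col) for col in zip(*sequences)]


# Toy alignment: the first three positions are fully conserved,
# the last position is uniformly distributed over the four bases.
aligned = ["ATGA", "ATGC", "ATGG", "ATGT"]
print(profile(aligned))
```

A fully conserved position gives the maximum of 2 bits, and a position where
all four bases are equally frequent gives 0.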

The G-language SOAP service is provided by the
Institute for Advanced Biosciences, Keio University.
The original web service is located at the following URL:

http://www.g-language.org/wiki/soap

The WSDL (RPC/Encoded) file is located at:

http://soap.g-language.org/g-language.wsdl

Documentation on the G-language Genome Analysis Environment methods is
provided at the Document Center:

http://ws.g-language.org/gdoc/

Usage

Here is a sample session with gbaseinformationcontent

% gbaseinformationcontent refseqn:NC_000913
Calculates and graphs the sequence conservation using information content
Program compseq output file (optional) [nc_000913.gbaseinformationcontent]: 


Example 2

% gbaseinformationcontent refseqn:NC_000913 -plot -graph png
Calculates and graphs the sequence conservation using information content
Created gbaseinformationcontent.1.png


Command line arguments

Qualifier | Type | Description | Allowed values | Default

Standard (Mandatory) qualifiers
[-sequence] (Parameter 1) | seqall | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required
-graph | xygraph | Graph type | EMBOSS has a list of known devices, including ps, hpgl, hp7470, hp7580, meta, cps, x11, tek, tekt, none, data, xterm, png, gif, svg | EMBOSS_GRAPHICS value, or x11
-outfile | outfile | Program compseq output file (optional) | Output file | <*>.gbaseinformationcontent

Additional (Optional) qualifiers
(none)

Advanced (Unprompted) qualifiers
-position | selection | Either 'start' (around start codon) or 'end' (around stop codon) to create the PWM | Choose from selection list of values | start
-upstream | integer | Length upstream of specified position to create PWM | Any integer value | 30
-downstream | integer | Length downstream of specified position to create PWM | Any integer value | 30
-patlen | integer | Length of oligomer to count | Any integer value | 3
-[no]accid | boolean | Include to use sequence accession ID as query | Boolean value Yes/No | Yes
-plot | toggle | Include to plot result | Toggle value Yes/No | No
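
The qualifiers listed above can be combined on one command line. The Python
sketch below assembles such a command from keyword arguments. The helper name
build_command is hypothetical, the defaults mirror the values listed above,
and the sketch only builds and prints the command. Running it (for example
with subprocess.run) assumes gbaseinformationcontent is installed and on the
PATH.

```python
import shlex


def build_command(sequence, position="start", upstream=30, downstream=30,
                  patlen=3, plot=False, graph=None, outfile=None):
    """Build an argument list for gbaseinformationcontent (hypothetical helper)."""
    if position not in ("start", "end"):
        raise ValueError("position must be 'start' or 'end'")
    argv = [
        "gbaseinformationcontent", sequence,
        "-position", position,
        "-upstream", str(upstream),
        "-downstream", str(downstream),
        "-patlen", str(patlen),
    ]
    if plot:
        # -plot is a toggle; -graph selects the EMBOSS graphics device.
        argv.append("-plot")
        if graph:
            argv += ["-graph", graph]
    if outfile:
        argv += ["-outfile", outfile]
    return argv


# Region around the stop codon, 50 bases either side, plotted as PNG.
cmd = build_command("refseqn:NC_000913", position="end",
                    upstream=50, downstream=50, plot=True, graph="png")
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell
quoting problems if a sequence reference or file name contains spaces.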

Input file format

The database definitions for the following commands are available at
http://soap.g-language.org/kbws/embossrc

gbaseinformationcontent reads one or more nucleotide sequences.

Output file format

gbaseinformationcontent writes its output to a plain text file or to an EMBOSS graphics device.

File: nc_000913.gbaseinformationcontent

Sequence: NC_000913
-30,2.42457
-29,2.42811
-28,2.43235
-27,2.43116
-26,2.44278
-25,2.44236
-24,2.44502
-23,2.46097
-22,2.46588

[Part of this file has been deleted for brevity]

21,2.27547
22,2.46974
23,2.46342
24,2.32686
25,2.46245
26,2.46061
27,2.27664
28,2.45650
29,2.48206
30,2.29140
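
The text output shown above can be read back with a few lines of Python. This
sketch assumes the layout seen in the example: a "Sequence: <name>" header
followed by "position,value" lines. That multiple input sequences each get
their own header is an assumption, and the function name parse_output is
hypothetical.

```python
def parse_output(text):
    """Map each sequence name to a list of (position, value) pairs."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Sequence:"):
            # Start a new record for this sequence.
            current = line.split(":", 1)[1].strip()
            results[current] = []
        elif current is not None and "," in line:
            position, value = line.split(",", 1)
            results[current].append((int(position), float(value)))
    return results


# A few lines copied from the example output above.
sample = """Sequence: NC_000913
-30,2.42457
-29,2.42811
30,2.29140
"""
parsed = parse_output(sample)
print(parsed["NC_000913"])
```

To parse a real result, pass the contents of the output file instead, for
example open("nc_000913.gbaseinformationcontent").read().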

Data files

None.

Notes

None.

References

   Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and
      Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
      for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.

   Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
      large-scale analysis of high-throughput omics data, J. Pest Sci.,
      31, 7.

   Arakawa, K., Kido, N., Oshita, K., and Tomita, M. (2010) G-language Genome
      Analysis Environment with REST and SOAP Web Service Interfaces,
      Nucleic Acids Res., 38, W700-W705.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

Program name | Description
gbaseentropy | Calculates and graphs the sequence conservation using Shannon uncertainty (entropy)
gbaserelativeentropy | Calculates and graphs the sequence conservation using Kullback-Leibler divergence (relative entropy)

Author(s)

Hidetoshi Itaya (celery@g-language.org)
  Institute for Advanced Biosciences, Keio University
  252-0882 Japan

Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
  Institute for Advanced Biosciences, Keio University
  252-0882 Japan

History

2012 - Written by Hidetoshi Itaya

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.