gcodoncompiler

 

Function

Calculate various kinds of amino acid and codon usage data

Description

gcodoncompiler calculates various kinds of amino acid and codon usage data.
The following values are calculable:
A0: Absolute amino acid frequency
A1: Relative amino acid frequency
C0: Absolute codon frequency
C1: Relative codon frequency in a complete sequence
C2: Relative codon frequency in each amino acid
C3: Relative synonymous codon usage
C4: Relative adaptiveness
C5: Maximum or minor codon
For amino acids unpresent in a gene, C2-C3 does not calculate the values.
By using R* in place, such values are hypothesized that alternative
synonymous codons are used with equal frequency.

G-language SOAP service is provided by the
Institute for Advanced Biosciences, Keio University.
The original web service is located at the following URL:

http://www.g-language.org/wiki/soap

WSDL(RPC/Encoded) file is located at:

http://soap.g-language.org/g-language.wsdl

Documentation on G-language Genome Analysis Environment methods are
provided at the Document Center

http://ws.g-language.org/gdoc/

Usage

Here is a sample session with gcodoncompiler

% gcodoncompiler refseqn:NC_000913
Calculate various kinds of amino acid and codon usage data
Codon usage output file [nc_000913.gcodoncompiler]: 

Go to the input files for this example
Go to the output files for this example

Command line arguments

Qualifier Type Description Allowed values Default
Standard (Mandatory) qualifiers
[-sequence]
(Parameter 1)
seqall Nucleotide sequence(s) filename and optional format, or reference (input USA) Readable sequence(s) Required
[-outfile]
(Parameter 2)
outfile Codon usage output file Output file <*>.gcodoncompiler
Additional (Optional) qualifiers
(none)
Advanced (Unprompted) qualifiers
-translate boolean Include to translate using standard codon table Boolean value Yes/No No
-startcodon boolean Include to include start codon Boolean value Yes/No No
-stopcodon boolean Include to include stop codon Boolean value Yes/No No
-delkey string Regular expression to delete key (i.e. amino acids and nucleotides) Any string [^ACDEFGHIKLMNPQRSTVWYacgtU]
-data list Kinds of codon usage data. R* hypothesizes amino acids which are not present in the gene
A0 (Absolute amino acid frequency ('AA'))
A1 (Relative amino acid frequency ('RAAU'))
C0 (Absolute codon frequency ('AF'))
C1 (Relative codon frequency in a complete sequence)
C2 (Relative codon frequency in each amino acid ('RF'))
C3 (Relative synonymous codon usage ('RSCU'))
C4 (Relative adaptiveness)
i.e., ratio to maximum of minor codon ('W') C5 (Maximum (1) or minor (0) codon)
R0 (Absolute codon frequency ('AF'))
R1 (Relative codon frequency in a complete sequence)
R2 (Relative codon frequency in each amino acid ('RF'))
R3 (Relative synonymous codon usage ('RSCU'))
R4 (Relative adaptiveness)
i.e., ratio to maximum of minor codon ('W') R5 (Maximum (1) or minor (0) codon)
R0
-[no]accid boolean Include to use sequence accession ID as query Boolean value Yes/No Yes

Input file format

The database definitions for following commands are available at
http://soap.g-language.org/kbws/embossrc

gcodoncompiler reads one or more nucleotide sequences.

Output file format

The output from gcodoncompiler is to a plain text file.

File: nc_000913.gcodoncompiler

Sequence: NC_000913
Agca,Agcc,Agcg,Agct,Ctgc,Ctgt,Dgac,Dgat,Egaa,Egag,Fttc,Fttt,Ggga,Gggc,Gggg,Gggt,Hcac,Hcat,Iata,Iatc,Iatt,Kaaa,Kaag,Lcta,Lctc,Lctg,Lctt,Ltta,Lttg,Matg,Naac,Naat,Pcca,Pccc,Pccg,Pcct,Qcaa,Qcag,Raga,Ragg,Rcga,Rcgc,Rcgg,Rcgt,Sagc,Sagt,Stca,Stcc,Stcg,Stct,Taca,Tacc,Tacg,Tact,Utga,Vgta,Vgtc,Vgtg,Vgtt,Wtgg,Ytac,Ytat,locus_tag
26551,33911,44924,20010,8486,6707,25234,42161,52362,23474,21841,29334,10226,39395,14472,32678,12830,16952,5356,33359,40221,44272,13398,5079,14709,70441,14410,18097,17936,32971,28329,22786,11063,7142,30994,9130,20216,38169,2495,1366,4529,29308,6991,27864,21132,11323,9159,11332,11759,10992,8979,31001,18989,11581,3,14337,20240,34499,24056,20071,16088,21069,

Data files

None.

Notes

None.

References

   Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Konayashi, Y., and
      Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench
      for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.

   Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for
      large-scale analysis of high-throughput omics data, J. Pest Sci.,
      31, 7.

   Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome
      Analysis Environment with REST and SOAP Web Service Interfaces,
      Nucleic Acids Res., 38, W700-W705.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

Program name Description
gaminoinfo Prints out basic amino acid sequence statistics
gaaui Calculates various indece of amino acid usage

Author(s)

   Hidetoshi Itaya (celery@g-language.org)
   Institute for Advanced Biosciences, Keio University
   252-0882 Japan

  Kazuharu Arakawa (gaou@sfc.keio.ac.jp)
  Institute for Advanced Biosciences, Keio University
  252-0882 Japan

History

2012 - Written by Hidetoshi Itaya

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scrips.

Comments

None.