genret |
genret reads one or more genome flatfiles and retrieves various data from
them. It is a wrapper program for the G-language REST service; the method to
run is specified by giving a string to the "access" qualifier. By default,
genret parses the input file to retrieve the accession ID (or name) of the
genome and uses it to query the G-language REST service. If the "accid"
qualifier is set to false (or 0), genret instead parses the sequence and
features of the genome, creates a GenBank-formatted flatfile, and uploads it
to the G-language web server. genret then executes the specified method on
the uploaded file.
genret can perform a variety of tasks, including retrieval of the sequence
upstream of, downstream of, or around the start or stop codon, retrieval of
translated gene sequences, and keyword search of gene data.
Details on the G-language REST service are available from the wiki page
http://www.g-language.org/wiki/rest
Documentation on G-language Genome Analysis Environment methods is
provided at the Document Center
http://ws.g-language.org/gdoc/
% genret
Retrieves various gene related information from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
Gene name(s) to lookup [*]:
Feature to access: around_startcodon
Full text output file [nc_000913.around_startcodon]:

% genret
Retrieves various gene related information from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
List of gene name(s) to report [*]: recA,recB
Name of gene feature to access: translation
Sequence output file [nc_000913.translation.genret]: stdout
>recA
MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF
>recB
MSDVAETLDPLRLPLQGERLIEASAGTGKTFTIAALYLRLLLGLGGSAAFPRPLTVEELLV
VTFTEAATAELRGRIRSNIHELRIACLRETTDNPLYERLLEEIDDKAQAAQWLLLAERQMD
EAAVFTIHGFCQRMLNLNAFESGMLFEQQLIEDESLLRYQACADFWRRHCYPLPREIAQVV
FETWKGPQALLRDINRYLQGEAPVIKAPPPDDETLASRHAQIVARIDTVKQQWRDAVGELD
ALIESSGIDRRKFNRSNQAKWIDKISAWAEEETNSYQLPESLEKFSQRFLEDRTKAGGETP
RHPLFEAIDQLLAEPLSIRDLVITRALAEIRETVAREKRRRGELGFDDMLSRLDSALRSES
GEVLAAAIRTRFPVAMIDEFQDTDPQQYRIFRRIWHHQPETALLLIGDPKQAIYAFRGADI
FTYMKARSEVHAHYTLDTNWRSAPGMVNSVNKLFSQTDDAFMFREIPFIPVKSAGKNQALR
FVFKGETQPAMKMWLMEGESCGVGDYQSTMAQVCAAQIRDWLQAGQRGEALLMNGDDARPV
RASDISVLVRSRQEAAQVRDALTLLEIPSVYLSNRDSVFETLEAQEMLWLLQAVMTPEREN
TLRSALATSMMGLNALDIETLNNDEHAWDVVVEEFDGYRQIWRKRGVMPMLRALMSARNIA
ENLLATAGGERRLTDILHISELLQEAGTQLESEHALVRWLSQHILEPDSNASSQQMRLESD
KHLVQIVTIHKSKGLEYPLVWLPFITNFRVQEQAFYHDRHSFEAVLDLNAAPESVDLAEAE
RLAEDLRLLYVALTRSVWHCSLGVAPLVRRRGDKKGDTDVHQSALGRLLQKGEPQDAAGLR
TCIEALCDDDIAWQTAQTGDNQPWQVNDVSTAELNAKTLQRLPGDNWRVTSYSGLQQRGHG
IAQDLMPRLDVDAAGVASVVEEPTLTPHQFPRGASPGTFLHSLFEDLDFTQPVDPNWVREK
LELGGFESQWEPVLTEWITAVLQAPLNETGVSLSQLSARNKQVEMEFYLPISEPLIASQLD
TLIRQFDPLSAGCPPLEFMQVRGMLKGFIDLVFRHEGRYYLLDYKSNWLGEDSSAYTQQAM
AAAMQAHRYDLQYQLYTLALHRYLRHRIADYDYEHHFGGVIYLFLRGVDKEHPQQGIYTTR
PNAGLIALMDEMFAGMTLEEA

% genret
Retrieves various gene features from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
List of gene name(s) to report [*]: @gene_list.txt
Name of gene feature to access: direction
Full text output file [nc_000913.direction]: stdout
gene,direction
thrA,direct
thrB,direct
thrC,direct

% genret
Retrieves various gene related information from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
Gene name(s) to lookup [*]: recA
Feature to access: translation
Full text output file [nc_000913.translation]: stdout
>recA
MAIDENKQKALAAALGQIEKQFGKGSIMRLGEDRSMDVETISTGSLSLDIALGAGGLPMGR
IVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIYARKLGVDIDNLLCSQPDT
GEQALEICDALARSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGNL
KQSNTLLIFINQIRMKIGVMFGNPETTTGGNALKFYASVRLDIRRIGAVKEGENVVGSETR
VKVVKNKIAAPFKQAEFQILYGEGINFYGELVDLGVKEKLIEKAGAWYSYKGEKIGQGKAN
ATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDDSEGVAETNEDF

% genret
Retrieves various gene related information from genome flatfile
Input nucleotide sequence(s): refseqn:NC_000913
Gene name(s) to lookup [*]:
Feature to access: start
Full text output file [nc_000913.start]:

% genret refseqn:NC_000913 recA around_startcodon -argument 30,30 stdout
Retrieves various gene features from genome flatfile
>recA
ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt
tg

% genret refseqn:NC_000913 '*' annotate nc_000913-annotate.gbk
Retrieves various gene features from genome flatfile
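
The recA example above passes `-argument 30,30` to `around_startcodon` and gets back 63 bases. A minimal Python sketch of that arithmetic follows, under the assumption (inferred from the example output, not stated in the documentation) that the window is laid out as upstream bases, the 3-base start codon, then downstream bases. The function name `split_window` is hypothetical.

```python
# Output of the recA example, copied from the session above
# (the two output lines are joined).
RECA_WINDOW = (
    "ccggtattacccggcatgacaggagtaaaaatggctatcgacgaaaacaaacagaaagcgt"
    "tg"
)


def split_window(window, upstream, downstream):
    """Split a window into (upstream, codon, downstream) parts.

    Assumes the layout upstream + 3-base codon + downstream, which is an
    inference from the example output, not a documented guarantee.
    """
    expected = upstream + 3 + downstream
    if len(window) != expected:
        raise ValueError(f"expected {expected} bases, got {len(window)}")
    codon_end = upstream + 3
    return window[:upstream], window[upstream:codon_end], window[codon_end:]


up, codon, down = split_window(RECA_WINDOW, 30, 30)
print(len(up), codon, len(down))
```

Under that assumption the middle part is `atg`, which agrees with the recA translation starting with M.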
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | ||||
[-sequence] (Parameter 1) | seqall | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
[-gene] (Parameter 2) | string | List of gene name(s) to report | Any string | * |
[-access] (Parameter 3) | string | Name of gene feature to access | Any word | |
[-outfile] (Parameter 4) | outfile | Sequence output file | Output file | <*>.genret |
Additional (Optional) qualifiers | ||||
(none) | ||||
Advanced (Unprompted) qualifiers | ||||
-argument | string | Extra arguments to pass to method | Any string | |
-[no]accid | boolean | Include to use sequence accession ID as query | Boolean value Yes/No | Yes |
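
When genret is driven from a script, the qualifiers in the table above can be assembled into an argument list. The following Python sketch is one way to do that; the helper name `build_genret_command` and its defaults are hypothetical, and only the qualifier names come from the table. It builds the command without running it, so it does not require genret to be installed.

```python
import shlex


def build_genret_command(sequence, access, gene="*", outfile="stdout",
                         argument=None, accid=True):
    """Return a genret command line as a list of arguments.

    Qualifier names are taken from the qualifier table; this helper itself
    is illustrative and not part of genret.
    """
    cmd = [
        "genret",
        "-sequence", sequence,
        "-gene", gene,
        "-access", access,
        "-outfile", outfile,
    ]
    if argument is not None:
        # Extra arguments passed through to the method, e.g. "30,30".
        cmd += ["-argument", argument]
    if not accid:
        # Boolean qualifier negated in the -[no]accid form shown in the table.
        cmd.append("-noaccid")
    return cmd


cmd = build_genret_command(
    "refseqn:NC_000913", "around_startcodon", gene="recA", argument="30,30"
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where genret and the `refseqn` database definition are available.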
Database definitions for the examples are included in the embossrc_template
file of the Keio Bioinformatics Web Service (KBWS) package.
Input files for usage example 4
File: gene_list.txt
thrA
thrB
thrC
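
A list file like the one above can also be generated from a script. This is a minimal Python sketch, assuming one gene name per line as shown in gene_list.txt; the helper name `write_gene_list` is hypothetical. It returns the `@`-prefixed form used at the gene prompt in the usage example.

```python
import tempfile
from pathlib import Path


def write_gene_list(path, genes):
    """Write one gene name per line and return the '@file' reference."""
    names = [g.strip() for g in genes if g.strip()]
    Path(path).write_text("\n".join(names) + "\n", encoding="ascii")
    return "@" + str(path)


# Demonstration in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / "gene_list.txt"
    reference = write_gene_list(list_path, ["thrA", "thrB", "thrC"])
    written = list_path.read_text(encoding="ascii")

print(reference)
print(written, end="")
```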
Output files for usage example 1
File: nc_000913.around_startcodon
>thrL
cgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcata
gcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccacca
ttaccaccaccatcaccattaccacaggtaacggtgcgggctgacgcgtacaggaaacaca
gaaaaaagcccgcacctgac
>thrA
aggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgc
gggctttttttttcgaccaaaggtaacgaggtaacaaccatgcgagtgttgaagttcggcg
gtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgc
caggcaggggcaggtggcca

[Part of this file has been deleted for brevity]

>yjjY
tgcatgtttgctacctaaattgccaactaaatcgaaacaggaagtacaaaagtccctgacc
tgcctgatgcatgctgcaaattaacatgatcggcgtaacatgactaaagtacgtaattgcg
ttcttgatgcactttccatcaacgtcaacaacatcattagcttggtcgtgggtactttccc
tcaggacccgacagtgtcaa
>yjtD
tttttctgcgacttacgttaagaatttgtaaattcgcaccgcgtaataagttgacagtgat
cacccggttcgcggttatttgatcaagaagagtggcaatatgcgtataacgattattctgg
tcgcacccgccagagcagaaaatattggggcagcggcgcgggcaatgaaaacgatggggtt
tagcgatctgcggattgtcg
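
The sequence output above is FASTA-style: a `>` header line with the gene name, followed by wrapped sequence lines. A minimal Python sketch for reading such output into a dictionary is given below. The function name `parse_fasta` is hypothetical, and the sample is a truncated excerpt of the output above used only for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-style text into {name: sequence}, joining wrapped lines."""
    chunks = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            # Use the first word of the header as the record name.
            name = header.split()[0] if header else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


# Truncated excerpt of nc_000913.around_startcodon (not the full records).
SAMPLE = """>thrL
cgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcata
gcgcacagacagataaaaattacagagtacacaacatccatgaaacgcattagcaccacca
>thrA
aggtaacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgc
"""

records = parse_fasta(SAMPLE)
for gene, seq in records.items():
    print(gene, len(seq))
```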
gene,start
thrL,190
thrA,337
thrB,2801
thrC,3734
yaaX,5234
yaaA,5683
yaaJ,6529
talB,8238
mog,9306

[Part of this file has been deleted for brevity]

yjjX,4631256
ytjC,4631820
rob,4632464
creA,4633544
creB,4634030
creC,4634719
creD,4636201
arcA,4637613
yjjY,4638425
yjtD,4638965
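
The listing above is comma-separated with a header row (`gene,start`). A minimal Python sketch for loading it with the standard `csv` module follows, assuming one record per line in the actual output file. The function name `parse_gene_table` is hypothetical, and the sample holds only the first few rows shown above.

```python
import csv
import io


def parse_gene_table(text):
    """Parse 'gene,<feature>' CSV text into {gene: value-as-string}."""
    reader = csv.DictReader(io.StringIO(text))
    if not reader.fieldnames or len(reader.fieldnames) < 2:
        raise ValueError("expected a header row with two columns")
    gene_col, value_col = reader.fieldnames[0], reader.fieldnames[1]
    return {row[gene_col]: row[value_col] for row in reader}


# First rows of the 'start' output shown above.
SAMPLE = "gene,start\nthrL,190\nthrA,337\nthrB,2801\nthrC,3734\n"

# Start positions are numeric, so convert them after parsing.
starts = {gene: int(pos) for gene, pos in parse_gene_table(SAMPLE).items()}
print(starts)
```

The same function handles the `gene,direction` output if the integer conversion is skipped.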
LOCUS NC_000913 4639675 bp DNA circular BCT 25-OCT-2010
DEFINITION Escherichia coli str. K-12 substr. MG1655 chromosome, complete genome.
ACCESSION NC_000913
VERSION NC_000913.2 GI:49175990
DBLINK Project: 57779
KEYWORDS .
SOURCE Escherichia coli str. K-12 substr. MG1655
ORGANISM Escherichia coli str. K-12 substr. MG1655
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;

[Part of this file has been deleted for brevity]

CDS 2801..3733
/EC_number="2.7.1.39"
/codon_start="1"
/db_xref="GI:16127997"
/db_xref="ASAP:ABE-0000010"
/db_xref="UniProtKB/Swiss-Prot:P00547"
/db_xref="ECOCYC:EG10999"
/db_xref="EcoGene:EG10999"
/db_xref="GeneID:947498"
/function="enzyme; Amino acid biosynthesis: Threonine"
/function="1.5.1.8 metabolism; building block biosynthesis; amino acids; threonine"
/function="7.1 location of gene products; cytoplasm"
/gene="thrB"
/gene_synonym="ECK0003; JW0002"
/locus_tag="b0003"
/note="GO_component: GO:0005737 - cytoplasm; GO_process: GO:0009088 - threonine biosynthetic process"
/product="homoserine kinase"
/protein_id="NP_414544.1"
/rs_com="FUNCTION: Catalyzes the ATP-dependent phosphorylation of L-homoserine to L-homoserine phosphate (By similarity)."
/rs_com="CATALYTIC ACTIVITY: ATP + L-homoserine = ADP + O-phospho-L-homoserine."
/rs_com="PATHWAY: Amino-acid biosynthesis; L-threonine biosynthesis; L-threonine from L-aspartate: step 4/5."
/rs_com="SUBCELLULAR LOCATION: Cytoplasm (Potential)."
/rs_com="SIMILARITY: Belongs to the GHMP kinase family. Homoserine kinase subfamily."
/rs_des="RecName: Full=Homoserine kinase; Short=HK; Short=HSK; EC=2.7.1.39;"
/rs_protein="Level 1: similar to KHSE_ECODH 1.7e-180"
/rs_xr="EMBL; CP000948; ACB01208.1; -; Genomic_DNA."
/rs_xr="RefSeq; YP_001728986.1; -."
/rs_xr="ProteinModelPortal; B1XBC8; -."
/rs_xr="SMR; B1XBC8; 2-308."
/rs_xr="EnsemblBacteria; EBESCT00000012034; EBESCP00000011562; EBESCG00000011096."
/rs_xr="GeneID; 6058639; -."
/rs_xr="GenomeReviews; CP000948_GR; ECDH10B_0003."
/rs_xr="KEGG; ecd:ECDH10B_0003; -."
/rs_xr="HOGENOM; HBG646290; -."
/rs_xr="OMA; GSAHADN; -."
/rs_xr="ProtClustDB; PRK01212; -."
/rs_xr="BioCyc; ECOL316385:ECDH10B_0003-MONOMER; -."
/rs_xr="GO; GO:0005737; C:cytoplasm; IEA:UniProtKB-SubCell."
/rs_xr="GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW."
/rs_xr="GO; GO:0004413; F:homoserine kinase activity; IEA:EC."
/rs_xr="GO; GO:0009088; P:threonine biosynthetic process; IEA:UniProtKB-KW."
/rs_xr="HAMAP; MF_00384; Homoser_kinase; 1; -."
/rs_xr="InterPro; IPR006204; GHMP_kinase."
/rs_xr="InterPro; IPR013750; GHMP_kinase_C."
/rs_xr="InterPro; IPR006203; GHMP_knse_ATP-bd_CS."
/rs_xr="InterPro; IPR000870; Homoserine_kin."
/rs_xr="InterPro; IPR020568; Ribosomal_S5_D2-typ_fold."
/rs_xr="InterPro; IPR014721; Ribosomal_S5_D2-typ_fold_subgr."
/rs_xr="Gene3D; G3DSA:3.30.230.10; Ribosomal_S5_D2-type_fold; 1."
/rs_xr="Pfam; PF08544; GHMP_kinases_C; 1."
/rs_xr="Pfam; PF00288; GHMP_kinases_N; 1."
/rs_xr="PIRSF; PIRSF000676; Homoser_kin; 1."
/rs_xr="PRINTS; PR00958; HOMSERKINASE."
/rs_xr="SUPFAM; SSF54211; Ribosomal_S5_D2-typ_fold; 1."
/rs_xr="TIGRFAMs; TIGR00191; thrB; 1."
/rs_xr="PROSITE; PS00627; GHMP_KINASES_ATP; 1."
/transl_table="11"
/translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETF
SLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACS
VVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDI
ISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQ
PELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETA
QRVADWLGKNYLQNQEGFVHICRLDTAGARVLEN"

[Part of this file has been deleted for brevity]

4639201 gcgcagtcgg gcgaaatatc attactacgc cacgccagtt gaactggtgc cgctgttaga
4639261 ggaaaaatct tcatggatga gccatgccgc gctggtgttt ggtcgcgaag attccgggtt
4639321 gactaacgaa gagttagcgt tggctgacgt tcttactggt gtgccgatgg tggcggatta
4639381 tccttcgctc aatctggggc aggcggtgat ggtctattgc tatcaattag caacattaat
4639441 acaacaaccg gcgaaaagtg atgcaacggc agaccaacat caactgcaag ctttacgcga
4639501 acgagccatg acattgctga cgactctggc agtggcagat gacataaaac tggtcgactg
4639561 gttacaacaa cgcctggggc ttttagagca acgagacacg gcaatgttgc accgtttgct
4639621 gcatgatatt gaaaaaaata tcaccaaata aaaaacgcct tagtaagtat ttttc
//
None.
None.
Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Kobayashi, Y., and Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306.
Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for large-scale analysis of high-throughput omics data, J. Pest Sci., 31, 7.
Arakawa, K., Kido, N., Oshita, K., and Tomita, M. (2010) G-language Genome Analysis Environment with REST and SOAP Web Service Interfaces, Nucleic Acids Res., 38, W700-W705.
None.
None.
It always exits with a status of 0.
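
Because the exit status is always 0, a calling script cannot use it to detect a failed run. One possible workaround, sketched here in Python, is to check that the output file exists and is not empty. This is only a heuristic and an assumption of this sketch, not documented genret behaviour; the helper name `output_looks_valid` is hypothetical.

```python
import tempfile
from pathlib import Path


def output_looks_valid(path):
    """Return True if the output file exists and contains at least one byte.

    A non-empty file does not prove the run succeeded; it only catches the
    case where nothing was written.
    """
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0


# Demonstration with temporary files standing in for genret output.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "nc_000913.start"
    good.write_text("gene,start\nthrL,190\n", encoding="ascii")
    empty = Path(tmp) / "empty.out"
    empty.write_text("", encoding="ascii")
    missing = Path(tmp) / "missing.out"
    results = (
        output_looks_valid(good),
        output_looks_valid(empty),
        output_looks_valid(missing),
    )

print(results)
```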
None.
Program name | Description |
---|---|
entret | Retrieve sequence entries from flatfile databases and files |
seqret | Read and write (return) sequences |
Hidetoshi Itaya (celery@g-language.org) Institute for Advanced Biosciences, Keio University 252-0882 Japan
Kazuharu Arakawa (gaou@sfc.keio.ac.jp) Institute for Advanced Biosciences, Keio University 252-0882 Japan