Mercurial > repos > ktnyt > gembassy
diff GEMBASSY-1.0.3/doc/text/gshuffleseq.txt @ 2:8947fca5f715 draft default tip
Uploaded
author | ktnyt |
---|---|
date | Fri, 26 Jun 2015 05:21:44 -0400 |
parents | 84a17b3fad1f |
children |
line wrap: on
line diff
--- a/GEMBASSY-1.0.3/doc/text/gshuffleseq.txt Fri Jun 26 05:20:29 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,208 +0,0 @@ - gshuffleseq -Function - - Create randomized sequence with conserved k-mer composition - -Description - - gshuffleseq shuffles and randomizes the given sequence, conserving the - nucleotide/peptide k-mer content of the original sequence. - - For k=1, i.e. shuffling sequencing preserving single nucleotide composition, - Fisher-Yates Algorithm is employed. - For k>1, shuffling preserves all k-mers (all k where k=1~k). For example, - k=3 preserves all triplet, doublet, and single nucleotide composition. - Algorithm for k-mer preserved shuffling is non-trivial, which is solved - by graph theoretical approach with Eulerian random walks in the graph of - k-1-mers. See Jiang et al., Kandel et al., and Propp et al., for details - of this algorithm. - - G-language SOAP service is provided by the - Institute for Advanced Biosciences, Keio University. - The original web service is located at the following URL: - - http://www.g-language.org/wiki/soap - - WSDL(RPC/Encoded) file is located at: - - http://soap.g-language.org/g-language.wsdl - - Documentation on G-language Genome Analysis Environment methods are - provided at the Document Center - - http://ws.g-language.org/gdoc/ - -Usage - -Here is a sample session with gshuffleseq - -% gshuffleseq tsw:hbb_human -Create randomized sequence with conserved k-mer composition -output sequence [hbb_human.fasta]: - - Go to the input files for this example - Go to the output files for this example - -Command line arguments - - Standard (Mandatory) qualifiers: - [-sequence] seqall Sequence(s) filename and optional format, or - reference (input USA) - [-outseq] seqout [<sequence>.<format>] Sequence filename and - optional format (output USA) - - Additional (Optional) qualifiers: (none) - Advanced (Unprompted) qualifiers: - -k integer [1] Sequence k-mer to preserve composition - (Any integer value) - - Associated qualifiers: - - "-sequence" associated qualifiers - -sbegin1 integer Start of each sequence to be used - -send1 integer End of each sequence to be used - -sreverse1 boolean Reverse (if DNA) - -sask1 boolean Ask for begin/end/reverse - -snucleotide1 boolean Sequence is nucleotide - -sprotein1 boolean Sequence is protein - -slower1 boolean Make lower case - -supper1 boolean Make upper case - -scircular1 boolean Sequence is circular - -sformat1 string Input sequence format - -iquery1 string Input query fields or ID list - -ioffset1 integer Input start position offset - -sdbname1 string Database name - -sid1 string Entryname - -ufo1 string UFO features - -fformat1 string Features format - -fopenfile1 string Features file name - - "-outseq" associated qualifiers - -osformat2 string Output seq format - -osextension2 string File name extension - -osname2 string Base file name - -osdirectory2 string Output directory - -osdbname2 string Database name to add - -ossingle2 boolean Separate file for each entry - -oufo2 string UFO features - -offormat2 string Features format - -ofname2 string Features file name - -ofdirectory2 string Output directory - - General qualifiers: - -auto boolean Turn off prompts - -stdout boolean Write first file to standard output - -filter boolean Read first file from standard input, write - first file to standard output - -options boolean Prompt for standard and additional values - -debug boolean Write debug output to program.dbg - -verbose boolean Report some/full command line options - -help boolean Report command line options and exit. More - information on associated and general - qualifiers can be found with -help -verbose - -warning boolean Report warnings - -error boolean Report errors - -fatal boolean Report fatal errors - -die boolean Report dying program messages - -version boolean Report version number and exit - -Input file format - - The database definitions for following commands are available at - http://soap.g-language.org/kbws/embossrc - - gshuffleseq reads one or more nucleotide or protein sequences. - -Output file format - - The output from gshuffleseq is to . - - File: hbb_human.fasta - ->HBB_HUMAN P68871 Hemoglobin subunit beta (Beta-globin) (Hemoglobin beta chain) (LVV-hemorphin-7) -KGWLDLVAGAAHFVRRLKMLLEVDWAAHEERVGTSNPNNALKNEAADVEVHSPTHVNPTQ -LVLVQVGFGTLHLQGVECPKPKPGGVALKPVAHLLAMKECTLVALGSDFYVDHGSDGEDK -GFKAYVLATSFFAYTNFLHGKVKHVLF - - -Data files - - None. - -Notes - - None. - -References - - Fisher R.A. and Yates F. (1938) "Example 12", Statistical Tables, London - - Durstenfeld R. (1964) "Algorithm 235: Random permutation", CACM 7(7):420 - - Jiang M., Anderson J., Gillespie J., and Mayne M. (2008) "uShuffle: - a useful tool for shuffling biological sequences while preserving the - k-let counts", BMC Bioinformatics 9:192 - - Kandel D., Matias Y., Unver R., and Winker P. (1996) "Shuffling biological - sequences", Discrete Applied Mathematics 71(1-3):171-185 - - Propp J.G. and Wilson D.B. (1998) "How to get a perfectly random sample - from a generic Markov chain and generate a random spanning tree of a - directed graph", Journal of Algorithms 27(2):170-217 - - Arakawa, K., Mori, K., Ikeda, K., Matsuzaki, T., Konayashi, Y., and - Tomita, M. (2003) G-language Genome Analysis Environment: A Workbench - for Nucleotide Sequence Data Mining, Bioinformatics, 19, 305-306. - - Arakawa, K. and Tomita, M. (2006) G-language System as a Platform for - large-scale analysis of high-throughput omics data, J. Pest Sci., - 31, 7. - - Arakawa, K., Kido, N., Oshita, K., Tomita, M. (2010) G-language Genome - Analysis Environment with REST and SOAP Web Service Interfaces, - Nucleic Acids Res., 38, W700-W705. - -Warnings - - None. - -Diagnostic Error Messages - - None. - -Exit status - - It always exits with a status of 0. - -Known bugs - - None. - -See also - - shuffleseq Shuffles a set of sequences maintaining composition - -Author(s) - - Hidetoshi Itaya (celery@g-language.org) - Institute for Advanced Biosciences, Keio University - 252-0882 Japan - - Kazuharu Arakawa (gaou@sfc.keio.ac.jp) - Institute for Advanced Biosciences, Keio University - 252-0882 Japan - -History - - 2012 - Written by Hidetoshi Itaya - 2013 - Fixed by Hidetoshi Itaya - -Target users - - This program is intended to be used by everyone and everything, from - naive users to embedded scripts. - -Comments - - None. -