cat_seq
=======

A script to merge multi-sequence RichSeq files into a single-entry 'artificial' sequence file.

* [Synopsis](#synopsis)
* [Description](#description)
* [Usage](#usage)
  * [Merge multi-sequence file](#merge-multi-sequence-file)
  * [Merge multi-sequence file and specify different output format](#merge-multi-sequence-file-and-specify-different-output-format)
  * [UNIX loop to concatenate each multi-sequence file in the current working directory](#unix-loop-to-concatenate-each-multi-sequence-file-in-the-current-working-directory)
  * [Concatenate multi-sequence fasta files faster with UNIX's `grep`](#concatenate-multi-sequence-fasta-files-faster-with-unixs-grep)
* [Output](#output)
* [Dependencies](#dependencies)
* [Run environment](#run-environment)
* [Alternative software](#alternative-software)
* [Author - contact](#author---contact)
* [Citation, installation, and license](#citation-installation-and-license)
* [Changelog](#changelog)

## Synopsis

    perl cat_seq.pl multi-seq_file.embl

## Description

This script concatenates the multiple sequences in a RichSeq file (EMBL or GenBank, but also FASTA) into a single artificial sequence. The first sequence in the file serves as the foundation onto which the subsequent sequences are appended, along with all their features and annotations.

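Appending whole records end to end means that, presumably, each feature taken over from a later record has to be moved by the combined length of all records placed before it. The minimal shell sketch below is not part of cat_seq.pl and does not touch features at all; assuming plain FASTA input, it only illustrates that offset arithmetic by listing where each record would start and end in the artificial sequence. The function name `record_offsets` is hypothetical.

```shell
# Hypothetical helper, not part of cat_seq.pl: for each record of a
# multi-sequence FASTA file, print its ID and the 1-based start and end
# positions it would occupy in the concatenated artificial sequence.
# Assumes plain FASTA input; features are not handled here.
record_offsets() {
  awk '
    /^>/ {
      # A new header: report the previous record, then advance the offset
      # by its length so the next record starts right after it.
      if (id != "") { print id, offset + 1, offset + len; offset += len }
      id = substr($1, 2)   # ID = first word of the header without ">"
      len = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")  # ignore whitespace and Windows line endings
      len += length($0)
    }
    END { if (id != "") print id, offset + 1, offset + len }
  ' "$1"
}
```

For example, `record_offsets multi-seq_file.fasta` prints one line per record (ID, start, end); a feature at position *p* of a record would then sit at *p* plus that record's start minus 1.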

Optionally, a different output file format can be specified (fasta/embl/genbank).

## Usage

### Merge multi-sequence file

    perl cat_seq.pl multi-seq_file.gbk

### Merge multi-sequence file and specify different output format

    perl cat_seq.pl multi-seq_file.embl [fasta|genbank]

### UNIX loop to concatenate each multi-sequence file in the current working directory

    for i in *.embl *.fasta *.gbk; do perl cat_seq.pl "$i" [embl|fasta|genbank]; done

### Concatenate multi-sequence fasta files faster with UNIX's `grep`

If you are working only with FASTA files, UNIX's `grep` is a faster way to concatenate the sequences:

    grep -v ">" seq.fasta > seq_artificial.fasta

Afterwards, add a FASTA ID line (starting with '>') as the first line of *seq_artificial.fasta* with a text editor.

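The two manual steps can also be done in one go. The following minimal sketch, assuming a POSIX shell and plain FASTA input, writes the new header first and then appends all sequence lines without their original headers. The function name `fasta_concat` and the ID `seq_artificial` are illustrative choices, not part of cat_seq.pl.

```shell
# Hypothetical helper: strip all FASTA header lines and write a single new
# header in one step, instead of adding the ID afterwards with an editor.
# Usage: fasta_concat <input.fasta> <output.fasta> <new_id>
fasta_concat() {
  {
    printf '>%s\n' "$3"   # the single new FASTA header
    grep -v '^>' "$1"     # all sequence lines; header lines start with ">"
  } > "$2"
}
```

For example: `fasta_concat seq.fasta seq_artificial.fasta seq_artificial`.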

## Output

* *\_artificial.[embl|fasta|genbank]

Concatenated artificial sequence in the input format or, optionally, in the specified output format.

## Dependencies

* BioPerl (tested with version 1.006901)

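To compare the installed BioPerl with the tested version above, the version variable of BioPerl's `Bio::Root::Version` module can be printed. A small sketch, assuming `perl` is on the `PATH`; the variable name `bioperl_version` is only illustrative:

```shell
# Print the installed BioPerl version, or a notice if the
# Bio::Root::Version module (and hence BioPerl) cannot be loaded.
bioperl_version=$(perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION' 2>/dev/null) \
  || bioperl_version="BioPerl not found"
echo "$bioperl_version"
```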

## Run environment

The Perl script runs under Windows and UNIX flavors.

## Alternative software

The EMBOSS (European Molecular Biology Open Software Suite) application ***union*** can also be used for this task (http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/union.html).

## Author - contact

Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)

## Citation, installation, and license

For [citation](https://github.com/aleimba/bac-genomics-scripts#citation), [installation](https://github.com/aleimba/bac-genomics-scripts#installation-recommendations), and [license](https://github.com/aleimba/bac-genomics-scripts#license) information please see the repository main [*README.md*](https://github.com/aleimba/bac-genomics-scripts/blob/master/README.md).

## Changelog

* v0.1 (08.02.2013)