comparison COG/bac-genomics-scripts/cat_seq/README.md @ 3:e42d30da7a74 draft
Uploaded
author | dereeper |
---|---|
date | Thu, 30 May 2024 11:52:25 +0000 |
cat_seq
=======

A script to merge multi-sequence RichSeq files into one single-entry 'artificial' sequence file.

* [Synopsis](#synopsis)
* [Description](#description)
* [Usage](#usage)
  * [Merge multi-sequence file](#merge-multi-sequence-file)
  * [Merge multi-sequence file and specify different output format](#merge-multi-sequence-file-and-specify-different-output-format)
  * [UNIX loop to concatenate each multi-sequence file in the current working directory](#unix-loop-to-concatenate-each-multi-sequence-file-in-the-current-working-directory)
  * [Concatenate multi-sequence fasta files faster with UNIX's `grep`](#concatenate-multi-sequence-fasta-files-faster-with-unixs-grep)
* [Output](#output)
* [Dependencies](#dependencies)
* [Run environment](#run-environment)
* [Alternative software](#alternative-software)
* [Author - contact](#author---contact)
* [Citation, installation, and license](#citation-installation-and-license)
* [Changelog](#changelog)

## Synopsis

    perl cat_seq.pl multi-seq_file.embl

## Description

This script concatenates multiple sequences in a RichSeq file (embl or genbank, but also fasta) into a single artificial sequence. The first sequence in the file serves as the foundation to which the subsequent sequences are added, along with all features and annotations.

Optionally, a different output file format can be specified (fasta/embl/genbank).

## Usage

### Merge multi-sequence file

    perl cat_seq.pl multi-seq_file.gbk

### Merge multi-sequence file and specify different output format

    perl cat_seq.pl multi-seq_file.embl [fasta|genbank]

### UNIX loop to concatenate each multi-sequence file in the current working directory

    for i in *.embl *.fasta *.gbk; do perl cat_seq.pl "$i" [embl|fasta|genbank]; done

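The loop above can be wrapped in a small shell function for repeated use. The following is only a sketch under stated assumptions: `cat_seq.pl` sits in the current working directory, and the function name `cat_seq_all` and the `DRY_RUN` switch are hypothetical helpers that are not part of the script. It skips glob patterns that match nothing and leaves out `*_artificial.*` files produced by an earlier run, so that results are not concatenated a second time.

```shell
# Hypothetical helper (not part of cat_seq.pl): run cat_seq.pl on every
# embl/fasta/gbk file in the current working directory.
# Usage: cat_seq_all [embl|fasta|genbank]
#   Without an argument, cat_seq.pl is called without an output format.
#   Set DRY_RUN=1 to print the commands instead of running them.
cat_seq_all() {
    out_format="$1"
    for i in *.embl *.fasta *.gbk; do
        [ -e "$i" ] || continue            # pattern matched no file
        case "$i" in
            *_artificial.*) continue ;;    # skip results of an earlier run
        esac
        if [ -n "${DRY_RUN:-}" ]; then
            echo perl cat_seq.pl "$i" ${out_format:+"$out_format"}
        else
            perl cat_seq.pl "$i" ${out_format:+"$out_format"}
        fi
    done
}
```

After sourcing the function, `cat_seq_all fasta` would write every result as fasta, while `DRY_RUN=1; cat_seq_all` only lists the calls that would be made.
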
### Concatenate multi-sequence fasta files faster with UNIX's `grep`

If you're working only with fasta files, UNIX's `grep` is a faster way to concatenate sequences.

    grep -v ">" seq.fasta > seq_artificial.fasta

Afterwards, add a fasta ID line (starting with '>') as the first line of the file with an editor.

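The manual editing step can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name `fasta_concat` and the default ID `artificial` are made up for illustration and are not taken from `cat_seq.pl`. It writes a single header line first and then appends all sequence lines with the original headers removed.

```shell
# Hypothetical helper: merge all records of a fasta file into one record.
# Usage: fasta_concat input.fasta output.fasta [sequence_id]
fasta_concat() {
    in_file="$1"
    out_file="$2"
    seq_id="${3:-artificial}"      # assumed default ID, not from cat_seq.pl
    {
        printf '>%s\n' "$seq_id"   # single header line for the merged record
        grep -v '^>' "$in_file"    # sequence lines only, original headers dropped
    } > "$out_file"
}
```

For example, `fasta_concat seq.fasta seq_artificial.fasta my_genome` would produce a file whose only header is `>my_genome`. Anchoring the pattern with `^` restricts the match to header lines that start with '>'.
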
## Output

* *\_artificial.[embl|fasta|genbank]

    Concatenated artificial sequence in the input format, or optionally the specified output sequence format.

## Dependencies

* BioPerl (tested with version 1.006901)

## Run environment

The Perl script runs under Windows and UNIX flavors.

## Alternative software

The EMBOSS (The European Molecular Biology Open Software Suite) application ***union*** can also be used for this task (http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/union.html).

## Author - contact

Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)

## Citation, installation, and license

For [citation](https://github.com/aleimba/bac-genomics-scripts#citation), [installation](https://github.com/aleimba/bac-genomics-scripts#installation-recommendations), and [license](https://github.com/aleimba/bac-genomics-scripts#license) information please see the repository main [*README.md*](https://github.com/aleimba/bac-genomics-scripts/blob/master/README.md).

## Changelog

* v0.1 (08.02.2013)