rename_fasta_id
===============

`rename_fasta_id.pl` is a script to rename FASTA IDs according to regular expressions.

* [Synopsis](#synopsis)
* [Description](#description)
* [Usage](#usage)
* [Options](#options)
  * [Mandatory options](#mandatory-options)
  * [Optional options](#optional-options)
* [Output](#output)
* [Run environment](#run-environment)
* [Author - contact](#author---contact)
* [Citation, installation, and license](#citation-installation-and-license)
* [Changelog](#changelog)

## Synopsis

    perl rename_fasta_id.pl -i file.fasta -p "NODE_.+$" -r "K-12_" -n -a c > out.fasta

**or**

    zcat file.fasta.gz | perl rename_fasta_id.pl -i - -p "coli" -r "" -o > out.fasta

## Description

This script uses the built-in Perl substitution operator `s///` to replace strings in FASTA IDs. To do this, a **pattern** and a **replacement** have to be provided (Perl regular expression syntax can be used). The leading '>' character of the FASTA ID will be removed before the substitution and added again afterwards. FASTA IDs will be searched for matches with the **pattern**, and if found, the **pattern** will be replaced by the **replacement**.

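Conceptually, the substitution on each line of a FASTA file can be sketched in Perl as below. This is a minimal illustration, not the script's actual code; `rename_header()` and the example IDs are hypothetical. It shows that sequence lines pass through unchanged and that, because the '>' is stripped first, a pattern anchored with `^` matches the start of the ID itself.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from rename_fasta_id.pl): rename one FASTA line.
sub rename_header {
    my ($line, $pattern, $replacement) = @_;
    return $line unless $line =~ /^>/;    # sequence lines pass through untouched
    my $id = substr($line, 1);            # strip the leading '>'
    $id =~ s/$pattern/$replacement/;      # first match only, case-sensitive
    return ">$id";                        # add the '>' back
}

my @fasta = ('>NODE_1_length_5000', 'ACGTNODE', '>NODE_2_length_300', 'TTGA');
for my $line (@fasta) {
    my $renamed = rename_header($line, '^NODE_', 'contig_');
    print "$renamed\n";
}
# >contig_1_length_5000
# ACGTNODE
# >contig_2_length_300
# TTGA
```

Note that in this sketch the replacement is interpolated as a plain string, so it is inserted literally.
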
**IMPORTANT**: Enclose the **pattern** and the **replacement** in quotation marks (' or ") if they contain characters that would be interpreted by the shell (e.g. pipes '|', brackets etc.).

For simple substitutions without numeration or appended strings (options **-n** and **-a**), you can of course just use the great [`sed`](https://www.gnu.org/software/sed/manual/sed.html) on a UNIX OS (see `man sed`), e.g.:

    sed 's/^>pattern/>replacement/' file.fasta

## Usage

    perl rename_fasta_id.pl -i file.fasta -p "T" -r "a" -c -g -o

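The usage line above replaces every 'T' or 't' in the FASTA IDs with 'a'. Assuming that **-c** and **-g** correspond to Perl's `/i` and `/g` substitution modifiers (an assumption, not taken from the script's source), their effect on a single ID can be sketched like this; `substitute_id()` and the example ID are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: -c is modeled as the /i modifier, -g as /g.
sub substitute_id {
    my ($id, $pattern, $replacement, $case_insensitive, $global) = @_;
    my $re = $case_insensitive ? qr/$pattern/i : qr/$pattern/;
    if ($global) {
        $id =~ s/$re/$replacement/g;     # replace every match
    } else {
        $id =~ s/$re/$replacement/;      # replace the first match only
    }
    return $id;
}

my $id = 'contig_T1_Test';
for my $flags ([0, 0], [1, 0], [0, 1], [1, 1]) {
    my $new = substitute_id($id, 'T', 'a', @$flags);
    printf "-c=%d -g=%d: %s\n", @$flags, $new;
}
# -c=0 -g=0: contig_a1_Test
# -c=1 -g=0: conaig_T1_Test
# -c=0 -g=1: contig_a1_aest
# -c=1 -g=1: conaig_a1_aesa
```

The last line corresponds to the usage example (**-c** and **-g** combined).
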
## Options

### Mandatory options

- -i, -input

  Input FASTA file, or '-' for piped *STDIN* (e.g. from a gzipped file via `zcat`)

- -p, -pattern

  Pattern to be replaced in the FASTA ID

- -r, -replacement

  Replacement string for the pattern. To remove the pattern entirely, use '' or "" as input for **-r**.

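The second synopsis example pipes `zcat` output into `-i -`. The Perl sketch below shows one common way to treat a `-` argument as *STDIN* and anything else as a file path; `open_fasta_input()` is a hypothetical helper, and this is an assumption about, not a copy of, the script's input handling.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return a read handle for the -i argument,
# treating '-' as STDIN (e.g. zcat file.fasta.gz | ... -i -).
sub open_fasta_input {
    my ($input) = @_;
    return \*STDIN if $input eq '-';
    open my $fh, '<', $input or die "Can't open '$input': $!\n";
    return $fh;
}

# Demo with a small temporary FASTA file instead of a real input file
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} ">seq1 E. coli\nACGT\n>seq2\nGGCC\n";
close $tmp;

my $fh = open_fasta_input($path);
my $n_ids = grep { /^>/ } <$fh>;    # count the ID lines
close $fh;
print "$n_ids FASTA IDs read\n";    # prints "2 FASTA IDs read"
```
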
### Optional options

- -h, -help

  Help (perldoc POD)

- -c, -case-insensitive

  Match the pattern case-insensitively

- -g, -global

  Replace all occurrences of the pattern in the FASTA ID, not only the first one

- -n, -numerate

  Append a numeration, i.e. the running count of the pattern hits, to the replacement. This is useful, e.g., to number contigs consecutively in a draft genome.

- -a, -append

  Append a string after the numeration, e.g. 'c' for chromosome

- -o, -output

  Verbose output of the substitutions that were carried out, printed to *STDERR*

- -v, -version

  Print version number to *STDERR*

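The first synopsis example uses **-n** and **-a** to turn IDs such as `NODE_1_...` into consecutively numbered names. A hedged sketch of that behavior follows. It assumes the counter increases once per FASTA ID in which the pattern matches and that the **-a** string is only added together with **-n**, and it models **-o** as one `old -> new` line per substitution on *STDERR* (the script's real output format may differ). `rename_ids()` and the example IDs are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sketch of -n (numerate), -a (append) and -o (verbose).
# Assumption: the counter increases once per ID in which the pattern matches.
sub rename_ids {
    my ($lines, $pattern, $replacement, %opt) = @_;
    my $count = 0;
    my @out;
    for my $line (@$lines) {
        if ($line !~ /^>/) {              # sequence line: keep as is
            push @out, $line;
            next;
        }
        my $id = substr($line, 1);        # ID without the leading '>'
        if ($id =~ /$pattern/) {
            $count++;
            my $new = $replacement;
            # Assumption: the -a string is only appended together with -n
            $new .= $count . ($opt{append} // '') if $opt{numerate};
            (my $renamed = $id) =~ s/$pattern/$new/;
            print STDERR "$id -> $renamed\n" if $opt{verbose};
            $id = $renamed;
        }
        push @out, ">$id";
    }
    return @out;
}

my @fasta = ('>NODE_1_length_5000_cov_12.3', 'ACGT',
             '>NODE_2_length_800_cov_9.1',  'GGCC');
print "$_\n" for rename_ids(\@fasta, 'NODE_.+$', 'K-12_',
                            numerate => 1, append => 'c');
# >K-12_1c
# ACGT
# >K-12_2c
# GGCC
```

Under these assumptions, each matching ID is replaced as a whole by the pattern `NODE_.+$`, so only the numbered replacement remains.
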
## Output

- *STDOUT*

  The FASTA file with substituted ID lines is printed to *STDOUT*. Redirect or pipe into another tool as needed.

## Run environment

The Perl script runs under Windows and UNIX flavors.

## Author - contact

Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)

## Citation, installation, and license

For [citation](https://github.com/aleimba/bac-genomics-scripts#citation), [installation](https://github.com/aleimba/bac-genomics-scripts#installation-recommendations), and [license](https://github.com/aleimba/bac-genomics-scripts#license) information please see the repository main [*README.md*](https://github.com/aleimba/bac-genomics-scripts/blob/master/README.md).

## Changelog

- v0.1 (09.11.2014)