order_fastx
===========

`order_fastx.pl` is a script to order sequences in FASTA or FASTQ files.

* [Synopsis](#synopsis)
* [Description](#description)
* [Usage](#usage)
* [Options](#options)
    * [Mandatory options](#mandatory-options)
    * [Optional options](#optional-options)
* [Output](#output)
* [Run environment](#run-environment)
* [Author - contact](#author---contact)
* [Citation, installation, and license](#citation-installation-and-license)
* [Changelog](#changelog)

## Synopsis

    perl order_fastx.pl -i infile.fasta -l order_id_list.txt > ordered.fasta

## Description

Order sequence entries in FASTA or FASTQ sequence files according to
an ID list with a given order. Note that the IDs in the order list
have to be **identical** to the entire IDs in the sequence file.

However, the leading ">" (FASTA) or "@" (FASTQ) of the ID lines
can be omitted in the ID list.

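As an illustration of the ID matching described above, the following shell sketch builds an order list from the header lines of a FASTA file. The file contents, the file names, and the reverse-sorted order are made up for the example, and it assumes that the whole header line after ">" counts as the ID.

```shell
# Create a small example FASTA file (hypothetical content).
cat > example.fasta <<'EOF'
>seq1 first entry
ACGT
>seq2 second entry
GGCC
>seq3 third entry
TTAA
EOF

# Take the complete header lines, drop the leading ">", and write them in
# the desired order (here simply reverse-sorted) to the order ID list.
# Assumption: the entire header line is treated as the ID.
grep '^>' example.fasta | sed 's/^>//' | sort -r > order_id_list.txt

cat order_id_list.txt
```

A list produced this way contains the IDs exactly as they appear in the sequence file, so it could then be reordered by hand or by another tool before being passed to option **-l**.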
The file type is detected automatically, but you can set it manually
with option **-f**. FASTQ format assumes **four** lines per read; if
this is not the case, run the FASTQ file through
[`fastx_fix.pl`](/fastx_fix) or use Heng Li's
[`seqtk seq`](https://github.com/lh3/seqtk):

    seqtk seq -l 0 infile.fq > outfile.fq

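To find out beforehand whether a FASTQ file already has four lines per read, a rough check along the following lines may help. This is only a heuristic sketch: the function name `check_fastq_four_lines` and the example file are hypothetical, and the check only looks at the line count and at the first character of the first and third line of each record.

```shell
# Create a small, well-formed example FASTQ file (hypothetical content).
cat > example.fq <<'EOF'
@read1
ACGT
+
IIII
@read2
GGCC
+
IIII
EOF

# Print "ok" if the line count is a multiple of four, every first line of a
# record starts with "@", and every third line starts with "+";
# otherwise print "not four-line".
check_fastq_four_lines() {
    awk '
        NR % 4 == 1 && !/^@/  { bad = 1 }
        NR % 4 == 3 && !/^\+/ { bad = 1 }
        END { if (bad || NR % 4 != 0) print "not four-line"; else print "ok" }
    ' "$1"
}

check_fastq_four_lines example.fq
```

If the check reports a problem, the file is a candidate for the `seqtk seq -l 0` conversion shown above.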
The script can also be used to pull a subset of sequences in the ID
list from the sequence file. In this case it is probably best to set
option flag **-s** (see [Optional options](#optional-options) below).
However, [`filter_fastx.pl`](/filter_fastx) is the better choice for
this task.

## Usage

    perl order_fastx.pl -i infile.fq -l order_id_list.txt -s -f fastq > ordered.fq

    perl order_fastx.pl -i infile.fasta -l order_id_list.txt -e > ordered.fasta

## Options

### Mandatory options

- -i, -input

    Input FASTA or FASTQ file

- -l, -list

    List with sequence IDs in specified order

### Optional options

- -h, -help

    Help (perldoc POD)

- -f, -file_type

    Set the file type manually [fasta|fastq]

- -e, -error_files

    Write IDs that are present in either the sequence file or the order ID list, but have no equivalent in the other, to error files instead of *STDERR* (see [Output](#output) below)

- -s, -skip_errors

    Skip missing ID error statements; excludes option **-e**

- -v, -version

    Print version number to *STDERR*

## Output

- *STDOUT*

    The newly ordered sequences are printed to *STDOUT*. Redirect or pipe into another tool as needed.

- (order_ids_missing.txt)

    Written with option **-e** if IDs in the order list are missing from the sequence file

- (seq_ids_missing.txt)

    Written with option **-e** if IDs in the sequence file are missing from the order ID list

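The two kinds of missing IDs can be previewed independently of the script with standard shell tools. The sketch below is not how `order_fastx.pl` works internally; it only compares the two ID sets with `comm`. All file names and contents are hypothetical, and the `preview_` prefix is used so that the script's own error files are not overwritten.

```shell
# Use the same collation for sort and comm.
export LC_ALL=C

# Hypothetical example inputs.
cat > example.fasta <<'EOF'
>seq1
ACGT
>seq2
GGCC
>seq3
TTAA
EOF
printf '%s\n' seq3 seq1 seq4 > order_id_list.txt

# Sorted ID sets: FASTA headers without ">", list IDs without ">" or "@".
grep '^>' example.fasta | sed 's/^>//' | sort -u > seq_ids.sorted
sed 's/^[>@]//' order_id_list.txt | sort -u > list_ids.sorted

# IDs that occur only in the order list (missing in the sequence file).
comm -23 list_ids.sorted seq_ids.sorted > preview_order_ids_missing.txt
# IDs that occur only in the sequence file (missing in the order list).
comm -13 list_ids.sorted seq_ids.sorted > preview_seq_ids_missing.txt

cat preview_order_ids_missing.txt preview_seq_ids_missing.txt
```

For a FASTQ file the header extraction would need to select every fourth line starting with the first one instead of using `grep '^>'`.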
## Run environment

The Perl script runs under Windows and UNIX flavors.

## Author - contact

Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)

## Citation, installation, and license

For [citation](https://github.com/aleimba/bac-genomics-scripts#citation), [installation](https://github.com/aleimba/bac-genomics-scripts#installation-recommendations), and [license](https://github.com/aleimba/bac-genomics-scripts#license) information please see the repository main [*README.md*](https://github.com/aleimba/bac-genomics-scripts/blob/master/README.md).

## Changelog

- v0.1 (20.11.2014)