Mercurial > repos > dereeper > pangenome_explorer
view COG/bac-genomics-scripts/order_fastx/README.md @ 14:5a5c9a6b047b draft
Uploaded
author | dereeper |
---|---|
date | Tue, 10 Dec 2024 16:20:53 +0000 |
parents | e42d30da7a74 |
children |
line wrap: on
line source
order_fastx =========== `order_fastx.pl` is a script to order sequences in FASTA or FASTQ files. * [Synopsis](#synopsis) * [Description](#description) * [Usage](#usage) * [Options](#options) * [Mandatory options](#mandatory-options) * [Optional options](#optional-options) * [Output](#output) * [Run environment](#run-environment) * [Author - contact](#author---contact) * [Citation, installation, and license](#citation-installation-and-license) * [Changelog](#changelog) ## Synopsis perl order_fastx.pl -i infile.fasta -l order_id_list.txt > ordered.fasta ## Description Order sequence entries in FASTA or FASTQ sequence files according to an ID list with a given order. Beware, the IDs in the order list have to be **identical** to the entire IDs in the sequence file. However, the ">" or "@" ID identifiers of FASTA or FASTQ files, respectively, can be omitted in the ID list. The file type is detected automatically. But, you can set the file type manually with option **-f**. FASTQ format assumes **four** lines per read, if this is not the case run the FASTQ file through [`fastx_fix.pl`](/fastx_fix) or use Heng Li's [`seqtk seq`](https://github.com/lh3/seqtk): seqtk seq -l 0 infile.fq > outfile.fq The script can also be used to pull a subset of sequences in the ID list from the sequence file. Probably best to set option flag **-s** in this case, see [Optional options](#optional-options) below. But, rather use [`filter_fastx.pl`](/filter_fastx). ## Usage perl order_fastx.pl -i infile.fq -l order_id_list.txt -s -f fastq > ordered.fq perl order_fastx.pl -i infile.fasta -l order_id_list.txt -e > ordered.fasta ## Options ### Mandatory options - -i, -input Input FASTA or FASTQ file - -l, -list List with sequence IDs in specified order ### Optional options - -h, -help Help (perldoc POD) - -f, -file_type Set the file type manually [fasta|fastq] - -e, -error_files Write missing IDs in the seq file or the order ID list without an equivalent in the other to error files instead of *STDERR* (see [Output](#output) below) - -s, -skip_errors Skip missing ID error statements, excludes option **-e** - -v, -version Print version number to *STDERR* ## Output - *STDOUT* The newly ordered sequences are printed to *STDOUT*. Redirect or pipe into another tool as needed. - (order_ids_missing.txt) If IDs in the order list are missing in the sequence file with option **-e** - (seq_ids_missing.txt) If IDs in the sequence file are missing in the order ID list with option **-e** ## Run environment The Perl script runs under Windows and UNIX flavors. ## Author - contact Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster) ## Citation, installation, and license For [citation](https://github.com/aleimba/bac-genomics-scripts#citation), [installation](https://github.com/aleimba/bac-genomics-scripts#installation-recommendations), and [license](https://github.com/aleimba/bac-genomics-scripts#license) information please see the repository main [*README.md*](https://github.com/aleimba/bac-genomics-scripts/blob/master/README.md). ## Changelog - v0.1 (20.11.2014)