A collection of scripts intended for **bacterial genomics** (some might also be useful for eukaryotes) based on **high-throughput sequencing** (aka next-generation sequencing) data.
* Create a feature table for all annotated primary features in RichSeq (EMBL or GenBank format) files: [`genomes_feature_table`](/genomes_feature_table)
* Calculate statistics and plot Venn diagrams for genome groups based on orthologs/paralogs from [*Proteinortho5*](http://www.bioinf.uni-leipzig.de/Software/proteinortho/) output, i.e. overall presence/absence statistics for groups of genomes rather than for single genomes: [`po2group_stats`](/po2group_stats)
* Search a strain panel for query proteins with **BLASTP**, with a concise hit summary, optional alignment, and presence/absence matrix. Also included are scripts to transpose the matrix and to calculate overall presence/absence statistics for groups of columns in the matrix: [`prot_finder`](/prot_finder)
* Randomly subsample FASTA, FASTQ, or TEXT files with [*reservoir sampling*](https://en.wikipedia.org/wiki/Reservoir_sampling): [`sample_fastx-txt`](/sample_fastx-txt)
* Manual curation of annotation in NCBI's TBL format (e.g. from [Prokka](http://www.vicbioinformatics.com/software.prokka.shtml) automatic annotation) in spreadsheet software: [`tbl2tab`](/tbl2tab)
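Reservoir sampling draws a fixed number *k* of items uniformly at random in a single pass, without knowing the total number of items in advance and while keeping only *k* items in memory. The sketch below illustrates the idea (the classic "Algorithm R") for plain line-based text with a hypothetical shell function wrapping `awk`. It is not the code of `sample_fastx-txt`; in particular, FASTA/FASTQ entries span several lines and would have to be sampled as whole records rather than single lines.

```shell
# Hypothetical helper (not part of this repository): draw K lines uniformly
# at random from stdin in one pass, keeping only K lines in memory
# (reservoir sampling, Algorithm R). Only suitable for line-based text.
# Usage: reservoir_sample K [SEED] < input.txt
reservoir_sample() {
  local k="$1"
  local seed="${2:-$$}"   # default seed: the shell's PID (not reproducible)
  awk -v k="$k" -v seed="$seed" '
    BEGIN { srand(seed) }
    # Fill the reservoir with the first k lines.
    NR <= k { res[NR] = $0; next }
    # Afterwards, line NR replaces a random slot with probability k/NR.
    {
      j = int(rand() * NR) + 1   # uniform integer in [1, NR]
      if (j <= k) res[j] = $0
    }
    END {
      n = (NR < k) ? NR : k      # fewer than k lines: output all of them
      for (i = 1; i <= n; i++) print res[i]
    }
  '
}
```

For example, `reservoir_sample 1000 42 < ids.txt > subset.txt` (hypothetical file names) would write 1,000 randomly chosen lines; the same seed reproduces the same subset with the same `awk` implementation. The sampled lines are not output in their original order.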
Each script is hosted in its own folder, so that a separate *README.md* with more information can be included. In addition, all of the Perl scripts include a usage/help text or a comprehensive [POD](http://perldoc.perl.org/perlpod.html) (Plain Old Documentation), which is displayed by calling the script either without arguments/options or with option **-h|-help**.
The scripts have only been tested under UNIX, and some won't run in a Windows environment because they call UNIX commands. If you are on Windows, [Cygwin](http://cygwin.com/) might be an alternative.
To download the repository, either use the '[Download ZIP](https://github.com/aleimba/bac-genomics-scripts/archive/master.zip)' link after clicking the green 'Clone or download' button at the top, or clone the repository with `git`:
If there is an update to this GitHub repository (see the [commits](https://github.com/aleimba/bac-genomics-scripts/commits/master) and [releases](https://github.com/aleimba/bac-genomics-scripts/releases) above), you can refresh your **local** repository with the following command, run **inside** the local folder:
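The clone and update steps can be combined in a small shell function. This is a minimal sketch, not part of the repository: the function name `get_or_update_repo` and its default target folder are hypothetical, and it simply runs `git clone` on the first call and `git pull` inside the existing clone on later calls.

```shell
# Hypothetical helper: clone the repository on first use, or update an
# existing local copy with "git pull" (same as running it inside the folder).
# Usage: get_or_update_repo [URL] [TARGET_DIR]
get_or_update_repo() {
  local url="${1:-https://github.com/aleimba/bac-genomics-scripts.git}"
  local dir="${2:-bac-genomics-scripts}"
  if [ -d "$dir/.git" ]; then
    git -C "$dir" pull          # refresh an existing clone
  else
    git clone "$url" "$dir"     # first-time download
  fi
}
```

Running `get_or_update_repo` without arguments clones into `./bac-genomics-scripts`, or updates that folder if it already holds a clone.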
the scripts can then be run from anywhere on your system. Of course, you can also call them directly by prefixing the command with `perl` (or with `./` for bash wrappers):
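One way to put all script folders on your `$PATH` at once is sketched below. It assumes the repository was cloned to `~/bac-genomics-scripts`; the function name is hypothetical, and it only appends folders that are not already on `$PATH`, for the current shell session.

```shell
# Hypothetical helper: append every script folder of the local repository
# to $PATH for the current shell session (call it from ~/.bashrc to persist).
# Usage: add_script_dirs_to_path [REPO_DIR]
add_script_dirs_to_path() {
  local root="${1:-$HOME/bac-genomics-scripts}"
  local d
  for d in "$root"/*/; do
    [ -d "$d" ] || continue        # glob matched nothing
    d="${d%/}"                     # drop the trailing slash
    case ":$PATH:" in
      *":$d:"*) ;;                 # already on $PATH, skip
      *) PATH="$PATH:$d" ;;
    esac
  done
  export PATH
}
```

Calling a script by name afterwards additionally requires that it is executable (`chmod +x`) and has a shebang line; otherwise call it explicitly, e.g. `perl ~/bac-genomics-scripts/tbl2tab/tbl2tab.pl -h` (the file name `tbl2tab.pl` is assumed here for illustration).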
**Single** scripts can be downloaded as well. To do so, click on the folder you're interested in and then on the link of the script. There, click on the **Raw** button and save the page to a file (without **Raw** you'll get an unusable HTML file). The same applies to other files (e.g. PDFs).
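From the command line, the **Raw** file can also be fetched directly. The sketch below builds the raw-file URL with a hypothetical helper; it assumes the branch is `master`, as in the repository links above, and the folder/file names in the usage line are illustrative only.

```shell
# Hypothetical helper: print the "Raw" download URL of a single file in the
# repository (assumes branch "master", as in the links above).
# Usage: raw_url FOLDER FILE
raw_url() {
  printf 'https://raw.githubusercontent.com/aleimba/bac-genomics-scripts/master/%s/%s\n' "$1" "$2"
}
```

For example, `curl -fsSL -o tbl2tab.pl "$(raw_url tbl2tab tbl2tab.pl)"` would save the (assumed) file `tbl2tab/tbl2tab.pl` to the current folder.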
Most of the Perl scripts use modules from [BioPerl](http://www.bioperl.org), as stated in their respective *README.md* or POD, so BioPerl has to be installed on your system. For installation instructions see the BioPerl website ([**Installation**](http://bioperl.org/INSTALL.html)).
Some scripts need additional Perl modules, which are listed in the associated *README.md* or POD. If they aren't installed on your system yet, get them from [CPAN](http://www.cpan.org/) (installation instructions can be found on the website, see e.g. [**Getting Started...Installing Perl Modules**](http://www.cpan.org/modules/INSTALL.html) or [**FAQ**](http://www.cpan.org/misc/cpan-faq.html#How_install_Perl_modules)).
Furthermore, some scripts call the statistical computing language [**R**](http://www.r-project.org/) and additional R packages for plotting (again, see the respective *README.md* or POD).
Finally, some of the scripts don't handle Windows-formatted line breaks, so you might consider running such input files through the handy UNIX utility [dos2unix](http://dos2unix.sourceforge.net/):
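The usual call is simply `dos2unix input_file`, which converts the file in place. If `dos2unix` isn't available, a rough substitute is sketched below with two hypothetical shell functions: one tests a file for carriage returns, the other writes a copy with the trailing `\r` removed from every line. It only handles CRLF line endings.

```shell
# Hypothetical fallback when dos2unix is not installed: copy IN to OUT,
# removing the carriage return at the end of each line (CRLF -> LF).
# Usage: crlf_to_lf IN OUT
crlf_to_lf() {
  awk '{ sub(/\r$/, ""); print }' "$1" > "$2"
}

# Exit status 0 if FILE contains a carriage return (Windows line breaks).
# Usage: has_crlf FILE
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}
```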
For now, cite the latest major release (tag: [***bovine_ecoli_mastitis***](https://github.com/aleimba/bac-genomics-scripts/releases)) hosted on [Zenodo](https://zenodo.org/):