`COG/bac-genomics-scripts/calc_fastq-stats/README.md` @ 3:e42d30da7a74 (draft), uploaded by dereeper on Thu, 30 May 2024 11:52:25 +0000

calc_fastq-stats
================

`calc_fastq-stats.pl` is a script to calculate basic statistics for bases and reads in a FASTQ file.

* [Synopsis](#synopsis)
* [Description](#description)
* [Usage](#usage)
* [Options](#options)
  * [Mandatory options](#mandatory-options)
  * [Optional options](#optional-options)
* [Output](#output)
* [Run environment](#run-environment)
* [Dependencies](#dependencies)
* [Author - contact](#author---contact)
* [Citation, installation, and license](#citation-installation-and-license)
* [Changelog](#changelog)

## Synopsis

    perl calc_fastq-stats.pl -i reads.fastq

**or**

    gzip -dc reads.fastq.gz | perl calc_fastq-stats.pl -i -

## Description

The script calculates simple statistics for a FASTQ file: individual and
total base counts, GC content, and basic statistics for read lengths and
read/base qualities. The GC content calculation excludes 'N's. Stats are
printed to *STDOUT* and, optionally, to an output file.
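
To make the GC definition concrete, here is a minimal shell/awk sketch, assuming strict four-line records. The function name `fastq_base_stats` and its tab-separated output labels are hypothetical, and this is not how `calc_fastq-stats.pl` itself is implemented; it only mirrors the rule that 'N's are left out of the GC denominator.

```shell
# Hypothetical helper, NOT part of calc_fastq-stats.pl: base counts, GC
# content and read-length summary for a strict four-line FASTQ on STDIN.
# GC % = (G + C) / (A + C + G + T) * 100, so 'N's (and any other IUPAC
# codes) are left out of the denominator.
fastq_base_stats() {
    awk '
        NR % 4 == 2 {                       # line 2 of each record = sequence
            seq = toupper($0)
            len = length(seq)
            for (i = 1; i <= len; i++)
                count[substr(seq, i, 1)]++
            total += len
            reads++
            if (reads == 1 || len < min_len) min_len = len
            if (reads == 1 || len > max_len) max_len = len
        }
        END {
            acgt = count["A"] + count["C"] + count["G"] + count["T"]
            printf "A\t%d\nC\t%d\nG\t%d\nT\t%d\nN\t%d\n",
                count["A"], count["C"], count["G"], count["T"], count["N"]
            printf "total_bases\t%d\nreads\t%d\n", total, reads
            if (reads > 0)
                printf "min_len\t%d\nmax_len\t%d\nmean_len\t%.2f\n",
                    min_len, max_len, total / reads
            if (acgt > 0)
                printf "GC%%\t%.2f\n", (count["G"] + count["C"]) / acgt * 100
        }
    '
}
```

Usage: `gzip -dc reads.fastq.gz | fastq_base_stats`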

Because read quality typically degrades along the length of a read on NGS
platforms, it is advisable to also plot the quality for each cycle, as
implemented in tools like
[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
or the [fastx-toolkit](http://hannonlab.cshl.edu/fastx_toolkit/).
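
The numbers behind such a per-cycle plot can be sketched in a few lines of awk, assuming strict four-line records and a known quality offset. `per_cycle_quality` is a hypothetical helper, not a FastQC or fastx-toolkit command; it prints one tab-separated line per cycle with the mean Phred score at that position.

```shell
# Hypothetical helper (not FastQC, not fastx-toolkit): mean Phred quality
# per cycle (read position) for a strict four-line FASTQ on STDIN.
# $1 = ASCII quality offset, as for -q (default 33).
per_cycle_quality() {
    awk -v offset="${1:-33}" '
        BEGIN {
            # POSIX awk has no ord(), so build a char -> ASCII code table.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            max_cycle = 0
        }
        NR % 4 == 0 {                       # line 4 of each record = qualities
            len = length($0)
            for (i = 1; i <= len; i++) {
                sum[i] += ord[substr($0, i, 1)] - offset
                n[i]++
            }
            if (len > max_cycle) max_cycle = len
        }
        END {
            for (i = 1; i <= max_cycle; i++)
                printf "%d\t%.2f\n", i, sum[i] / n[i]
        }
    '
}
```

Usage: `zcat reads.fastq.gz | per_cycle_quality 64 > per_cycle.tsv`, which can then be plotted with any charting tool.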

If the sequence and quality values are wrapped over several lines (i.e. a
read is **not** represented by exactly four lines), fix the file first with
Heng Li's [seqtk](https://github.com/lh3/seqtk):

    seqtk seq -l 0 infile.fastq > outfile.fastq
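
A quick sanity check for the four-line layout could look like the sketch below. `is_four_line_fastq` is a hypothetical helper; it only tests the '@' and '+' header positions, matching sequence/quality lengths, and a line count divisible by four, so other malformations can slip through.

```shell
# Hypothetical check: does a FASTQ stream on STDIN look like strict
# four-line records? Exit status 0 = looks OK, 1 = probably wrapped or
# otherwise malformed.
is_four_line_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1; exit }
        NR % 4 == 2 { seq_len = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1; exit }
        NR % 4 == 0 && length($0) != seq_len   { bad = 1; exit }
        END { if (bad || NR % 4 != 0) exit 1 }
    '
}
```

Usage: `is_four_line_fastq < infile.fastq || seqtk seq -l 0 infile.fastq > outfile.fastq`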

An alternative tool that is considerably faster is **fastq-stats** from
[ea-utils](https://code.google.com/p/ea-utils/).

## Usage

    zcat reads.fastq.gz | perl calc_fastq-stats.pl -i - -q 64 -c 175000000 -n 3000000

## Options

### Mandatory options

- -i, -input

  Input FASTQ file, or `-` to read from STDIN (e.g. piped from a gzipped file)

- -q, -qual_offset

  ASCII offset of the Phred quality values, e.g. 33 for Sanger/Illumina 1.8+ or 64 for Illumina 1.3 to 1.7 [default 33]
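
If the offset is not known, a common rule of thumb inspects the lowest quality character: ASCII codes below 59 (';') only occur with Phred+33, while data whose lowest character is at or above 64 ('@') is probably Phred+64. The sketch below encodes that heuristic. `guess_qual_offset` is hypothetical; `calc_fastq-stats.pl` does not guess but uses the value given with -q, and uniformly high-quality Phred+33 data can be misclassified as 64.

```shell
# Hypothetical heuristic (not part of calc_fastq-stats.pl): guess the ASCII
# quality offset from the lowest quality character in the first $1 reads
# (default 10000) of a strict four-line FASTQ on STDIN.
#   lowest code <  59 (';') -> 33 (these characters only occur in Phred+33)
#   lowest code >= 64 ('@') -> 64 (probable, not certain)
#   otherwise               -> unknown
guess_qual_offset() {
    head -n "$(( 4 * ${1:-10000} ))" | awk '
        BEGIN {
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {                       # quality line of each record
            seen = 1
            for (i = 1; i <= length($0); i++) {
                c = ord[substr($0, i, 1)]
                if (c != "" && c < min) min = c
            }
        }
        END {
            if (!seen)          print "unknown"
            else if (min < 59)  print "33"
            else if (min >= 64) print "64"
            else                print "unknown"
        }
    '
}
```

Usage: `zcat reads.fastq.gz | guess_qual_offset 10000` prints `33`, `64`, or `unknown`; treat the answer as a hint to check, not as a value to pass to -q blindly.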

### Optional options

- -h, -help

  Help (perldoc POD)

- -c, -coverage_limit

  Number of bases to sample from the top of the file

- -n, -num_read

  Number of reads to sample from the top of the file

- -o, -output

  Print the stats to the specified output file in addition to *STDOUT*

- -v, -version

  Print the version number to *STDERR*
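
Sampling "from the top of the file" means reading records in order and stopping at a limit, rather than drawing a random subset. A minimal sketch of that idea for strict four-line FASTQ follows. `sample_fastq_top` is hypothetical, and two details are assumptions because the README does not specify them: the read that reaches the base limit is kept, and when both limits are set the first one reached wins.

```shell
# Hypothetical sketch of -n/-c style sampling (not the script's code):
# emit whole records from the top of a strict four-line FASTQ on STDIN
# until $1 reads or $2 bases have been emitted (0 = no limit).
# Assumption: the read that reaches the base limit is still printed.
sample_fastq_top() {
    awk -v max_reads="${1:-0}" -v max_bases="${2:-0}" '
        { rec[NR % 4] = $0 }                # buffer lines 1, 2, 3 and 0
        NR % 4 == 0 {                       # record complete: print it
            print rec[1]; print rec[2]; print rec[3]; print rec[0]
            reads++
            bases += length(rec[2])
            if ((max_reads > 0 && reads >= max_reads) ||
                (max_bases > 0 && bases >= max_bases)) exit
        }
    '
}
```

Usage: `zcat reads.fastq.gz | sample_fastq_top 3000000 175000000`. For a read limit alone, `head -n $((4 * 3000000))` returns the same records.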

## Output

- *STDOUT*

  Calculated stats are printed to *STDOUT*

- (outfile)

  Optional outfile for stats

## Run environment

The Perl script runs under Windows and UNIX flavors.

## Dependencies

If the following modules are not installed, get them from
[CPAN](http://www.cpan.org/):

- `Statistics::Descriptive`

  Perl module to calculate basic descriptive statistics

- `Statistics::Descriptive::Discrete`

  Perl module to calculate descriptive statistics for discrete data sets

- `Statistics::Descriptive::Weighted`

  Perl module to calculate descriptive statistics for weighted variates

## Author - contact

Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)

## Citation, installation, and license

For [citation](https://github.com/aleimba/bac-genomics-scripts#citation), [installation](https://github.com/aleimba/bac-genomics-scripts#installation-recommendations), and [license](https://github.com/aleimba/bac-genomics-scripts#license) information, please see the repository main [*README.md*](https://github.com/aleimba/bac-genomics-scripts/blob/master/README.md).

## Changelog

- v0.1 (28.10.2014)