A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report generated on 2018-04-06, 18:29 based on data in:
/Users/bebatut/Documents/galaxy/tools/tools-iuc/tools/multiqc/multiqc_WDir
General Statistics
Showing 21/21 rows and 31/34 columns.
Sample Name | M Reads Mapped | N50 (Kbp) | Length (Mbp) | Change rate | Ts/Tv | M Variants | TiTV ratio (known) | TiTV ratio (novel) | % Assigned | M Assigned | Vars | SNP | Indel | Ts/Tv | M Assigned | % rRNA | % mRNA | % Aligned | Insert Size | % Dups | Organism | Contigs | CDS | % Dups | Error rate | M Non-Primary | M Reads Mapped | % Mapped | M Total seqs | % Duplicates | % Mapped |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14892_1#15 | 115.1Kbp | 18.4Mbp | |||||||||||||||||||||||||||||
70: TopHat on data 1, data 4, and data 3: accepted_hits | 70.8% | 0.3 | |||||||||||||||||||||||||||||
75: TopHat on data 1, data 6, and data 5: accepted_hits | 69.6% | 0.4 | |||||||||||||||||||||||||||||
80: TopHat on data 1, data 8, and data 7: accepted_hits | 71.8% | 0.4 | |||||||||||||||||||||||||||||
85: TopHat on data 1, data 10, and data 9: accepted_hits | 72.0% | 0.4 | |||||||||||||||||||||||||||||
90: TopHat on data 1, data 12, and data 11: accepted_hits | 71.3% | 0.4 | |||||||||||||||||||||||||||||
95: TopHat on data 1, data 14, and data 13: accepted_hits | 70.7% | 0.5 | |||||||||||||||||||||||||||||
D11_H4K16ac_Rep1_R1_fastq_gz | 98% | ||||||||||||||||||||||||||||||
Sample1 | Helicobacter pylori | 30.0 | 1548 | ||||||||||||||||||||||||||||
Sample2 | Escherichia coli | 52.0 | 1548 | ||||||||||||||||||||||||||||
Test1 | 5522770 | 4474244 | 902934 | 1.97 | |||||||||||||||||||||||||||
bamtools | 0.0% | 93.1% | |||||||||||||||||||||||||||||
dataset_114 | 0.6% | ||||||||||||||||||||||||||||||
dataset_197 | 176 bp | ||||||||||||||||||||||||||||||
gatk_varianteval | 0.0 | 2.2 | |||||||||||||||||||||||||||||
htseq | 0.0% | 0.0 | |||||||||||||||||||||||||||||
picard_CollectRnaSeqMetrics_bam | % | 79.6% | |||||||||||||||||||||||||||||
samtools_flagstat | 20.7 | ||||||||||||||||||||||||||||||
samtools_stats | 0.42% | 0.0 | 0.6 | 100.0% | 0.6 | ||||||||||||||||||||||||||
snpeff | 3190 | 0.000 | 0.97 | ||||||||||||||||||||||||||||
virtual-normal | 1.3% |
QUAST
QUAST is a quality assessment tool for genome assemblies, written by the Center for Algorithmic Biotechnology.
Assembly Statistics
Sample Name | N50 (Kbp) | N75 (Kbp) | L50 (K) | L75 (K) | Largest contig (Kbp) | Length (Mbp) | Misassemblies | Mismatches/100kbp | Indels/100kbp | Genes | Genes (Partial) | Genome Fraction |
---|---|---|---|---|---|---|---|---|---|---|---|---|
14892_1#15 | 115.1Kbp | 62.3Kbp | 0.0K | 100.0K | 435.5Kbp | 18.4Mbp | 107.0 | 262.21 | 33.75 | 7280 | 968 | 95.5% |
Number of Contigs
This plot shows the number of contigs found for each assembly, broken down by length.
RSeQC
RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput RNA-seq data.
BUSCO
BUSCO assesses genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs.
Lineage: fungi_odb9
deepTools
deepTools is a suite of tools to process and analyze deep sequencing data.
Filtering metrics
Estimated percentages of alignments filtered independently for each setting in estimateReadFiltering
Sample Name | M entries | % Aligned | % Tot. Filtered | % Blacklisted | % Missing Flags | % Forbidden Flags | % deepTools Dupes | % Duplication | % Singletons | % Strand Filtered |
---|---|---|---|---|---|---|---|---|---|---|
bismark_se.pbat | 30.9 | 100.0 | 44.1 | 0.0 | 0.0 | 0.0 | 43.4 | 0.0 | 0.0 | 5.0 |
bwameth_se.pbat | 71.4 | 64.7 | 19.8 | 0.0 | 0.0 | 0.0 | 28.4 | 0.0 | 0.0 | 6.4 |
Coverage metrics
Sample Name | Min | 1st Quartile | Median | Mean | 3rd Quartile | Max | Std. Dev. |
---|---|---|---|---|---|---|---|
bismark_se.pbat | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 959.0 | 16.5 |
bwameth_se.pbat | 0.0 | 0.0 | 0.0 | 4.5 | 0.0 | 2589.0 | 23.4 |
Coverage distribution
The fraction of bases with a given number of read/fragment coverage
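As a rough illustration of what such a distribution contains, the sketch below turns a list of per-base depths into the fraction of bases at each coverage value. The function name and the example depths are hypothetical, and it assumes the fraction is taken over bases with exactly that coverage; it is not how deepTools itself computes the plot.

```python
from collections import Counter


def coverage_fractions(depths):
    """Return {coverage: fraction of bases with exactly that coverage}."""
    if not depths:
        return {}
    counts = Counter(depths)
    total = len(depths)
    return {cov: n / total for cov, n in sorted(counts.items())}


# Hypothetical per-base depths for a 10 bp region (not taken from this report)
example_depths = [0, 0, 0, 0, 0, 0, 1, 1, 3, 25]
fractions = coverage_fractions(example_depths)
for cov, frac in fractions.items():
    print(f"coverage {cov}: {frac:.2f}")
```

A distribution dominated by zero coverage with a long tail, as in this toy input, would be consistent with a median of 0.0 and a large maximum like the ones listed in the Coverage metrics table.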
Read lengths
Sample Name | # Sampled | Min | 1st Quartile | Mean | Median | 3rd Quartile | Max | Std. Dev. | MAD |
---|---|---|---|---|---|---|---|---|---|
bismark_se.pbat | 34118 | 21 | 101 | 99.6 | 101 | 101 | 101 | 7.1 | 0.0 |
bwameth_se.pbat | 51142 | 40 | 101 | 98.0 | 101 | 101 | 101 | 10.9 | 0.0 |
dnmt1MUT_mat2aMUT_1_RGi | 15504 | 148 | 148 | 148.0 | 148 | 148 | 148 | 0.0 | 0.0 |
dnmt1MUT_mat2aMUT_2_RGi | 17327 | 148 | 148 | 148.0 | 148 | 148 | 148 | 0.0 | 0.0 |
dnmt1MUT_mat2aMUT_3_RGi | 15266 | 102 | 148 | 148.0 | 148 | 148 | 148 | 0.4 | 0.0 |
Fragment lengths
Sample Name | # Sampled | Min | 1st Quartile | Mean | Median | 3rd Quartile | Max | Std. Dev. | MAD |
---|---|---|---|---|---|---|---|---|---|
dnmt1MUT_mat2aMUT_1_RGi | 15504 | 174 | 185 | 185 | 185 | 185 | 186 | 0.2 | 0.0 |
dnmt1MUT_mat2aMUT_2_RGi | 17327 | 176 | 185 | 185 | 185 | 185 | 186 | 0.2 | 0.0 |
dnmt1MUT_mat2aMUT_3_RGi | 15266 | 104 | 185 | 185 | 185 | 185 | 186 | 0.7 | 0.0 |
Read/fragment length distribution
Signal enrichment per feature
Signal enrichment per feature according to plotEnrichment
Fingerprint
Signal fingerprint according to plotFingerprint
SnpEff
SnpEff is a genetic variant annotation and effect prediction toolbox. It annotates and predicts the effects of variants on genes (such as amino acid changes).
Variants by Genomic Region
The stacked bar plot shows locations of detected variants in the genome and the number of variants for each location.
The upstream and downstream interval size to detect these genomic regions is 5000bp by default.
Variant Effects by Impact
The stacked bar plot shows the putative impact of detected variants and the number of variants for each impact.
There are four levels of impacts predicted by SnpEff:
- High: High impact (e.g. a stop codon)
- Moderate: Moderate impact (e.g. substitution by an amino acid of the same type)
- Low: Low impact (e.g. a silent mutation)
- Modifier: No impact
Variant Effects by Class
The stacked bar plot shows the effect of variants at protein level and the number of variants for each effect type.
This plot shows the effect of variants on the translation of the mRNA as protein. There are three possible cases:
- Silent: The amino acid does not change.
- Missense: The amino acid is different.
- Nonsense: The variant generates a stop codon.
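The three cases above can be sketched for a single-codon change as follows. This is a simplified illustration using the standard genetic code, not SnpEff's implementation; the function name is hypothetical, and special cases such as a lost stop codon are not handled separately (they fall into the missense branch here).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon change as silent, missense or nonsense (simplified)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if alt_aa == ref_aa:
        return "silent"      # the amino acid does not change
    if alt_aa == "*":
        return "nonsense"    # the variant generates a stop codon
    return "missense"        # the amino acid is different


print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
print(classify_codon_change("GAA", "GTA"))  # Glu -> Val
print(classify_codon_change("GAA", "TAA"))  # Glu -> stop
```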
Error - was not able to plot data.
Variant Qualities
The line plot shows the number of variants as a function of the variant quality score.
The quality score corresponds to the QUAL column of the VCF file. This score is set by the variant caller.
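For illustration, a count of variants per quality score can be derived directly from the QUAL column, which is the sixth tab-separated field of each VCF record. The sketch below is a minimal example under stated assumptions: the function name is hypothetical, the records are made up, and truncating scores to integers is a binning choice made here, not necessarily the one used for the plot.

```python
from collections import Counter


def qual_histogram(vcf_lines):
    """Count variants per integer-truncated QUAL value."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the 6th VCF column
        if qual == ".":
            continue  # missing quality
        counts[int(float(qual))] += 1
    return dict(sorted(counts.items()))


# Hypothetical VCF records (not taken from this report)
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t50.4\tPASS\t.",
    "chr1\t300\t.\tG\tT\t12\tPASS\t.",
    "chr1\t400\t.\tT\tC\t.\tPASS\t.",
]
histogram = qual_histogram(example_vcf)
print(histogram)
```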
GATK
GATK is a toolkit offering a wide variety of tools with a primary focus on variant discovery and genotyping.
Observed Quality Scores
This plot shows the distribution of base quality scores in each sample before and after base quality score recalibration (BQSR). Applying BQSR should broaden the distribution of base quality scores.
For more information see the Broad's description of BQSR.
Variant Counts
Compare Overlap
HTSeq Count
HTSeq Count is part of the HTSeq Python package - it takes a file with aligned sequencing reads, plus a list of genomic features and counts how many reads map to each feature.
Bcftools
Bcftools contains utilities for variant calling and manipulating VCFs and BCFs.
Variant Substitution Types
Indel Distribution
Variant depths
Read depth support distribution for called variants
featureCounts
Subread featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations.
Picard
Picard is a set of Java command line tools for manipulating high-throughput sequencing data.
Alignment Summary
Please note that Picard's read counts are divided by two for paired-end data.
Base Distribution
Plot shows the distribution of bases by cycle.
GC Coverage Bias
This plot shows bias in coverage across regions of the genome with varying GC content. A perfect library would be a flat line at y = 1.
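To make the "flat line at y = 1" concrete, the sketch below normalizes coverage per GC bin by the overall mean coverage. It assumes normalized coverage is defined as the mean coverage of windows in a GC bin divided by the mean coverage of all windows; the function name and the window values are hypothetical, and this is not Picard's code.

```python
from collections import defaultdict


def normalized_coverage_by_gc(windows):
    """windows: iterable of (gc_percent, coverage) pairs, one per genomic window.

    Returns {gc_percent: mean coverage in that bin / overall mean coverage}.
    """
    windows = list(windows)
    if not windows:
        raise ValueError("no windows given")
    overall_mean = sum(cov for _, cov in windows) / len(windows)
    if overall_mean == 0:
        raise ValueError("overall mean coverage is zero")
    by_gc = defaultdict(list)
    for gc, cov in windows:
        by_gc[gc].append(cov)
    return {
        gc: (sum(covs) / len(covs)) / overall_mean
        for gc, covs in sorted(by_gc.items())
    }


# Hypothetical library in which coverage rises with GC content
biased = [(30, 10.0), (30, 10.0), (50, 20.0), (50, 20.0), (70, 30.0), (70, 30.0)]
normalized = normalized_coverage_by_gc(biased)
print(normalized)
```

In this toy input the low-GC bin falls below 1 and the high-GC bin rises above it; an unbiased library would give a value of 1 in every bin.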
Insert Size
Plot shows the number of reads at a given insert size. Reads with different orientations are summed.
Mark Duplicates
RnaSeqMetrics Assignment
Number of bases in primary alignments that align to regions in the reference genome.
Gene Coverage
Prokka
Prokka is a software tool for the rapid annotation of prokaryotic genomes.
This barplot shows the distribution of different types of features found in each contig.
Prokka can detect different features:
- CDS
- rRNA
- tmRNA
- tRNA
- miscRNA
- signal peptides
- CRISPR arrays
Samblaster
Samblaster is a tool to mark duplicates and extract discordant and split reads from SAM files.
Samtools
Samtools is a suite of programs for interacting with high-throughput sequencing data.
Percent Mapped
Alignment metrics from samtools stats; mapped vs. unmapped reads.
For a set of samples that have come from the same multiplexed library, similar numbers of reads are expected for each sample. Large differences in read numbers might indicate issues during library preparation. Whilst such differences may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below the levels recommended for your application.
Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).
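A simple check along these lines is sketched below: it computes the percentage of mapped reads per sample and lists samples that fall under a cutoff. The sample names, read counts and the 70% threshold are all hypothetical and only serve the illustration; a sensible cutoff depends on the organism and the application.

```python
def percent_mapped(mapped, unmapped):
    """Percentage of reads that are mapped (0.0 if there are no reads)."""
    total = mapped + unmapped
    return 100.0 * mapped / total if total else 0.0


# Hypothetical (mapped, unmapped) read counts per sample
read_counts = {
    "sampleA": (9_500_000, 500_000),
    "sampleB": (6_000_000, 4_000_000),
}
MIN_PERCENT_MAPPED = 70.0  # hypothetical cutoff, not a recommendation

rates = {name: percent_mapped(m, u) for name, (m, u) in read_counts.items()}
low_samples = [name for name, pct in rates.items() if pct < MIN_PERCENT_MAPPED]
print(rates)
print("Samples to inspect further:", low_samples)
```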
Alignment metrics
This module parses the output from samtools stats. All numbers in millions.
Samtools Flagstat
This module parses the output from samtools flagstat. All numbers in millions.
Bamtools
Bamtools provides both a programmer's API and an end-user's toolkit for handling BAM files.
Bamtools Stats
VCFTools
VCFTools is a program for working with and reporting on VCF files.
TsTv by Qual
Plot of TSTV-BY-QUAL - the transition to transversion ratio as a function of SNP quality from the output of vcftools TsTv-by-qual.
Transition is a purine-to-purine or pyrimidine-to-pyrimidine point mutation.
Transversion is a purine-to-pyrimidine or pyrimidine-to-purine point mutation.
Quality here is the Phred-scaled quality score as given in the QUAL column of the VCF.
Note: only bi-allelic SNPs are used (multi-allelic sites and INDELs are skipped).
Refer to the VCFtools manual (https://vcftools.github.io/man_latest.html) on --TsTv-by-qual
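The definitions of transition and transversion can be expressed in a few lines. The sketch below classifies bi-allelic SNPs and computes a Ts/Tv ratio over a list of (REF, ALT) pairs; the function names and the example SNPs are hypothetical, and it does not reproduce the per-quality binning that vcftools performs.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a bi-allelic SNP."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a bi-allelic SNP: {ref}>{alt}")
    # Same chemical class on both sides means a transition
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def tstv_ratio(snps):
    """Ts/Tv ratio for an iterable of (ref, alt) pairs."""
    snps = list(snps)
    ts = sum(1 for ref, alt in snps if classify_snp(ref, alt) == "transition")
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


# Hypothetical SNPs (not taken from this report)
example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
ratio = tstv_ratio(example_snps)
print(ratio)
```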