**What it does**

QualiMap RNA-seq QC reports quality control metrics and bias estimations which are
specific for whole transcriptome sequencing, including reads genomic origin, junction
analysis, transcript coverage and 5'-3' bias computation.


**Counting mode**

The tool can count either *reads* or *fragments* (the latter is specific to
paired-end sequencing). Fragment counting means that only one count is added for a
pair of reads.

For paired-end data, you will usually want to choose *Count fragments*, but note
that this requires the input BAM dataset to be sorted by read names
(instead of the more common coordinate-sorting by genomic position).
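The sort order of a BAM dataset is normally recorded in its header. As a minimal sketch, assuming the header text has already been extracted (for example with ``samtools view -H``), the ``SO`` field of the ``@HD`` line can be inspected as shown here. The function name ``sort_order`` is made up for illustration, and not every tool sets this field reliably, so treat the result as a hint rather than proof.

```python
def sort_order(header_text):
    """Return the SO value of the @HD header line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Toy header; "queryname" means sorted by read name, "coordinate" by position.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000"
order = sort_order(header)
if order != "queryname":
    print(f"sort order is {order!r}; re-sort by read name before counting fragments")
```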

For single-end data, the two counting modes are equivalent and you should choose
*Count reads*.
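The difference between the two modes can be illustrated with a toy example. This is only a sketch of the idea, not the tool's implementation: alignments are represented as hypothetical ``(read_name, mate)`` tuples, and a fragment is identified by its read name.

```python
def count_units(alignments, mode="reads"):
    """Count a list of alignments either per read or per fragment (read pair)."""
    if mode == "reads":
        return len(alignments)
    if mode == "fragments":
        # Both mates of a pair share one read name, so each pair counts once.
        return len({name for name, _mate in alignments})
    raise ValueError(f"unknown mode: {mode!r}")


paired = [("frag1", 1), ("frag1", 2), ("frag2", 1), ("frag2", 2)]
single = [("read1", 1), ("read2", 1)]

print(count_units(paired, "reads"))      # 4
print(count_units(paired, "fragments"))  # 2
print(count_units(single, "reads") == count_units(single, "fragments"))  # True
```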


**Genome annotation data**

The genome annotation provided to the tool should be in GTF format. The tool uses
the *gene_id* attribute of annotated features to group counts.
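To check that an annotation carries the expected attribute, the ninth GTF column can be parsed with a few lines of Python. This is a rough sketch that only handles quoted attribute values; the helper name ``gtf_attributes`` and the example record are invented for illustration.

```python
import re

# Matches attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def gtf_attributes(line):
    """Return the quoted attributes of one GTF record as a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError("a GTF record needs 9 tab-separated columns")
    return dict(ATTR_RE.findall(fields[8]))


# Hypothetical record, not taken from a real annotation.
record = 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";'
attrs = gtf_attributes(record)
print(attrs["gene_id"])  # geneA
```

Records without a ``gene_id`` key in the resulting dict would be worth investigating before running the tool.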


**Strandedness**

The strandedness setting is used to decide which read or fragment counts to add to
which gene. The default setting is to count reads/fragments independent of
strandedness, but if your sequencing protocol is strand-specific, you should
indicate this here.

For strand-specific protocols, only reads/fragments that are on the expected strand
will be counted. For paired-end data, this means that the two reads of a pair are
expected to be on opposite strands, with the first read on the same strand as the
gene, and the second read on the opposite strand (forward-stranded protocol), or
*vice versa* (reverse-stranded protocol).


**Multimapping reads**

Reads that map to multiple locations in the genome can be counted proportionally,
*i.e.* down-weighted according to the number of locations they map to (as indicated
by the ``NH`` tag in the BAM input), or they can be ignored.
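The two options can be sketched as follows. The weighting used here, ``1 / NH`` per alignment record, is an assumption made for illustration; the text above does not spell out the exact formula the tool applies. Alignments are represented as hypothetical ``(gene_id, nh)`` pairs.

```python
from collections import defaultdict


def proportional_counts(alignments):
    """Add 1/NH per alignment record (assumed weighting, see lead-in)."""
    counts = defaultdict(float)
    for gene_id, nh in alignments:
        counts[gene_id] += 1.0 / nh
    return dict(counts)


def unique_only_counts(alignments):
    """Ignore every alignment record whose NH value is greater than one."""
    counts = defaultdict(int)
    for gene_id, nh in alignments:
        if nh == 1:
            counts[gene_id] += 1
    return dict(counts)


# One unique alignment to g1, plus one read mapping to both g1 and g2 (NH = 2).
toy = [("g1", 1), ("g1", 2), ("g2", 2)]
print(proportional_counts(toy))  # {'g1': 1.5, 'g2': 0.5}
print(unique_only_counts(toy))   # {'g1': 1}
```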


**Outputs**

The tool generates three outputs:

- an HTML report with

  - plots of the following statistics:

    - *Reads Genomic Origin*
    - *Coverage Profile Along Genes (Total)*
    - *Coverage Profile Along Genes (Low)*
    - *Coverage Profile Along Genes (High)*
    - *Coverage Histogram (0-50x)*
    - *Junction Analysis*

  - a *Summary* section with detailed statistics

- a collection of the raw data underlying the plots and the summary

- optionally, a counts dataset with per-gene counts


**HTML Report**

*Summary*

- reads aligned

  number of reads (or fragments) that are aligned to the reference; this number
  includes secondary alignments

- total alignments

  total number of alignment records in the BAM input; this number includes secondary
  alignments

- secondary alignments

  number of secondary alignment records in the BAM input

- non-unique alignments

  number of alignment records with an ``NH`` tag value greater than one;
  corresponds to the number of alignments that will have been skipped during
  counting when *Count uniquely mapped reads only* is selected

- number of reads aligned to genes

- number of ambiguous alignments

  This is the number of mapped reads that span multiple annotated genes.
  Such reads are always skipped during counting.

- no feature assigned

  reports the number of alignments that are not overlapping any annotated
  feature; these may represent alignments to introns or intergenic regions, or,
  if the number is really high, may indicate a problem with your genome
  annotations

- not aligned

  number of reads not mapped by the aligner (but included in the BAM input)

- strand specificity estimation (fwd/rev)

  computed if *Count reads/fragments independent of strandedness* is selected;
  estimate of the proportion of alignments in line with forward-strand and
  reverse-strand specificity of the sequencing library

  Balanced proportions (*i.e.* ~ 0.5 forward- and ~ 0.5 reverse-strand support)
  can be interpreted as likely non-strand-specificity of the sequencing library,
  while a strand-specific library would manifest itself in a large fraction of
  reads supporting that specific strand-specificity.
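As a rough aid for reading the strand specificity estimate, the two proportions can be turned into a verdict as sketched here. The cut-offs (``tolerance`` and ``threshold``) and the function names are arbitrary assumptions chosen for illustration; they are not values used or recommended by the tool.

```python
def strand_support(n_forward, n_reverse):
    """Convert forward/reverse support counts into proportions."""
    total = n_forward + n_reverse
    if total == 0:
        raise ValueError("no alignments to estimate from")
    return n_forward / total, n_reverse / total


def interpret(fwd, rev, tolerance=0.1, threshold=0.8):
    """Map the two proportions to a verdict (cut-offs are assumptions)."""
    if abs(fwd - 0.5) <= tolerance:
        return "likely non-strand-specific"
    if fwd >= threshold:
        return "likely forward-stranded"
    if rev >= threshold:
        return "likely reverse-stranded"
    return "inconclusive"


fwd, rev = strand_support(950, 50)
print(fwd, rev, interpret(fwd, rev))  # 0.95 0.05 likely forward-stranded
```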

*Reads genomic origin*

Lists how many alignments (absolute number/fraction) fall into

- exonic,
- intronic,
- intergenic

regions, or are at least

- overlapping an exon.

*Transcript coverage profile*

The profile provides ratios between mean coverage of 5' regions, 3' regions and whole transcripts.

- 5' bias

  the ratio of coverage median of 5' regions (defined as the first 100 nts) to whole transcripts

- 3' bias

  the ratio of coverage median of 3' regions (defined as the last 100 nts) to whole transcripts

- 5'-3' bias

  the ratio of 5' bias to 3' bias.
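The three ratios can be illustrated for a single transcript. This sketch makes several assumptions: coverage is given per base and ordered from 5' to 3', the *mean* is used as the coverage statistic (the text above mentions both mean and median), and nothing is aggregated across transcripts. The tool's actual computation may differ in these respects.

```python
from statistics import mean


def bias_ratios(coverage, window=100):
    """Return (5' bias, 3' bias, 5'-3' bias) for one transcript.

    coverage: per-base coverage values ordered from the 5' to the 3' end.
    window:   size of the end regions in nucleotides (100 as in the text).
    """
    whole = mean(coverage)
    if whole == 0:
        raise ValueError("transcript has no coverage")
    five = mean(coverage[:window]) / whole
    three = mean(coverage[-window:]) / whole
    ratio = five / three if three else float("inf")
    return five, three, ratio


# Toy transcript of 200 nt: the 5' half is covered 10x, the 3' half 30x.
five, three, ratio = bias_ratios([10] * 100 + [30] * 100)
print(five, three)  # 0.5 1.5
```

A 5'-3' ratio well below 1, as in this toy case, would point to coverage skewed towards the 3' end.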

*Junction analysis*

Lists the total number of reads with splice junctions and the relative
frequency of the (up to) 10 most frequent junction sequences.


**Plots**

*Reads Genomic Origin*

A pie chart showing how many read alignments fall into exonic, intronic and
intergenic regions.

*Coverage Profile Along Genes (Total)*

This plot shows the mean coverage profile of all genes with non-zero
overall coverage.

*Coverage Profile Along Genes (Low)*

The plot shows the mean coverage profile of the 500 genes with the lowest, but
non-zero, overall coverage.

*Coverage Profile Along Genes (High)*

The plot shows the mean coverage profile of the 500 genes with the highest
overall coverage.

*Coverage Histogram (0-50x)*

A histogram of gene coverage from 0 to 50x. Genes with >50x coverage are added to
the 50x bin.
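The capping of the last bin can be reproduced from per-gene coverage values as sketched here. Binning by truncation to whole numbers is an assumption made for this example; the raw data do not have to be binned in exactly this way by the tool.

```python
from collections import Counter


def coverage_histogram(gene_coverages, max_bin=50):
    """Bin per-gene coverage into 0..max_bin, capping higher values at max_bin."""
    hist = Counter(min(int(cov), max_bin) for cov in gene_coverages)
    return [hist.get(b, 0) for b in range(max_bin + 1)]


# Toy values: the genes at 50x, 72.5x and 1200x all end up in the 50x bin.
bins = coverage_histogram([0.4, 3.9, 50, 72.5, 1200])
print(bins[0], bins[3], bins[50])  # 1 1 3
```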

*Junction Analysis*

This pie chart shows an analysis of the splice junctions observed in the
alignments. It consists of three categories:

- Known

  observed splice junctions both sides of which are in line with the genome
  annotation data

- Partly known

  observed splice junctions for which only one junction side can be deduced
  from the genome annotation data

- Novel

  observed splice junctions not predicted on either side by the genome
  annotation data
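The three categories can be expressed as a small classification rule. In this sketch the annotation is reduced to two hypothetical sets of known junction sides (``known_donors`` and ``known_acceptors``), which is a simplification: it does not check whether the two sides are annotated as belonging to the same junction.

```python
def classify_junction(donor, acceptor, known_donors, known_acceptors):
    """Classify a junction by how many of its two sides are annotated."""
    sides_known = (donor in known_donors) + (acceptor in known_acceptors)
    return ("novel", "partly known", "known")[sides_known]


# Toy annotation: junction sides given as (chromosome, position) tuples.
known_donors = {("chr1", 200), ("chr1", 500)}
known_acceptors = {("chr1", 300), ("chr1", 700)}

print(classify_junction(("chr1", 200), ("chr1", 300), known_donors, known_acceptors))  # known
print(classify_junction(("chr1", 200), ("chr1", 650), known_donors, known_acceptors))  # partly known
print(classify_junction(("chr1", 220), ("chr1", 650), known_donors, known_acceptors))  # novel
```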


Raw data
--------

This is a *Collection* of 4 individual datasets.

Of these, the *rnaseq_qc_results* dataset provides a plain-text version of the
*HTML report* *Summary* section.

The other 3 datasets hold the tabular raw data underlying the three coverage
profile plots in the *HTML Report*.


Counts data
-----------

Optional. This is a 2-column tabular dataset of read or fragment counts
(depending on the chosen *Counting mode*) per annotated gene. The first column
lists the gene identifiers found in the *Genome annotation data*, the second
the associated counts.
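For downstream scripting, a dataset with this layout can be loaded with the standard library. This is a minimal sketch: the function name ``read_counts`` is invented, skipping lines that start with ``#`` is a precaution rather than a documented feature of the format, and counts are parsed as floats on the assumption that weighted counting may yield fractional values.

```python
import csv
import io


def read_counts(handle):
    """Read a 2-column, tab-separated counts dataset into a dict."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        gene_id, value = row[0], row[1]
        counts[gene_id] = float(value)
    return counts


# Toy content standing in for a downloaded counts dataset.
toy_file = io.StringIO("geneA\t12\ngeneB\t0.5\n")
counts = read_counts(toy_file)
print(counts)  # {'geneA': 12.0, 'geneB': 0.5}
```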

This dataset represents valid (single-sample) input for the QualiMap Counts QC
tool.
