
CollectRnaSeqMetrics (version 3.1.1.0)

Select SAM/BAM dataset or dataset collection:

If empty, upload or import a SAM/BAM dataset

Load reference genome from:

Using reference genome:

REFERENCE_SEQUENCE

Load gene annotation from:

Gene annotation (GTF/GFF3):

Location of rRNA sequences in genome, in interval_list format:

RIBOSOMAL_INTERVALS; If not specified, no bases will be identified as being ribosomal. The list of intervals can be generated from BED or Interval datasets using the Galaxy BedToIntervalList tool.

What is the RNA-seq library strand specificity:

STRAND_SPECIFICITY; For unpaired reads, use FIRST_READ_TRANSCRIPTION_STRAND if the reads are expected to be on the transcription strand.

When calculating coverage-based values, use only transcripts of this length or greater:

MINIMUM_LENGTH; default=500

Sequences to ignore

This percentage of the length of a fragment must overlap one of the ribosomal intervals for a read or read pair to be considered rRNA:

RRNA_FRAGMENT_PERCENTAGE; default=0.8

The level(s) at which to accumulate metrics:

METRIC_ACCUMULATION_LEVEL

Assume the input file is already sorted:

ASSUME_SORTED

Select validation stringency:

Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

Purpose

Collects metrics about the alignment of RNA to various functional classes of loci in the genome: coding, intronic, UTR, intergenic, ribosomal.

Dataset collections - processing large numbers of datasets at once

This will be added shortly

Obtaining gene annotations in refFlat format

This tool requires gene annotations in refFlat format. These data can be obtained from UCSC table browser directly through Galaxy by following these steps:

Click on Get Data in the upper part of the left pane of the Galaxy interface

Click on the UCSC Main link

Set your genome and dataset of interest. It must be the same genome build against which you have mapped the reads contained in the BAM file you are analyzing

In the output format field choose selected fields from primary and related tables

Click the get output button

In the first table presented at the top of the page, select (using checkboxes) the first 11 fields: name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds proteinId

Click done with selection

Click Send query to Galaxy

A new dataset will appear in the current Galaxy history

Use this dataset as the input for the Gene annotations in refFlat form dropdown of this tool
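Before handing the downloaded dataset to the tool, it can help to check that each row has the expected columns. The following is a minimal sketch, assuming a tab-separated file whose columns follow the order of the 11 fields listed in the steps above and whose exonStarts and exonEnds values are comma-separated lists with a trailing comma, as UCSC tables normally provide. The function name parse_annotation_row and the example row (including its identifiers) are hypothetical.

```python
# Column order copied from the field list in the steps above.
COLUMNS = [
    "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", "proteinId",
]


def parse_annotation_row(line):
    """Split one tab-separated row into a dict and run basic sanity checks."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, found {len(values)}")
    row = dict(zip(COLUMNS, values))

    # Numeric columns.
    for key in ("txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount"):
        row[key] = int(row[key])

    # Comma-separated coordinate lists; empty items come from the trailing comma.
    for key in ("exonStarts", "exonEnds"):
        row[key] = [int(item) for item in row[key].split(",") if item]

    if row["strand"] not in ("+", "-"):
        raise ValueError(f"unexpected strand: {row['strand']!r}")
    if not (len(row["exonStarts"]) == len(row["exonEnds"]) == row["exonCount"]):
        raise ValueError("exonCount does not match the exon coordinate lists")
    return row


# Hypothetical example row, not taken from a real annotation.
example = "NM_000000\tchr1\t+\t100\t900\t150\t850\t2\t100,500,\t300,900,\tP00000"
row = parse_annotation_row(example)
print(row["chrom"], row["exonCount"], row["exonStarts"], row["exonEnds"])
```

Header lines beginning with # would need to be skipped before calling the function on each remaining line.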

Inputs, outputs, and parameters

Either a SAM file or a BAM file must be supplied. Galaxy automatically coordinate-sorts all uploaded BAM files.

From the Picard documentation (http://broadinstitute.github.io/picard/):

REF_FLAT=File Gene annotations in refFlat form. Format described here:
https://genome.ucsc.edu/FAQ/FAQformat.html#format9 Required.

RIBOSOMAL_INTERVALS=File Location of rRNA sequences in genome, in interval_list format. If not specified, no bases
will be identified as being ribosomal. Format described here:
https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/util/IntervalList.html and can be
generated from BED datasets using Galaxy's wrapper for the picard_BedToIntervalList tool
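The main difference between the two formats is the coordinate convention: BED intervals are 0-based and half-open, while interval_list rows are 1-based and closed, with the columns sequence, start, end, strand, and name. The sketch below illustrates only that row conversion. It is not the BedToIntervalList implementation, it does not write the SAM-style header that a complete interval_list file needs, and the function name and the fallback values for a missing name or strand are assumptions made for the example.

```python
def bed_to_interval_row(bed_line, default_name=".", default_strand="+"):
    """Convert one BED line into one tab-separated interval_list body row.

    default_name and default_strand are illustrative fallbacks for BED lines
    that carry only the three mandatory columns.
    """
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")

    chrom = fields[0]
    start = int(fields[1]) + 1  # 0-based start becomes 1-based
    end = int(fields[2])        # half-open end equals the closed 1-based end

    name = fields[3] if len(fields) > 3 and fields[3] else default_name
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else default_strand

    return "\t".join([chrom, str(start), str(end), strand, name])


# Hypothetical rRNA locus on chr1.
print(bed_to_interval_row("chr1\t99\t200\trRNA_1\t0\t-"))
```

For real analyses the Galaxy BedToIntervalList tool remains the safer route, since it also builds the required header from a sequence dictionary.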

STRAND_SPECIFICITY=StrandSpecificity
STRAND=StrandSpecificity For strand-specific library prep. For unpaired reads, use FIRST_READ_TRANSCRIPTION_STRAND
if the reads are expected to be on the transcription strand. Required. Possible values:
{NONE, FIRST_READ_TRANSCRIPTION_STRAND, SECOND_READ_TRANSCRIPTION_STRAND}

MINIMUM_LENGTH=Integer When calculating coverage based values (e.g. CV of coverage) only use transcripts of this
length or greater. Default value: 500.

IGNORE_SEQUENCE=String If a read maps to a sequence specified with this option, all the bases in the read are
counted as ignored bases.

RRNA_FRAGMENT_PERCENTAGE=Double
This percentage of the length of a fragment must overlap one of the ribosomal intervals
for a read or read pair to be considered rRNA. Default value: 0.8.
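To make the threshold concrete, here is a small sketch of the idea: measure how much of a fragment falls inside ribosomal intervals and compare that fraction with the threshold. It is an illustration under stated assumptions, not Picard's code: coordinates are 1-based and inclusive, all intervals lie on the same sequence as the fragment and do not overlap each other, and treating a fraction exactly equal to the threshold as rRNA is an assumption. The function names are hypothetical.

```python
def overlap_fraction(frag_start, frag_end, intervals):
    """Fraction of the fragment [frag_start, frag_end] covered by intervals.

    Coordinates are 1-based and inclusive; intervals is a list of
    (start, end) tuples assumed not to overlap each other.
    """
    frag_len = frag_end - frag_start + 1
    if frag_len <= 0:
        raise ValueError("fragment end must not precede fragment start")

    covered = 0
    for start, end in intervals:
        lo = max(frag_start, start)
        hi = min(frag_end, end)
        if hi >= lo:
            covered += hi - lo + 1
    return covered / frag_len


def looks_ribosomal(frag_start, frag_end, intervals, threshold=0.8):
    """Apply the threshold; inclusive comparison is an assumption of this sketch."""
    return overlap_fraction(frag_start, frag_end, intervals) >= threshold


rrna = [(111, 300)]
print(overlap_fraction(101, 200, rrna))       # 90 of 100 bases overlap
print(looks_ribosomal(101, 200, rrna))        # above the 0.8 default
print(looks_ribosomal(101, 200, [(151, 300)]))  # only half overlaps
```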

METRIC_ACCUMULATION_LEVEL=MetricAccumulationLevel
LEVEL=MetricAccumulationLevel The level(s) at which to accumulate metrics. Possible values: {ALL_READS, SAMPLE,
LIBRARY, READ_GROUP} This option may be specified 0 or more times.

ASSUME_SORTED=Boolean
AS=Boolean If true (default), then the sort order in the header file will be ignored. Default
value: true. Possible values: {true, false}
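The options above are passed to Picard as KEY=VALUE pairs. The sketch below shows one way to assemble such an argument list outside Galaxy. The option names come from the documentation quoted above, except INPUT and OUTPUT, which are Picard's usual options for the input alignment and the metrics file and do not appear in the text above. The function name and all file names are hypothetical, and how the list is passed to a local Picard installation is left open.

```python
ALLOWED_STRANDS = {
    "NONE",
    "FIRST_READ_TRANSCRIPTION_STRAND",
    "SECOND_READ_TRANSCRIPTION_STRAND",
}
ALLOWED_LEVELS = {"ALL_READS", "SAMPLE", "LIBRARY", "READ_GROUP"}


def build_picard_args(input_path, output_path, ref_flat, strand,
                      ribosomal_intervals=None, minimum_length=500,
                      rrna_fragment_percentage=0.8, levels=(),
                      assume_sorted=True):
    """Return CollectRnaSeqMetrics arguments as a list of KEY=VALUE strings."""
    if strand not in ALLOWED_STRANDS:
        raise ValueError(f"unknown strand specificity: {strand}")

    args = [
        "CollectRnaSeqMetrics",
        f"INPUT={input_path}",    # assumption: not listed in the text above
        f"OUTPUT={output_path}",  # assumption: not listed in the text above
        f"REF_FLAT={ref_flat}",
        f"STRAND_SPECIFICITY={strand}",
        f"MINIMUM_LENGTH={minimum_length}",
        f"RRNA_FRAGMENT_PERCENTAGE={rrna_fragment_percentage}",
        f"ASSUME_SORTED={'true' if assume_sorted else 'false'}",
    ]
    if ribosomal_intervals is not None:
        args.append(f"RIBOSOMAL_INTERVALS={ribosomal_intervals}")

    # The level option may be given zero or more times, once per level.
    for level in levels:
        if level not in ALLOWED_LEVELS:
            raise ValueError(f"unknown accumulation level: {level}")
        args.append(f"METRIC_ACCUMULATION_LEVEL={level}")
    return args


# Hypothetical file names.
picard_args = build_picard_args(
    "sample.bam", "sample.rna_metrics.txt", "refFlat.txt", "NONE",
    ribosomal_intervals="rRNA.interval_list",
    levels=("ALL_READS", "SAMPLE"),
)
print(" ".join(picard_args))
```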

Additional information

Additional information about Picard tools is available from the Picard web site at http://broadinstitute.github.io/picard/.