Purpose
Collects metrics about the alignment of RNA to various functional classes of loci in the genome: coding, intronic, UTR, intergenic, ribosomal.
Dataset collections - processing large numbers of datasets at once
This will be added shortly
Obtaining gene annotations in refFlat format
This tool requires gene annotations in refFlat format. These data can be obtained from the UCSC Table Browser directly through Galaxy by following these steps:
- Click on Get Data in the upper part of the left pane of the Galaxy interface
- Click on UCSC Main link
- Set your genome and dataset of interest. It must be the same genome build against which you have mapped the reads contained in the BAM file you are analyzing
- In the output format field choose selected fields from primary and related tables
- Click the get output button
- In the first table presented at the top of the page, select (using checkboxes) the first 11 fields: name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinId
- Click done with selection
- Click Send query to Galaxy
- A new dataset will appear in the current Galaxy history
- Use this dataset as the input for the Gene annotations in refFlat form dropdown of this tool
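Before using the downloaded dataset, it can help to confirm that every data row really has the 11 tab-separated columns selected in the steps above. The following is a minimal sketch of such a check, not part of the tool itself. The function name `find_bad_rows`, the sample rows, and the assumption that header lines begin with `#` are all illustrative.

```python
from typing import Iterable, List, Tuple

# The number of fields selected in the steps above.
EXPECTED_COLUMNS = 11


def find_bad_rows(
    lines: Iterable[str], expected: int = EXPECTED_COLUMNS
) -> List[Tuple[int, int]]:
    """Return (line_number, column_count) for each data row whose
    tab-separated column count differs from `expected`."""
    bad = []
    for line_number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        # Assumption: header/comment lines start with '#'; blank lines are skipped.
        if not row or row.startswith("#"):
            continue
        n_columns = len(row.split("\t"))
        if n_columns != expected:
            bad.append((line_number, n_columns))
    return bad


if __name__ == "__main__":
    # Made-up rows for illustration only; not real annotation data.
    sample = [
        "#name\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\texonCount\texonStarts\texonEnds\tproteinId\n",
        "tx1\tchr1\t+\t100\t900\t200\t800\t2\t100,500,\t400,900,\tP1\n",
        "tx2\tchr1\t-\n",
    ]
    print(find_bad_rows(sample))
```

For a real file, the same function can be applied to an open file handle, for example `find_bad_rows(open("annotations.tabular"))`, where the file name is hypothetical. An empty result means every data row has the expected number of columns; it does not verify the column contents.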
Inputs, outputs, and parameters
Either a SAM file or a BAM file must be supplied. Galaxy automatically coordinate-sorts all uploaded BAM files.
From Picard documentation (http://broadinstitute.github.io/picard/):
- REF_FLAT=File: Gene annotations in refFlat form. Format described here: https://genome.ucsc.edu/FAQ/FAQformat.html#format9 Required.
- RIBOSOMAL_INTERVALS=File: Location of rRNA sequences in genome, in interval_list format. If not specified, no bases will be identified as being ribosomal. Format described here: https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/util/IntervalList.html It can be generated from BED datasets using Galaxy's wrapper for the picard_BedToIntervalList tool.
- STRAND_SPECIFICITY=StrandSpecificity, STRAND=StrandSpecificity: For strand-specific library prep. For unpaired reads, use FIRST_READ_TRANSCRIPTION_STRAND if the reads are expected to be on the transcription strand. Required. Possible values: {NONE, FIRST_READ_TRANSCRIPTION_STRAND, SECOND_READ_TRANSCRIPTION_STRAND}
- MINIMUM_LENGTH=Integer: When calculating coverage-based values (e.g. CV of coverage), only use transcripts of this length or greater. Default value: 500.
- IGNORE_SEQUENCE=String: If a read maps to a sequence specified with this option, all the bases in the read are counted as ignored bases.
- RRNA_FRAGMENT_PERCENTAGE=Double: This percentage of the length of a fragment must overlap one of the ribosomal intervals for a read or read pair to be considered rRNA. Default value: 0.8.
- METRIC_ACCUMULATION_LEVEL=MetricAccumulationLevel, LEVEL=MetricAccumulationLevel: The level(s) at which to accumulate metrics. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times.
- ASSUME_SORTED=Boolean, AS=Boolean: If true (default), then the sort order in the header file will be ignored. Default value: true. Possible values: {true, false}
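To show how the options above fit together outside Galaxy, here is a minimal sketch that assembles a Picard command line as an argument list. It rests on several assumptions not stated in this help text: the tool is invoked as `CollectRnaSeqMetrics` with Picard's KEY=VALUE argument style, the input and output are passed as `INPUT` and `OUTPUT`, and `picard.jar` is a placeholder path. The function name and all file names are hypothetical; check them against your installed Picard version.

```python
from typing import List, Optional

# Values listed for STRAND_SPECIFICITY in the option descriptions above.
ALLOWED_STRAND_SPECIFICITY = {
    "NONE",
    "FIRST_READ_TRANSCRIPTION_STRAND",
    "SECOND_READ_TRANSCRIPTION_STRAND",
}


def build_rnaseq_metrics_command(
    input_bam: str,
    output_metrics: str,
    ref_flat: str,
    strand_specificity: str = "NONE",
    ribosomal_intervals: Optional[str] = None,
    minimum_length: int = 500,
    rrna_fragment_percentage: float = 0.8,
    picard_jar: str = "picard.jar",  # assumption: placeholder path to the Picard jar
) -> List[str]:
    """Build (but do not run) a Picard command as a list of arguments."""
    if strand_specificity not in ALLOWED_STRAND_SPECIFICITY:
        raise ValueError(f"Unsupported STRAND_SPECIFICITY: {strand_specificity}")

    command = [
        "java", "-jar", picard_jar,
        "CollectRnaSeqMetrics",      # assumption: tool name, not given in this help text
        f"INPUT={input_bam}",        # assumption: INPUT/OUTPUT are not listed above
        f"OUTPUT={output_metrics}",
        f"REF_FLAT={ref_flat}",
        f"STRAND_SPECIFICITY={strand_specificity}",
        f"MINIMUM_LENGTH={minimum_length}",
        f"RRNA_FRAGMENT_PERCENTAGE={rrna_fragment_percentage}",
    ]
    # RIBOSOMAL_INTERVALS is optional; without it no bases are counted as ribosomal.
    if ribosomal_intervals is not None:
        command.append(f"RIBOSOMAL_INTERVALS={ribosomal_intervals}")
    return command


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_rnaseq_metrics_command(
        "reads.bam", "rnaseq_metrics.txt", "annotations.refFlat",
        strand_specificity="FIRST_READ_TRANSCRIPTION_STRAND",
    )))
```

The resulting list can be passed to `subprocess.run` once the paths point to real files. Returning a list rather than a single string avoids shell-quoting problems with file names that contain spaces.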
Additional information
Additional information about Picard tools is available from the Picard web site at http://broadinstitute.github.io/picard/.