Collects metrics about the alignment of RNA to various functional classes of loci in the genome: coding, intronic, UTR, intergenic, ribosomal.
Obtaining gene annotations in refFlat format
This tool requires gene annotations in refFlat format. These data can be obtained from the UCSC Table Browser directly through Galaxy by following these steps:
- Click on Get Data in the upper part of the left pane of the Galaxy interface
- Click on UCSC Main link
- Set your genome and dataset of interest. It must be the same genome build against which you have mapped the reads contained in the BAM file you are analyzing
- In the output format field choose selected fields from primary and related tables
- Click get output button
- In the first table presented at the top of the page, select (using checkboxes) the first 11 fields: name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds proteinId
- Click done with selection
- Click Send query to Galaxy
- A new dataset will appear in the current Galaxy history
- Use this dataset as the input for Gene annotations in refFlat form dropdown of this tool
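The dataset produced by the steps above should contain 11 tab-separated columns per row. As a quick sanity check before running the tool, something like the following sketch could be used. It assumes the file is tab-delimited and that header or comment lines start with `#`; the function name `check_refflat_columns` and the example rows are hypothetical and not part of Galaxy or Picard.

```python
import io

EXPECTED_COLUMNS = 11  # number of fields selected in the steps above


def check_refflat_columns(handle, expected=EXPECTED_COLUMNS):
    """Return (line_number, column_count) for every row with the wrong width."""
    bad_rows = []
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        column_count = len(line.split("\t"))
        if column_count != expected:
            bad_rows.append((line_number, column_count))
    return bad_rows


# Hypothetical example data: the second data row is missing its last field.
example = io.StringIO(
    "#name\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd"
    "\texonCount\texonStarts\texonEnds\tproteinId\n"
    "tx1\tchr1\t+\t100\t900\t150\t850\t2\t100,500,\t300,900,\tP00001\n"
    "tx2\tchr1\t-\t100\t900\t150\t850\t2\t100,500,\t300,900,\n"
)
print(check_refflat_columns(example))
```

An empty list means every data row has the expected number of columns. To check a real file, pass an open file handle instead of the `io.StringIO` object.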
Inputs, outputs, and parameters
Either a SAM file or a BAM file must be supplied. Galaxy automatically coordinate-sorts all uploaded BAM files.
From the Picard documentation (http://broadinstitute.github.io/picard/):
- REF_FLAT=File Gene annotations in refFlat form. Format described here: https://genome.ucsc.edu/FAQ/FAQformat.html#format9 Required.
- RIBOSOMAL_INTERVALS=File Location of rRNA sequences in the genome, in interval_list format. If not specified, no bases will be identified as being ribosomal. Format described here: https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/util/IntervalList.html It can be generated from BED datasets using Galaxy's wrapper for the picard_BedToIntervalList tool.
- STRAND_SPECIFICITY=StrandSpecificity, STRAND=StrandSpecificity For strand-specific library prep. For unpaired reads, use FIRST_READ_TRANSCRIPTION_STRAND if the reads are expected to be on the transcription strand. Required. Possible values: {NONE, FIRST_READ_TRANSCRIPTION_STRAND, SECOND_READ_TRANSCRIPTION_STRAND}
- MINIMUM_LENGTH=Integer When calculating coverage-based values (e.g. CV of coverage), only use transcripts of this length or greater. Default value: 500.
- IGNORE_SEQUENCE=String If a read maps to a sequence specified with this option, all the bases in the read are counted as ignored bases.
- RRNA_FRAGMENT_PERCENTAGE=Double This percentage of the length of a fragment must overlap one of the ribosomal intervals for a read or read pair to be considered rRNA. Default value: 0.8.
- METRIC_ACCUMULATION_LEVEL=MetricAccumulationLevel, LEVEL=MetricAccumulationLevel The level(s) at which to accumulate metrics. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times.
- ASSUME_SORTED=Boolean, AS=Boolean If true (default), then the sort order in the header file will be ignored. Default value: true. Possible values: {true, false}
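For readers who want to see how these options fit together outside Galaxy, the sketch below assembles a Picard command line as a Python list without running it. It is an illustration under assumptions: the helper name `build_collect_rnaseq_metrics_args`, the file names, and the `picard.jar` path are hypothetical, and `INPUT` and `OUTPUT` are Picard's standard input and output options, which are not part of the option list quoted above.

```python
import shlex

ALLOWED_STRANDS = {
    "NONE",
    "FIRST_READ_TRANSCRIPTION_STRAND",
    "SECOND_READ_TRANSCRIPTION_STRAND",
}


def build_collect_rnaseq_metrics_args(
    input_bam,
    output_metrics,
    ref_flat,
    strand_specificity="NONE",
    ribosomal_intervals=None,
    minimum_length=500,            # default value from the option list
    rrna_fragment_percentage=0.8,  # default value from the option list
    picard_jar="picard.jar",       # assumed location of the Picard jar
):
    """Build (but do not execute) a CollectRnaSeqMetrics command line."""
    if strand_specificity not in ALLOWED_STRANDS:
        raise ValueError(f"Unknown STRAND_SPECIFICITY: {strand_specificity}")
    args = [
        "java", "-jar", picard_jar, "CollectRnaSeqMetrics",
        f"INPUT={input_bam}",
        f"OUTPUT={output_metrics}",
        f"REF_FLAT={ref_flat}",
        f"STRAND_SPECIFICITY={strand_specificity}",
        f"MINIMUM_LENGTH={minimum_length}",
        f"RRNA_FRAGMENT_PERCENTAGE={rrna_fragment_percentage}",
    ]
    # RIBOSOMAL_INTERVALS is optional; without it no bases count as ribosomal.
    if ribosomal_intervals is not None:
        args.append(f"RIBOSOMAL_INTERVALS={ribosomal_intervals}")
    return args


# Hypothetical file names, for illustration only.
command = build_collect_rnaseq_metrics_args(
    "reads.bam",
    "rnaseq_metrics.txt",
    "annotations.refFlat",
    strand_specificity="FIRST_READ_TRANSCRIPTION_STRAND",
    ribosomal_intervals="rRNA.interval_list",
)
print(shlex.join(command))
```

Returning a list rather than a single string keeps each option as a separate argument, so the result could be passed to `subprocess.run` without shell quoting concerns. `shlex.join` requires Python 3.8 or later.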
Additional information
Additional information about Picard tools is available from the Picard web site at http://broadinstitute.github.io/picard/.