Mercurial > repos > victor > rsem
view rsem/README @ 1:560aaa69b532 default tip
Uploaded added tool_data_conf file
author | victor |
---|---|
date | Mon, 05 Mar 2012 11:19:17 -0500 |
parents | 4edac0183857 |
children |
line wrap: on
line source
# RSEM Galaxy Wrapper # ## Introduction ## RSEM (RNA-Seq by Expectation-Maximization) is a software package for the estimation of gene and isoform abundances from RNA-Seq data. A key feature of RSEM is its statistically-principled approach to the handling of RNA-Seq reads that map to multiple genes and/or isoforms. In addition, RSEM is well-suited to performing quantification with de novo transcriptome assemblies, as it does not require a reference genome. ## Installation ## Follow the [Galaxy Tool Shed instructions](http://wiki.g2.bx.psu.edu/Tool_Shed) to add this wrapper from the tool shed to your galaxy instance. Once the files are in the tools directory you have to have RSEM references installed. This can be done by: 1. Placing the file called `rsem_indices.loc` into the directory `~/galaxy-dist/tool-data` This file tells the RSEM wrapper how to find the reference(s). It is formatted according to galaxy's documentation with the following tab-delimited format: unique_build_id dbkey display_name file_base_path For example, human_refseq_NM human_refseq_NM human_refseq_NM /opt/galaxy/references/human/1.1.2/NM_refseq_ref 2. Downloaded a pre-built RSEM reference from the [RSEM website](http://deweylab.biostat.wisc.edu/rsem/). 3. Place reference files into the `file_base_path` listed in the `rsem_indices.loc` file If you would rather build your own reference files follow the instructions below and then place resulting reference files into the `file_base_path` listed in the `rsem_indices.loc` file. ### Building a custom RSEM reference ### For instructions on how to build the RSEM reference files, first see the [RSEM documentation](http://deweylab.biostat.wisc.edu/rsem/README.html). #### Example #### Suppose we have mouse RNA-Seq data and want to use the UCSC mm9 version of the mouse genome. We have downloaded the UCSC Genes transcript annotations in GTF format (as mm9.gtf) using the Table Browser and the knownIsoforms.txt file for mm9 from the UCSC Downloads. We also have all chromosome files for mm9 in the directory `/data/mm9`. We want to put the generated reference files under `/opt/galaxy/references` with name `mouse_125`. We'll add poly(A) tails with length 125. Please note that GTF files generated from UCSC's Table Browser do not contain isoform-gene relationship information. For the UCSC Genes annotation, this information can be obtained from the knownIsoforms.txt file. Suppose we want to build Bowtie indices and Bowtie executables are found in `/sw/bowtie`. To build the reference files, first run the command: rsem-prepare-reference --gtf mm9.gtf \ --transcript-to-gene-map knownIsoforms.txt \ --bowtie-path /sw/bowtie \ /data/mm9/chr1.fa,/data/mm9/chr2.fa,...,/data/mm9/chrM.fa \ /opt/galaxy/references/mouse_125 To add this reference to your galaxy installation, add the following line to the the `rsem_indices.loc` file: mouse_125 mouse_125 mouse_125 /opt/galaxy/references/mouse_125 Then restart galaxy and you should see the `mouse_125` reference listed in the RSEM wrapper. ## References ## * [RSEM website (stand alone package)](http://deweylab.biostat.wisc.edu/rsem/) * B. Li and C. Dewey (2011) [RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome](http://bioinformatics.oxfordjournals.org/content/26/4/493.abstract). BMC Bioinformatics 12:323. * B. Li, V. Ruotti, R. Stewart, J. Thomson, and C. Dewey (2010) [RNA-Seq gene expression estimation with read mapping uncertainty](http://www.biomedcentral.com/1471-2105/12/323). Bioinformatics 26(4): 493-500. ## Contact information ## * RSEM galaxy wrapper questions: ruotti@wisc.edu * RSEM stand alone package questions: bli@cs.wisc.edu * [RSEM announcements mailing list](http://groups.google.com/group/rsem-announce) * [RSEM users mailing list](http://groups.google.com/group/rsem-users)