# RSEM Galaxy Wrapper #

## Introduction ##

RSEM (RNA-Seq by Expectation-Maximization) is a software package for
estimating gene and isoform abundances from RNA-Seq data. A key feature of
RSEM is its statistically principled handling of RNA-Seq reads that map to
multiple genes and/or isoforms. In addition, RSEM is well suited to
quantification with de novo transcriptome assemblies, as it does not require
a reference genome.

## Installation ##

Follow the [Galaxy Tool Shed
instructions](http://wiki.g2.bx.psu.edu/Tool_Shed) to add this wrapper from
the tool shed to your Galaxy instance. Once the files are in the tools
directory, you need to install RSEM references. This can be done by:

1. Placing the file called `rsem_indices.loc` into the directory
   `~/galaxy-dist/tool-data`. This file tells the RSEM wrapper how to find the
   reference(s). It is formatted according to Galaxy's documentation, with the
   following tab-delimited columns:

       unique_build_id dbkey display_name file_base_path

   For example,

       human_refseq_NM human_refseq_NM human_refseq_NM /opt/galaxy/references/human/1.1.2/NM_refseq_ref

2. Downloading a pre-built RSEM reference from the [RSEM website](http://deweylab.biostat.wisc.edu/rsem/).

3. Placing the reference files into the `file_base_path` listed in the
   `rsem_indices.loc` file.
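
Because `rsem_indices.loc` is tab-delimited, an entry whose tabs were turned into spaces will not be split into the four expected columns. The following is a minimal POSIX shell sketch for catching such entries; it is not part of RSEM or the wrapper, the function name `check_rsem_loc` is hypothetical, and it assumes that blank lines and lines starting with `#` are comments, as is usual for Galaxy `.loc` files.

```shell
# Hypothetical helper (not part of RSEM or the Galaxy wrapper): check that each
# entry in an rsem_indices.loc file has exactly four tab-separated columns:
#   unique_build_id  dbkey  display_name  file_base_path
# Blank lines and lines starting with '#' are skipped (assumed to be comments).
# Prints each malformed line with its line number; returns 1 if any were found.
check_rsem_loc() {
    awk -F '\t' '
        /^[ \t]*$/ || /^#/ { next }
        NF != 4 {
            printf "line %d: expected 4 tab-separated fields, got %d: %s\n", NR, NF, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `check_rsem_loc ~/galaxy-dist/tool-data/rsem_indices.loc` prints nothing and returns 0 when every entry has four tab-separated columns.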

If you would rather build your own reference files, follow the instructions
below and then place the resulting reference files into the `file_base_path`
listed in the `rsem_indices.loc` file.
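
Whether the reference is downloaded or custom-built, it can help to confirm that each `file_base_path` actually points at reference files before restarting Galaxy. The sketch below uses a hypothetical helper, `check_rsem_refs`, which is not part of RSEM or the wrapper. It assumes RSEM names its reference files `<file_base_path>.<extension>`, so it only checks that at least one file with that prefix exists, not that the reference is complete.

```shell
# Hypothetical helper (not part of RSEM or the Galaxy wrapper): for every entry
# in an rsem_indices.loc file, report whether any file named
# <file_base_path>.<something> exists. Assumes RSEM's convention of naming
# reference files after the reference name plus an extension; it does not
# check that the reference is complete. Returns 1 if any entry has no files.
check_rsem_refs() {
    tab=$(printf '\t')
    rc=0
    while IFS=$tab read -r build_id dbkey display_name base_path; do
        # Skip blank lines and '#' comment lines.
        case $build_id in ''|'#'*) continue ;; esac
        if [ -n "$base_path" ] && ls "$base_path".* >/dev/null 2>&1; then
            echo "ok: $build_id -> $base_path"
        else
            echo "missing: $build_id (no files matching $base_path.*)"
            rc=1
        fi
    done < "$1"
    return "$rc"
}
```

For example, `check_rsem_refs ~/galaxy-dist/tool-data/rsem_indices.loc` prints one `ok:` or `missing:` line per entry.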

### Building a custom RSEM reference ###

For instructions on how to build the RSEM reference files, first see the [RSEM
documentation](http://deweylab.biostat.wisc.edu/rsem/README.html).

#### Example ####

Suppose we have mouse RNA-Seq data and want to use the UCSC mm9 version of the
mouse genome. We have downloaded the UCSC Genes transcript annotations in GTF
format (as `mm9.gtf`) using the Table Browser, and the `knownIsoforms.txt` file
for mm9 from the UCSC Downloads. We also have all chromosome files for mm9 in
the directory `/data/mm9`. We want to put the generated reference files under
`/opt/galaxy/references` with the name `mouse_125`, and we'll add poly(A) tails
of length 125. Note that GTF files generated by UCSC's Table Browser do not
contain isoform-gene relationship information; for the UCSC Genes annotation,
this information can be obtained from the `knownIsoforms.txt` file. Finally,
suppose we want to build Bowtie indices and the Bowtie executables are in
`/sw/bowtie`.

To build the reference files, first run the command:

    rsem-prepare-reference --gtf mm9.gtf \
        --transcript-to-gene-map knownIsoforms.txt \
        --bowtie-path /sw/bowtie \
        /data/mm9/chr1.fa,/data/mm9/chr2.fa,...,/data/mm9/chrM.fa \
        /opt/galaxy/references/mouse_125
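
Typing out every chromosome file is tedious. As a convenience, a comma-separated list of this form can be generated from a directory. The sketch below assumes that every `*.fa` file in the directory should be part of the reference; `fasta_list` is a hypothetical helper name. Files come out in the shell's (locale-dependent) glob order rather than numeric chromosome order.

```shell
# Hypothetical helper: print the *.fa files in a directory as one
# comma-separated list, the form used above for rsem-prepare-reference's
# reference FASTA argument. Assumes every *.fa file should be included.
# Files appear in the shell's glob order, not numeric chromosome order.
# Prints an empty line if no *.fa files are found.
fasta_list() {
    list=
    for f in "$1"/*.fa; do
        [ -e "$f" ] || continue   # the glob matched nothing
        list=${list:+$list,}$f
    done
    printf '%s\n' "$list"
}
```

The output, for example `"$(fasta_list /data/mm9)"`, can then be passed in place of the hand-written list of chromosome files.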

To add this reference to your Galaxy installation, add the following line
(columns separated by tabs) to the `rsem_indices.loc` file:

    mouse_125 mouse_125 mouse_125 /opt/galaxy/references/mouse_125
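
Some editors replace tabs with spaces, which would break this tab-delimited entry. One way to avoid that is to append the line with `printf`, which writes literal tab characters. The function below is a hypothetical helper, sketched under the assumption that, as in the example, the same identifier is used for `unique_build_id`, `dbkey`, and `display_name`.

```shell
# Hypothetical helper: append one entry to rsem_indices.loc. printf writes
# real tab characters between the four columns, so the entry cannot end up
# space-separated. As in the example above, the build id is reused for the
# dbkey and display_name columns (an assumption; they may differ in general).
add_rsem_loc_entry() {
    loc_file=$1
    build_id=$2
    base_path=$3
    printf '%s\t%s\t%s\t%s\n' "$build_id" "$build_id" "$build_id" "$base_path" >> "$loc_file"
}
```

For example, `add_rsem_loc_entry ~/galaxy-dist/tool-data/rsem_indices.loc mouse_125 /opt/galaxy/references/mouse_125` appends the entry shown above.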

Then restart Galaxy, and the `mouse_125` reference should be listed in the
RSEM wrapper.

## References ##

* [RSEM website (standalone package)](http://deweylab.biostat.wisc.edu/rsem/)

* B. Li and C. Dewey (2011) [RSEM: accurate transcript quantification from
  RNA-Seq data with or without a reference
  genome](http://www.biomedcentral.com/1471-2105/12/323).
  BMC Bioinformatics 12:323.

* B. Li, V. Ruotti, R. Stewart, J. Thomson, and C. Dewey (2010) [RNA-Seq gene
  expression estimation with read mapping
  uncertainty](http://bioinformatics.oxfordjournals.org/content/26/4/493.abstract).
  Bioinformatics 26(4): 493-500.

## Contact information ##

* RSEM Galaxy wrapper questions: ruotti@wisc.edu
* RSEM standalone package questions: bli@cs.wisc.edu
* [RSEM announcements mailing list](http://groups.google.com/group/rsem-announce)
* [RSEM users mailing list](http://groups.google.com/group/rsem-users)