annotate rsem/README @ 0:4edac0183857

Initial commit from tarball version 1.17
author victor
date Mon, 05 Mar 2012 11:12:34 -0500
# RSEM Galaxy Wrapper #

## Introduction ##

RSEM (RNA-Seq by Expectation-Maximization) is a software package for the
estimation of gene and isoform abundances from RNA-Seq data. A key feature of
RSEM is its statistically principled approach to the handling of RNA-Seq
reads that map to multiple genes and/or isoforms. In addition, RSEM is
well suited to performing quantification with de novo transcriptome
assemblies, as it does not require a reference genome.

## Installation ##

Follow the [Galaxy Tool Shed
instructions](http://wiki.g2.bx.psu.edu/Tool_Shed) to add this wrapper from
the Tool Shed to your Galaxy instance. Once the files are in the tools
directory, you also need to have RSEM references installed. This can be done by:

1. Placing the file called `rsem_indices.loc` into the directory
   `~/galaxy-dist/tool-data`. This file tells the RSEM wrapper how to find the
   reference(s). It is formatted according to Galaxy's documentation, with one
   reference per line in the following tab-delimited format (fields are
   separated by tab characters, shown here as spaces):

        unique_build_id    dbkey    display_name    file_base_path

   For example,

        human_refseq_NM    human_refseq_NM    human_refseq_NM    /opt/galaxy/references/human/1.1.2/NM_refseq_ref

2. Downloading a pre-built RSEM reference from the [RSEM website](http://deweylab.biostat.wisc.edu/rsem/).

3. Placing the reference files into the `file_base_path` listed in the
   `rsem_indices.loc` file.

If you would rather build your own reference files, follow the instructions
below and then place the resulting reference files into the `file_base_path`
listed in the `rsem_indices.loc` file.
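
Because an entry written with spaces instead of tabs is an easy mistake to make, it can help to check `rsem_indices.loc` before restarting Galaxy. The following is a minimal sketch, not part of the wrapper: `check_rsem_loc` is a hypothetical helper name, and it assumes that lines starting with `#` are comments, as is the convention in Galaxy's sample `.loc` files.

```shell
# check_rsem_loc LOC_FILE   (hypothetical helper, not part of the wrapper)
# Prints "OK <unique_build_id> <file_base_path>" for every entry that has
# exactly four tab-separated fields, and a "BAD line N" message otherwise.
# Returns a non-zero status if at least one entry is malformed.
check_rsem_loc() {
    awk -F'\t' '
        /^[ \t]*(#|$)/ { next }     # assumption: "#" lines are comments; skip blanks too
        NF != 4 {
            printf "BAD line %d: expected 4 tab-separated fields, found %d\n", NR, NF
            bad = 1
            next
        }
        { printf "OK %s %s\n", $1, $4 }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, `check_rsem_loc ~/galaxy-dist/tool-data/rsem_indices.loc` prints one `OK` line per well-formed entry and returns a non-zero status if any entry is malformed.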

### Building a custom RSEM reference ###

For instructions on how to build the RSEM reference files, first see the [RSEM
documentation](http://deweylab.biostat.wisc.edu/rsem/README.html).

#### Example ####

Suppose we have mouse RNA-Seq data and want to use the UCSC mm9 version of the
mouse genome. We have downloaded the UCSC Genes transcript annotations in GTF
format (as mm9.gtf) using the Table Browser and the knownIsoforms.txt file for
mm9 from the UCSC Downloads. We also have all chromosome files for mm9 in the
directory `/data/mm9`. We want to put the generated reference files under
`/opt/galaxy/references` with name `mouse_125`. We'll add poly(A) tails with
length 125. Please note that GTF files generated from UCSC's Table Browser do
not contain isoform-gene relationship information. For the UCSC Genes
annotation, this information can be obtained from the knownIsoforms.txt file.
Suppose we want to build Bowtie indices and Bowtie executables are found in
`/sw/bowtie`.

To build the reference files, first run the command:

    rsem-prepare-reference --gtf mm9.gtf \
                           --transcript-to-gene-map knownIsoforms.txt \
                           --bowtie-path /sw/bowtie \
                           /data/mm9/chr1.fa,/data/mm9/chr2.fa,...,/data/mm9/chrM.fa \
                           /opt/galaxy/references/mouse_125
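
The comma-separated list of chromosome files in the command above is tedious to type by hand. As a sketch, the hypothetical helper below (`join_fasta_list` is not part of RSEM) builds that list from every `.fa` file in a directory. It assumes the directory contains only the chromosome files you want and that no path contains a comma.

```shell
# join_fasta_list DIR   (hypothetical helper, not part of RSEM)
# Prints the paths of all DIR/*.fa files joined by commas, in glob order.
# The body runs in a subshell so its variables do not leak into the caller.
join_fasta_list() (
    list=""
    for f in "$1"/*.fa; do
        [ -e "$f" ] || continue          # no match: the pattern stays literal, skip it
        list="${list:+$list,}$f"         # prepend a comma only after the first file
    done
    printf '%s\n' "$list"
)

# Usage, with the same options as the command above:
#   rsem-prepare-reference --gtf mm9.gtf \
#       --transcript-to-gene-map knownIsoforms.txt \
#       --bowtie-path /sw/bowtie \
#       "$(join_fasta_list /data/mm9)" \
#       /opt/galaxy/references/mouse_125
```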

To add this reference to your Galaxy installation, add the following
tab-delimited line to the `rsem_indices.loc` file:

    mouse_125    mouse_125    mouse_125    /opt/galaxy/references/mouse_125

Then restart Galaxy and you should see the `mouse_125` reference listed in the
RSEM wrapper.
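
The `rsem_indices.loc` line above can also be appended from the shell, where `printf` guarantees real tab characters regardless of editor settings. This is a minimal sketch: `add_rsem_loc_entry` is a hypothetical helper, not part of Galaxy or the wrapper, and it assumes (as in the example) that the same identifier is used for `unique_build_id`, `dbkey`, and `display_name`.

```shell
# add_rsem_loc_entry LOC_FILE ID FILE_BASE_PATH   (hypothetical helper)
# Appends "ID<TAB>ID<TAB>ID<TAB>FILE_BASE_PATH" to LOC_FILE unless an entry
# whose first field equals ID is already present.
add_rsem_loc_entry() {
    if [ -f "$1" ] &&
       awk -F'\t' -v id="$2" '$1 == id { found = 1 } END { exit !found }' "$1"
    then
        echo "entry '$2' already present in $1" >&2
    else
        printf '%s\t%s\t%s\t%s\n' "$2" "$2" "$2" "$3" >> "$1"
    fi
}

# Usage:
#   add_rsem_loc_entry ~/galaxy-dist/tool-data/rsem_indices.loc \
#       mouse_125 /opt/galaxy/references/mouse_125
```

Running the helper twice with the same identifier leaves a single entry in the file, so it is safe to rerun.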

## References ##

* [RSEM website (stand alone package)](http://deweylab.biostat.wisc.edu/rsem/)

* B. Li and C. Dewey (2011) [RSEM: accurate transcript quantification from
  RNA-Seq data with or without a reference
  genome](http://www.biomedcentral.com/1471-2105/12/323).
  BMC Bioinformatics 12:323.

* B. Li, V. Ruotti, R. Stewart, J. Thomson, and C. Dewey (2010) [RNA-Seq gene
  expression estimation with read mapping
  uncertainty](http://bioinformatics.oxfordjournals.org/content/26/4/493.abstract).
  Bioinformatics 26(4): 493-500.

## Contact information ##
* RSEM galaxy wrapper questions: ruotti@wisc.edu
* RSEM stand alone package questions: bli@cs.wisc.edu
* [RSEM announcements mailing list](http://groups.google.com/group/rsem-announce)
* [RSEM users mailing list](http://groups.google.com/group/rsem-users)