comparison README.rst @ 0:47ec9c6f44b8 draft

planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit b63924933a03255872077beb4d0fde49d77afa92
author pjbriggs
date Thu, 09 Nov 2017 10:13:29 -0500
parents
children 1c1902e12caf
1 Amplicon_analysis-galaxy
2 ========================
3
4 A Galaxy tool wrapper for Mauro Tutino's ``Amplicon_analysis`` pipeline
5 script at https://github.com/MTutino/Amplicon_analysis
6
7 The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq
8 (Casava >= 1.8) and performs the following operations:
9
10 * QC and clean up of input data
11 * Removal of singletons and chimeras and building of OTU table
12 and phylogenetic tree
13 * Beta and alpha diversity analysis
14
15 Usage documentation
16 ===================
17
18 Usage of the tool (including required inputs) is documented within
19 the ``help`` section of the tool XML.
20
21 Installing the tool in a Galaxy instance
22 ========================================
23
24 The following sections describe how to install the tool files,
25 dependencies and reference data, and how to configure the Galaxy
26 instance to detect the dependencies and reference data correctly
27 at run time.
28
29 1. Install the dependencies
30 ---------------------------
31
32 The ``install_tool_deps.sh`` script can be used to fetch and install the
33 dependencies locally, for example::
34
35     install_tool_deps.sh /path/to/local_tool_dependencies
36
37 This can take some time to complete. When finished it should have
38 created a set of directories containing the dependencies under the
39 specified top level directory.
40
41 2. Install the tool files
42 -------------------------
43
44 The core tool is hosted on the Galaxy toolshed, so it can be installed
45 directly from there (this is the recommended route):
46
47 * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/
48
49 Alternatively it can be installed manually; in this case there are two
50 files to install:
51
52 * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
53 * ``amplicon_analysis_pipeline.py`` (the Python wrapper script)
54
55 Put these in a directory that is visible to Galaxy (e.g. a
56 ``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml``
57 file to tell Galaxy to offer the tool by adding a line such as::
58
59     <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
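
After a manual install, it can be useful to confirm that the tool entry is actually present in the tool configuration file. The following is a minimal sketch rather than part of the tool: the function name ``tool_is_registered`` is hypothetical, and it assumes the configuration is well-formed XML with ``<tool file="..."/>`` entries like the one shown above.

```python
import xml.etree.ElementTree as ET


def tool_is_registered(conf_xml,
                       tool_file="Amplicon_analysis/amplicon_analysis_pipeline.xml"):
    """Return True if any <tool> element in the XML text points at tool_file.

    conf_xml is the *content* of the tool configuration file (a string).
    """
    root = ET.fromstring(conf_xml)
    # iter() walks the whole tree, so entries nested inside <section>
    # elements are found as well as top-level ones.
    return any(tool.get("file") == tool_file for tool in root.iter("tool"))
```

The function takes the file contents rather than a path, so it can be pointed at whichever configuration file the Galaxy instance actually uses.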
60
61 3. Install the reference data
62 -----------------------------
63
64 The script ``References.sh`` from the pipeline package at
65 https://github.com/MTutino/Amplicon_analysis can be run to install
66 the reference data, for example::
67
68     cd /path/to/pipeline/data
69     wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
70     /bin/bash ./References.sh
71
72 will install the data in ``/path/to/pipeline/data``.
73
74 **NB** The final amount of data downloaded and uncompressed will be
75 around 6GB.
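
Because the download is large, it may be worth checking the free space on the target filesystem before running ``References.sh``. This is a hedged sketch: the 6 GiB threshold is taken from the approximate figure quoted above, and the function name ``has_room_for_reference_data`` is an illustrative assumption.

```python
import shutil

# Assumption: "around 6GB" from the README, expressed here as 6 GiB.
# The true requirement may be higher while archives are being unpacked.
REQUIRED_BYTES = 6 * 1024 ** 3


def has_room_for_reference_data(path, required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has `required` bytes free."""
    return shutil.disk_usage(path).free >= required
```

For example, ``has_room_for_reference_data("/path/to/pipeline/data")`` could be called before starting the download.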
76
77 4. Configure dependencies and reference data in Galaxy
78 ------------------------------------------------------
79
80 The final steps are to make your Galaxy installation aware of the
81 tool dependencies and reference data, so it can locate them both when
82 the tool is run.
83
84 To target the tool dependencies installed previously, add the
85 following lines to the ``dependency_resolvers_conf.xml`` file in the
86 Galaxy ``config`` directory::
87
88     <dependency_resolvers>
89       ...
90       <galaxy_packages base_path="/path/to/local_tool_dependencies" />
91       <galaxy_packages base_path="/path/to/local_tool_dependencies" versionless="true" />
92       ...
93     </dependency_resolvers>
94
95 (NB it is recommended to place these *before* the ``<conda ... />``
96 resolvers)
97
98 (If you're not familiar with dependency resolvers in Galaxy then
99 see the documentation at
100 https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html
101 for more details.)
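
The ordering recommendation above can be checked mechanically. The sketch below is illustrative only: ``galaxy_packages_precede_conda`` is a hypothetical helper, and it assumes the resolvers are direct children of the ``<dependency_resolvers>`` root element, as in the example.

```python
import xml.etree.ElementTree as ET


def galaxy_packages_precede_conda(xml_text, base_path):
    """Check that the <galaxy_packages> entries for base_path come first.

    Returns True only if at least one matching <galaxy_packages> entry
    exists and all of them appear before the first <conda> resolver
    (or there is no <conda> resolver at all).
    """
    root = ET.fromstring(xml_text)
    children = list(root)
    ours = [i for i, el in enumerate(children)
            if el.tag == "galaxy_packages" and el.get("base_path") == base_path]
    conda = [i for i, el in enumerate(children) if el.tag == "conda"]
    if not ours:
        return False
    return not conda or max(ours) < min(conda)
```

Pass it the contents of ``dependency_resolvers_conf.xml`` together with the directory given to ``install_tool_deps.sh``.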
102
103 The tool locates the reference data via an environment variable called
104 ``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
105 directory where the reference data has been installed.
106
107 There are various ways to do this, depending on how your Galaxy
108 installation is configured:
109
110 * **For local instances:** add a line to set it in the
111 ``config/local_env.sh`` file of your Galaxy installation, e.g.::
112
113     export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data
114
115 * **For production instances:** set the value in the ``job_conf.xml``
116 configuration file, e.g.::
117
118     <destination id="amplicon_analysis">
119       <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
120     </destination>
121
122 and then specify that the pipeline tool uses this destination::
123
124     <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>
125
126 (For more about job destinations see the Galaxy documentation at
127 https://galaxyproject.org/admin/config/jobs/#job-destinations)
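
Whichever method is used, a quick way to confirm the setting has taken effect is to inspect the environment that jobs actually see. The following is a minimal sketch, not code from the tool's wrapper: ``check_ref_data_path`` is a hypothetical helper that only checks the variable is set and names an existing directory.

```python
import os

ENV_VAR = "AMPLICON_ANALYSIS_REF_DATA_PATH"


def check_ref_data_path(environ=None):
    """Return the configured reference data directory, or raise RuntimeError.

    `environ` defaults to the current process environment; a plain dict
    can be passed instead, which makes the check easy to exercise.
    """
    environ = os.environ if environ is None else environ
    path = environ.get(ENV_VAR)
    if not path:
        raise RuntimeError(f"{ENV_VAR} is not set")
    if not os.path.isdir(path):
        raise RuntimeError(f"{ENV_VAR}={path!r} is not a directory")
    return path
```

This does not validate the contents of the directory, only that the variable resolves to somewhere that exists.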
128
129 5. Enable rendering of HTML outputs from pipeline
130 -------------------------------------------------
131
132 To ensure that HTML outputs are displayed correctly in Galaxy
133 (for example the Vsearch OTU table heatmaps), Galaxy needs to be
134 configured not to sanitize the outputs from the ``Amplicon_analysis``
135 tool.
136
137 Either:
138
139 * **For local instances:** set ``sanitize_all_html = False`` in
140 ``config/galaxy.ini`` (NB don't do this on production servers or
141 public instances!); or
142
143 * **For production instances:** add the ``Amplicon_analysis`` tool
144 to the display whitelist in the Galaxy instance:
145
146 - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
147 ``config/galaxy.ini`` and restart Galaxy;
148 - Go to ``Admin>Manage Display Whitelist``, check the box for
149 ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
150 search function to help locate it) and click on
151 ``Submit new whitelist`` to update the settings.
152
153 Additional details
154 ==================
155
156 Some other things to be aware of:
157
158 * Using the Silva database requires a minimum of 18 GB of RAM
159
160 Known problems
161 ==============
162
163 * Only the ``VSEARCH`` pipeline in Mauro's script is currently
164 available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
165 pipelines have yet to be implemented.
166 * The images in the tool help section are not visible if the
167 tool has been installed locally, or if it has been installed in
168 a Galaxy instance which is served from a subdirectory.
169
170 These are both problems with Galaxy rather than the tool; see
171 https://github.com/galaxyproject/galaxy/issues/4490 and
172 https://github.com/galaxyproject/galaxy/issues/1676
173
174 Appendix: availability of tool dependencies
175 ===========================================
176
177 The tool takes its dependencies from the underlying pipeline script (see
178 https://github.com/MTutino/Amplicon_analysis/blob/master/README.md
179 for details).
180
181 As noted above, currently the ``install_tool_deps.sh`` script can be
182 used to manually install the dependencies for a local tool install.
183
184 In principle these should also be available if the tool were installed
185 from a toolshed. However it would be preferable in this case to get as
186 many of the dependencies as possible via the ``conda`` dependency
187 resolver.
188
189 The following are known to be available via conda, with the required
190 version:
191
192 - cutadapt 1.8.1
193 - sickle-trim 1.33
194 - bioawk 1.0
195 - fastqc 0.11.3
196 - R 3.2.0
197
198 Some dependencies are available but with the "wrong" versions:
199
200 - spades (need 3.5.0)
201 - qiime (need 1.8.0)
202 - blast (need 2.2.26)
203 - vsearch (need 1.1.3)
204
205 The following dependencies are currently unavailable:
206
207 - fasta_number (need 02jun2015)
208 - fasta-splitter (need 0.2.4)
209 - rdp_classifier (need 2.2)
210 - microbiomeutil (need r20110519)
211
212 (NB usearch 6.1.544 and 8.0.1623 are special cases which must be
213 handled outside of Galaxy's dependency management systems.)
214
215 History
216 =======
217
218 ========== ======================================================================
219 Version Changes
220 ---------- ----------------------------------------------------------------------
221 1.1.0 First official version on Galaxy toolshed.
222 1.0.6 Expand inline documentation to provide detailed usage guidance.
223 1.0.5 Updates including:
224
225 - Capture read counts from quality control as new output dataset
226 - Capture FastQC per-base quality boxplots for each sample as
227 new output dataset
228 - Add support for -l option (sliding window length for trimming)
229 - Default for -L set to "200"
230 1.0.4 Various updates:
231
232 - Additional outputs are captured when a "Categories" file is
233 supplied (alpha diversity rarefaction curves and boxplots)
234 - Sample names derived from Fastqs in a collection of pairs
235 are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
236 - Input Fastqs can now be of more general ``fastq`` type
237 - Log file outputs are captured in new output dataset
238 - User can specify a "title" for the job which is copied into
239 the dataset names (to distinguish outputs from different runs)
240 - Improved detection and reporting of problems with input
241 Metatable
242 1.0.3 Take the sample names from the collection dataset names when
243 using collection as input (this is now the default input mode);
244 collect additional output dataset; disable ``usearch``-based
245 pipelines (i.e. ``UPARSE`` and ``QIIME``).
246 1.0.2 Enable support for FASTQs supplied via dataset collections and
247 fix some broken output datasets.
248 1.0.1 Initial version
249 ========== ======================================================================