Amplicon_analysis-galaxy
========================

A Galaxy tool wrapper to Mauro Tutino's ``Amplicon_analysis`` pipeline
script at https://github.com/MTutino/Amplicon_analysis

The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq
(Casava >= 1.8) and performs the following operations:

* QC and clean up of input data
* Removal of singletons and chimeras and building of OTU table
  and phylogenetic tree
* Beta and alpha diversity analysis

Usage documentation
===================

Usage of the tool (including required inputs) is documented within
the ``help`` section of the tool XML.

Installing the tool in a Galaxy instance
========================================

The following sections describe how to install the tool files,
dependencies and reference data, and how to configure the Galaxy
instance to detect the dependencies and reference data correctly
at run time.

1. Install the dependencies
---------------------------

The ``install_tool_deps.sh`` script can be used to fetch and install the
dependencies locally, for example::

    install_tool_deps.sh /path/to/local_tool_dependencies

This can take some time to complete. When finished it should have
created a set of directories containing the dependencies under the
specified top level directory.

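As a rough sketch (the package names and versions shown are illustrative,
not a definitive listing), the layout under the top level directory follows
the ``<package>/<version>/env.sh`` convention expected by Galaxy's
``galaxy_packages`` dependency resolver (see step 4)::

    /path/to/local_tool_dependencies/
        cutadapt/
            1.8.1/
                env.sh    # sets e.g. PATH so Galaxy jobs can find the package
        fastqc/
            0.11.3/
                env.sh
        ...
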
2. Install the tool files
-------------------------

The core tool is hosted on the Galaxy toolshed, so it can be installed
directly from there (this is the recommended route):

* https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/

Alternatively it can be installed manually; in this case there are two
files to install:

* ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
* ``amplicon_analysis_pipeline.py`` (the Python wrapper script)

Put these in a directory that is visible to Galaxy (e.g. a
``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml``
file to tell Galaxy to offer the tool by adding a line such as::

    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />

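For context, the entry normally sits inside a section of ``tool_conf.xml``;
a minimal sketch (the section ``id`` and ``name`` below are illustrative
choices, not requirements) might look like::

    <section id="metagenomics" name="Metagenomics">
        <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
    </section>
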
3. Install the reference data
-----------------------------

The script ``References.sh`` from the pipeline package at
https://github.com/MTutino/Amplicon_analysis can be run to install
the reference data, for example::

    cd /path/to/pipeline/data
    wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
    /bin/bash ./References.sh

This will install the data in ``/path/to/pipeline/data``.

**NB** The final amount of data downloaded and uncompressed will be
around 6GB.

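As a quick sanity check once the script has finished, the total size of
the installed data can be inspected with, for example::

    du -sh /path/to/pipeline/data
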
4. Configure dependencies and reference data in Galaxy
------------------------------------------------------

The final steps are to make your Galaxy installation aware of the
tool dependencies and reference data, so it can locate them both when
the tool is run.

To target the tool dependencies installed previously, add the
following lines to the ``dependency_resolvers_conf.xml`` file in the
Galaxy ``config`` directory::

    <dependency_resolvers>
    ...
      <galaxy_packages base_path="/path/to/local_tool_dependencies" />
      <galaxy_packages base_path="/path/to/local_tool_dependencies" versionless="true" />
    ...
    </dependency_resolvers>

(NB it is recommended to place these *before* the ``<conda ... />``
resolvers.)

(If you're not familiar with dependency resolvers in Galaxy then
see the documentation at
https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html
for more details.)

The tool locates the reference data via an environment variable called
``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
directory where the reference data has been installed.

There are various ways to do this, depending on how your Galaxy
installation is configured:

* **For local instances:** add a line to set it in the
  ``config/local_env.sh`` file of your Galaxy installation, e.g.::

      export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data

* **For production instances:** set the value in the ``job_conf.xml``
  configuration file, e.g.::

      <destination id="amplicon_analysis">
          <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
      </destination>

  and then specify that the pipeline tool uses this destination::

      <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>

(For more about job destinations see the Galaxy documentation at
https://galaxyproject.org/admin/config/jobs/#job-destinations)

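Putting the two fragments together, a minimal ``job_conf.xml`` might look
something like the following sketch (the ``local`` runner plugin and default
destination are assumptions about your setup, not requirements of the tool)::

    <?xml version="1.0"?>
    <job_conf>
        <plugins>
            <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
        </plugins>
        <destinations default="local">
            <destination id="local" runner="local"/>
            <destination id="amplicon_analysis" runner="local">
                <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
            </destination>
        </destinations>
        <tools>
            <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>
        </tools>
    </job_conf>
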
5. Enable rendering of HTML outputs from pipeline
-------------------------------------------------

To ensure that HTML outputs are displayed correctly in Galaxy
(for example the Vsearch OTU table heatmaps), Galaxy needs to be
configured not to sanitize the outputs from the ``Amplicon_analysis``
tool.

Either:

* **For local instances:** set ``sanitize_all_html = False`` in
  ``config/galaxy.ini`` (NB don't do this on production servers or
  public instances!); or

* **For production instances:** add the ``Amplicon_analysis`` tool
  to the display whitelist in the Galaxy instance:

  - Set ``sanitize_whitelist_file = config/whitelist.txt`` in
    ``config/galaxy.ini`` and restart Galaxy;
  - Go to ``Admin>Manage Display Whitelist``, check the box for
    ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
    search function to help locate it) and click on
    ``Submit new whitelist`` to update the settings.

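For reference, both settings live in the ``[app:main]`` section of
``config/galaxy.ini``; a minimal sketch (only uncomment
``sanitize_all_html`` on private local instances) might look like::

    [app:main]
    # Local, private instances only: disable HTML sanitizing altogether
    #sanitize_all_html = False
    # Production instances: keep sanitizing by default but honour a whitelist
    sanitize_whitelist_file = config/whitelist.txt
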
Additional details
==================

Some other things to be aware of:

* Note that using the Silva database requires a minimum of 18GB of RAM

Known problems
==============

* Only the ``VSEARCH`` pipeline in Mauro's script is currently
  available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
  pipelines have yet to be implemented.
* The images in the tool help section are not visible if the
  tool has been installed locally, or if it has been installed in
  a Galaxy instance which is served from a subdirectory.

These are both problems with Galaxy rather than with the tool; see
https://github.com/galaxyproject/galaxy/issues/4490 and
https://github.com/galaxyproject/galaxy/issues/1676

Appendix: availability of tool dependencies
===========================================

The tool takes its dependencies from the underlying pipeline script (see
https://github.com/MTutino/Amplicon_analysis/blob/master/README.md
for details).

As noted above, currently the ``install_tool_deps.sh`` script can be
used to manually install the dependencies for a local tool install.

In principle these should also be available if the tool were installed
from a toolshed. However it would be preferable in this case to get as
many of the dependencies as possible via the ``conda`` dependency
resolver.

The following are known to be available via conda, with the required
version:

- cutadapt 1.8.1
- sickle-trim 1.33
- bioawk 1.0
- fastqc 0.11.3
- R 3.2.0

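As an illustrative check outside Galaxy (the channels and package names
below, e.g. ``r-base`` for R, are assumptions based on bioconda/conda-forge
conventions rather than anything mandated by the tool), these could be
pulled into a scratch conda environment with something like::

    conda create -n amplicon_deps_check \
        -c bioconda -c conda-forge -c defaults \
        cutadapt=1.8.1 sickle-trim=1.33 bioawk=1.0 fastqc=0.11.3 r-base=3.2.0
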
Some dependencies are available but with the "wrong" versions:

- spades (need 3.5.0)
- qiime (need 1.8.0)
- blast (need 2.2.26)
- vsearch (need 1.1.3)

The following dependencies are currently unavailable:

- fasta_number (need 02jun2015)
- fasta-splitter (need 0.2.4)
- rdp_classifier (need 2.2)
- microbiomeutil (need r20110519)

(NB usearch 6.1.544 and 8.0.1623 are special cases which must be
handled outside of Galaxy's dependency management systems.)

History
=======

========== ======================================================================
Version    Changes
---------- ----------------------------------------------------------------------
1.1.0      First official version on Galaxy toolshed.
1.0.6      Expand inline documentation to provide detailed usage guidance.
1.0.5      Updates including:

           - Capture read counts from quality control as new output dataset
           - Capture FastQC per-base quality boxplots for each sample as
             new output dataset
           - Add support for -l option (sliding window length for trimming)
           - Default for -L set to "200"
1.0.4      Various updates:

           - Additional outputs are captured when a "Categories" file is
             supplied (alpha diversity rarefaction curves and boxplots)
           - Sample names derived from Fastqs in a collection of pairs
             are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
           - Input Fastqs can now be of more general ``fastq`` type
           - Log file outputs are captured in new output dataset
           - User can specify a "title" for the job which is copied into
             the dataset names (to distinguish outputs from different runs)
           - Improved detection and reporting of problems with input
             Metatable
1.0.3      Take the sample names from the collection dataset names when
           using collection as input (this is now the default input mode);
           collect additional output dataset; disable ``usearch``-based
           pipelines (i.e. ``UPARSE`` and ``QIIME``).
1.0.2      Enable support for FASTQs supplied via dataset collections and
           fix some broken output datasets.
1.0.1      Initial version.
========== ======================================================================