comparison toolfactory/README.md @ 30:6f48315c32c1 draft

Uploaded
author fubar
date Fri, 07 Aug 2020 07:54:23 -0400
parents
children 4d578c8c1613
29:6db39cbc3242 30:6f48315c32c1
1 toolfactory_2
2 =============
3
4 This is an upgrade to the tool factory but with added parameters
5 (optionally editable in the generated tool form - otherwise fixed) and
6 multiple input files.
7
8 Specify any number of parameters - well at
9 least up to the limit of your patience with repeat groups.
10
11 Parameter values supplied at tool generation time are defaults and
12 can be optionally editable by the user - names cannot be changed once
13 a tool has been generated.
14
15 If not editable, they are passed to the script as hidden parameters
16 and do not appear on the tool form.
17
18 Note! There will be Galaxy default sanitization for all
19 user input parameters which your script may need to dance around.
20
21 Any number of input files can be passed to your script, but of course it
22 has to deal with them. Both path and metadata name are supplied either in the environment
23 (bash/sh) or as command line parameters (python,perl,rscript) that need to be parsed and
24 dealt with in the script. This is complicated by the common use case of needing file names
25 for (eg) column headers, as well as paths. Try the examples shown on the tool factory
26 form to see how Galaxy file and user supplied parameter values can be recovered in each
27 of the 4 scripting environments supported.
28
29 The best way to deal with multiple outputs is to let the tool factory generate an HTML
30 page for your users. It automagically lays out pdf images as thumbnail galleries
31 and can have separate results sections gathering all similarly prefixed files, such as
32 a Foo section combining log text (foo_whatever.log) and
33 artifacts (eg foo_MDS_plot.pdf) matched by file name. All artifacts are linked for download.
34 A copy of the actual script is provided for provenance - be warned, it exposes
35 real file paths.
36
37 **WARNING before you start**
38
39 Install this tool on a private Galaxy ONLY
40 Please NEVER on a public or production instance
41 Please cite the resource at
42 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
43 if you use this tool in your published work.
44
45
46 *Short Story*
47
48 This is an unusual Galaxy tool capable of generating new Galaxy tools.
49 It works by exposing *unrestricted* and therefore extremely dangerous scripting
50 to all designated administrators of the host Galaxy server, allowing them to
51 run scripts in R, python, sh and perl over multiple selected input data sets,
52 writing a single new data set as output.
53
54 *Differences between TF2 and the original Tool Factory*
55
56 1. TF2 (this one) allows any number of either fixed or user-editable parameters to be defined
57 for the new tool. If these are editable, the user can change them but otherwise, they are passed
58 as fixed and invisible parameters for each execution. Obviously, there are substantial security
59 implications with editable parameters, but these are always sanitized by Galaxy's inbuilt
60 parameter sanitization so you may need to "unsanitize" characters - eg translate all "__lt__"
61 into "<" for certain parameters where that is needed. Please practise safe toolshed.
62
63 2. Any number of input files (all of the same datatype) may be defined.
64
65 These changes substantially complicate the way your script is supplied with
66 all the new and variable parameters. Examples in each scripting language are shown
67 in the tool help.
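
Item 1 above mentions translating "__lt__" back into "<" inside your script. A minimal sketch of how that might look in python follows. The helper name `unsanitize` is hypothetical, and only the "__lt__" entry comes from this README; the other table entries are assumptions about Galaxy's default sanitizer and should be checked against your own instance before you rely on them.

```python
# Hypothetical helper: reverse Galaxy's parameter sanitization in your script.
# Only "__lt__" -> "<" is taken from the README text; the remaining entries
# are assumptions to verify against your Galaxy instance.
UNSANITIZE = {
    "__lt__": "<",
    "__gt__": ">",
    "__sq__": "'",
    "__dq__": '"',
}


def unsanitize(value, mapping=UNSANITIZE):
    """Return value with each sanitized token replaced by its character."""
    for token, char in mapping.items():
        value = value.replace(token, char)
    return value


# Made-up parameter value for illustration
print(unsanitize("x __lt__ 5"))
```

Only apply this to parameters that genuinely need the original characters, since undoing the sanitization also removes the protection it provides.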
68
69 *Automated outputs in named sections*
70
71 If your script writes an arbitrary mix of (eg) pdfs, tabular analysis results
72 and run logs to the current directory, the tool factory can optionally
73 auto-generate a linked Html page with separate sections showing a thumbnail
74 grid for all pdfs and the log text, grouping all artifacts sharing a file
75 name and log name prefix. If "foo.log" is emitted then *all* other outputs matching foo_* will
76 be grouped together - eg
77 - foo_baz.pdf
78 - foo_bar.pdf and
79 - foo_zot.xls
80
81 would all be displayed and linked in the same section with foo.log's contents to form the "Foo" section of the Html page.
82 Sections appear in alphabetic order and there are no limits on the number of files or sections.
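
The grouping rule described above can be sketched in python as follows. This is an illustration of the rule as stated here ("foo.log" collects every foo_* file, sections in alphabetic order), not the tool factory's actual code; the function name `group_sections` and the file names are made up for the example.

```python
import os
from collections import OrderedDict


def group_sections(filenames):
    """Map each log prefix to the sorted list of files named prefix_*."""
    # Every X.log defines a section called X
    prefixes = sorted(
        os.path.splitext(f)[0] for f in filenames if f.endswith(".log")
    )
    sections = OrderedDict()
    for prefix in prefixes:
        # Collect all other outputs whose names start with "X_"
        sections[prefix] = sorted(
            f for f in filenames if f.startswith(prefix + "_")
        )
    return sections


# Hypothetical contents of the working directory after a script run
outputs = ["foo_baz.pdf", "foo.log", "bar.log",
           "foo_bar.pdf", "foo_zot.xls", "bar_plot.pdf"]
for name, members in group_sections(outputs).items():
    print(name, members)
```

Note that `sorted` orders names by character code, so upper case prefixes sort before lower case ones; the real page layout may differ.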
83
84 *Automated generation of new Galaxy tools for installation into any Galaxy*
85
86 Once a script is working correctly, this tool optionally generates a
87 new Galaxy tool, effectively freezing the supplied script into a new,
88 ordinary Galaxy tool that runs it over one or more input files selected by
89 the user. Generated tools are installed via a tool shed by an administrator
90 and work exactly like all other Galaxy tools for your users.
91
92 If you use the Html output option, please ensure that sanitize_all_html is
93 set to False and uncommented in universe_wsgi.ini - it should show
94
95 By default, all tool output served as 'text/html' will be sanitized
96 Change ```sanitize_all_html = False```
97
98 This opens potential security risks and may not be acceptable for public
99 sites where the lack of stylesheets may make Html pages damage onlookers'
100 eyeballs but should still be correct.
101
102 *More Detail*
103
104 To use the ToolFactory, you should have prepared a script to paste into a
105 text box, and a small test input example ready to select from your history
106 to test your new script.
107
108 There is an example in each scripting language on the Tool Factory form. You
109 can just cut and paste these to try it out - remember to select the right
110 interpreter please. You'll also need to create a small test data set using
111 the Galaxy history add new data tool.
112
113 If the script fails somehow, use the "redo" button on the tool output in
114 your history to recreate the form complete with broken script. Fix the bug
115 and execute again. Rinse, wash, repeat.
116
117 Once the script runs successfully, a new Galaxy tool that runs your script
118 can be generated. Select the "generate" option and supply some help text and
119 names. The new tool will be generated in the form of a new Galaxy datatype
120 - toolshed.gz - as the name suggests, it's an archive ready to upload to a
121 Galaxy ToolShed as a new tool repository.
122
123 Once it's in a ToolShed, it can be installed into any local Galaxy server
124 from the server administrative interface.
125
126 Once the new tool is installed, local users can run it - each time, the script
127 that was supplied when it was built will be executed with the input chosen
128 from the user's history. In other words, the tools you generate with the
129 ToolFactory run just like any other Galaxy tool, but run your script every time.
130
131 Tool factory tools are perfect for workflow components. One input, one output,
132 no variables.
133
134 *To fully and safely exploit the awesome power* of this tool,
135 Galaxy and the ToolShed, you should be a developer installing this
136 tool on a private/personal/scratch local instance where you are an
137 admin_user. Then, if you break it, you get to keep all the pieces. See
138 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
139
140 **Installation**
141 This is a Galaxy tool. You can install it most conveniently using the
142 administrative "Search and browse tool sheds" link. Find the Galaxy Main
143 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
144 repository. Open it and review the code and select the option to install it.
145
146
147 If you can't get the tool that way, the xml and py files here need to be
148 copied into a new tools subdirectory such as tools/toolfactory
149 Your tool_conf.xml needs a new entry pointing to the xml file - something like
150 ```
151 <section name="Tool building tools" id="toolbuilders">
152 <tool file="toolfactory/rgToolFactory.xml"/>
153 </section>
154 ```
155 If not already there (I just added it to datatypes_conf.xml.sample),
156 please add:
157
158 ```
159 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary"
160 mimetype="multipart/x-gzip" subclass="True" />
161 ```
162 to your local datatypes_conf.xml.
163
164
165 Of course, R, python, perl etc are needed on your path if you want to test
166 scripts using those interpreters. Adding new ones to this tool code should
167 be easy enough. Please make suggestions as bitbucket issues and code. The
168 HTML file code automatically shrinks R's bloated pdfs, and depends on
169 ghostscript. The thumbnails require imagemagick.
170
171 *Restricted execution*
172 The tool factory tool itself will then be usable ONLY by admin users -
173 people with IDs in admin_users in universe_wsgi.ini **Yes, that's right. ONLY
174 admin_users can run this tool** Think about it for a moment. If allowed to
175 run any arbitrary script on your Galaxy server, the only thing that would
176 impede a miscreant bent on destroying all your Galaxy data would probably
177 be lack of appropriate technical skills.
178
179 *What it does* This is a tool factory for simple scripts in python, R and
180 perl currently. Functional tests are automatically generated. How cool is that.
181
182 LIMITED to simple scripts that read one input from the history. Optionally can
183 write one new history dataset, and optionally collect any number of outputs
184 into links on an autogenerated HTML index page for the user to navigate -
185 useful if the script writes images and output files - pdf outputs are shown
186 as thumbnails and R's bloated pdfs are shrunk with ghostscript, so that and
187 imagemagick need to be available.
188
189 Generated tools can be edited and enhanced like any Galaxy tool, so start
190 small and build up since a generated script gets you a serious leg up to a
191 more complex one.
192
193 *What you do* You paste and run your script, you fix the syntax errors and
194 eventually it runs. You can use the redo button and edit the script before
195 trying to rerun it as you debug - it works pretty well.
196
197 Once the script works on some test data, you can generate a toolshed compatible
198 gzip file containing your script ready to run as an ordinary Galaxy tool in
199 a repository on your local toolshed. That means safe and largely automated
200 installation in any production Galaxy configured to use your toolshed.
201
202 *Generated tool Security* Once you install a generated tool, it's just
203 another tool - assuming the script is safe. They just run normally and their
204 user cannot do anything unusually insecure but please, practice safe toolshed.
205 Read the fucking code before you install any tool. Especially this one -
206 it is really scary.
207
208 If you opt for an HTML output, you get all the script outputs arranged
209 as a single Html history item - all output files are linked, thumbnails for
210 all the pdfs. Ugly but really inexpensive.
211
212 Patches and suggestions welcome as bitbucket issues please?
213
214 copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012
215
216 all rights reserved
217 Licensed under the LGPL if you want to improve it, feel free
218 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
219
220 Material for our more enthusiastic and voracious readers continues below -
221 we salute you.
222
223 **Motivation** Simple transformation, filtering or reporting scripts get
224 written, run and lost every day in most busy labs - even ours where Galaxy is
225 in use. This 'dark script matter' is pervasive and generally not reproducible.
226
227 **Benefits** For our group, this allows Galaxy to fill that important dark
228 script gap - all those "small" bioinformatics tasks. Once a user has a working
229 R (or python or perl) script that does something Galaxy cannot currently do
230 (eg transpose a tabular file) and takes parameters the way Galaxy supplies
231 them (see example below), they:
232
233 1. Install the tool factory on a personal private instance
234
235 2. Upload a small test data set
236
237 3. Paste the script into the 'script' text box and iteratively run the
238 insecure tool on test data until it works right - there is absolutely no
239 reason to do this anywhere other than on a personal private instance.
240
241 4. Once it works right, set the 'Generate toolshed gzip' option and run
242 it again.
243
244 5. A toolshed style gzip appears ready to upload and install like any other
245 Toolshed entry.
246
247 6. Upload the new tool to the toolshed
248
249 7. Ask the local admin to check the new tool to confirm it's not evil and
250 install it in the local production galaxy
251
252
253
254 **Parameter passing and file inputs**
255
256 Your script will receive up to 3 named parameters:
257 INPATHS is a comma separated list of input file paths
258 INNAMES is a comma separated list of input file names in the same order
259 OUTPATH is optional; if a file is being generated, your script should write there
260 Your script should open and write files in the provided working directory if you are using the Html
261 automatic presentation option.
262
263 Python script command lines will have --INPATHS and --additional_arguments etc. to make it easy to use argparse
264
265 Rscript will need to use commandArgs(TRUE) - see the example below - additional arguments will
266 appear as themselves - eg foo="bar" will mean that foo is defined as "bar" for the script.
267
268 Bash and sh will see any additional parameters on their command lines and the 3 named parameters
269 in their environment magically - well, using env on the CL
270 ```
271 ***python***::
272
273 # argparse for 3 possible comma separated lists
274 # additional parameters need to be parsed !
275 # then echo parameters to the output file
276 import sys
277 import argparse
278 argp=argparse.ArgumentParser()
279 argp.add_argument('--INNAMES',default=None)
280 argp.add_argument('--INPATHS',default=None)
281 argp.add_argument('--OUTPATH',default=None)
282 argp.add_argument('--additional_parameters',default=[],action="append")
283 argp.add_argument('otherargs', nargs=argparse.REMAINDER)
284 args = argp.parse_args()
285 f= open(args.OUTPATH,'w')
286 s = '### args=%s\n' % str(args)
287 f.write(s)
288 s = 'sys.argv=%s\n' % sys.argv
289 f.write(s)
290 f.close()
291
292
293
294 ***Rscript***::
295
296 # tool factory Rscript parser suggested by Forester
297 # http://www.r-bloggers.com/including-arguments-in-r-cmd-batch-mode/
298 # additional parameters will appear in the ls() below - they are available
299 # to your script
300 # echo parameters to the output file
301 ourargs = commandArgs(TRUE)
302 if(length(ourargs)==0){
303 print("No arguments supplied.")
304 }else{
305 for(i in 1:length(ourargs)){
306 eval(parse(text=ourargs[[i]]))
307 }
308 sink(OUTPATH)
309 cat('INPATHS=',INPATHS,'\n')
310 cat('INNAMES=',INNAMES,'\n')
311 cat('OUTPATH=',OUTPATH,'\n')
312 x=ls()
313 cat('all objects=',x,'\n')
314 sink()
315 }
316 sessionInfo()
317 print.noquote(date())
318
319
320 ***bash/sh***::
321
322 # tool factory sets up these environmental variables
323 # this example writes those to the output file
324 # additional params appear on command line
325 if [ ! -f "$OUTPATH" ] ; then
326 touch "$OUTPATH"
327 fi
328 echo "INPATHS=$INPATHS" >> "$OUTPATH"
329 echo "INNAMES=$INNAMES" >> "$OUTPATH"
330 echo "OUTPATH=$OUTPATH" >> "$OUTPATH"
331 echo "CL=$@" >> "$OUTPATH"
332
333 ***perl***::
334
335 (my $INPATHS,my $INNAMES,my $OUTPATH ) = @ARGV;
336 open(my $fh, '>', $OUTPATH) or die "Could not open file '$OUTPATH' $!";
337 print $fh "INPATHS=$INPATHS\n INNAMES=$INNAMES\n OUTPATH=$OUTPATH\n";
338 close $fh;
339
340 ```
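
Because INNAMES and INPATHS arrive as two comma separated lists in the same order, a script that needs file names for (eg) column headers has to pair them up itself. The sketch below shows one way to do that in python. The helper names `pair_inputs` and `parse` and the sample command line are hypothetical, and the sketch assumes no file name or path contains a comma.

```python
import argparse


def pair_inputs(innames, inpaths):
    """Split the two comma separated lists and pair each name with its path."""
    names = innames.split(",") if innames else []
    paths = inpaths.split(",") if inpaths else []
    if len(names) != len(paths):
        raise ValueError("INNAMES and INPATHS differ in length")
    return list(zip(names, paths))


def parse(argv):
    """Parse the 3 named parameters; anything else is returned untouched."""
    argp = argparse.ArgumentParser()
    argp.add_argument("--INNAMES", default=None)
    argp.add_argument("--INPATHS", default=None)
    argp.add_argument("--OUTPATH", default=None)
    # parse_known_args leaves any additional parameters in `extra`
    return argp.parse_known_args(argv)


# Made-up command line for illustration only
args, extra = parse(["--INNAMES", "a.tab,b.tab",
                     "--INPATHS", "/tmp/d1.dat,/tmp/d2.dat"])
for name, path in pair_inputs(args.INNAMES, args.INPATHS):
    print(name, path)
```

Raising an error when the two lists differ in length is a design choice for the sketch: it fails early instead of silently mislabelling columns.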
341
342 Galaxy as an IDE for developing API scripts
343 If you need to develop Galaxy API scripts and you like to live dangerously,
344 please read on.
345
346 Galaxy as an IDE?
347 Amazingly enough, blend-lib API scripts run perfectly well *inside*
348 Galaxy when pasted into a Tool Factory form. No need to generate a new
349 tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously,
350 it is actually quite useable.
351
352 Why bother - what's wrong with Eclipse
353 Nothing. But, compared with developing API scripts in the usual way outside
354 Galaxy, you get persistence and other framework benefits plus, at absolutely
355 no extra charge, a ginormous security problem if you share the history or
356 any outputs, because they contain the api script with key - so development
357 servers only please!
358
359 Workflow
360 Fire up the Tool Factory in Galaxy.
361
362 Leave the input box empty, set the interpreter to python, paste and run an
363 api script - eg working example (substitute the url and key) below.
364
365 It took me a few iterations to develop the example below because I know
366 almost nothing about the API. I started with very simple code from one of the
367 samples and after each run, the (edited..) api script is conveniently recreated
368 using the redo button on the history output item. So each successive version
369 of the developing api script you run is persisted - ready to be edited and
370 rerun easily. It is ''very'' handy to be able to add a line of code to the
371 script and run it, then view the output to (eg) inspect dicts returned by
372 API calls to help move progressively deeper iteratively.
373
374 Give the below a whirl on a private clone (install the tool factory from
375 the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles.
376
377 Eg tool factory api script
378 ```
379 import sys
380 from blend.galaxy import GalaxyInstance
381 ourGal = 'http://x.x.x.x:xxxx'
382 ourKey = 'xxx'
383 gi = GalaxyInstance(ourGal, key=ourKey)
384 libs = gi.libraries.get_libraries()
385 res = []
386 # libs looks like
387 # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id':
388 # u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
389 for lib in libs:
390     res.append('%s:\n' % lib['name'])
391     res.append(str(gi.libraries.show_library(lib['id'],contents=True)))
392 outf=open(sys.argv[2],'w')
393 outf.write('\n'.join(res))
394 outf.close()
395 ```
396
397 **Attribution**
398 Creating re-usable tools from scripts: The Galaxy Tool Factory
399 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
400 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573
401
402 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
403
404 **Licensing**
405 Copyright Ross Lazarus 2010
406 ross lazarus at g mail period com
407
408 All rights reserved.
409
410 Licensed under the LGPL
411
412 **screenshot**
413
414 ![example run](/images/dynamicScriptTool.png)
415
416
418