comparison README.txt @ 0:c34063ab3735 draft

Initial commit of code in iuc github repository
author fubar
date Thu, 01 Jan 2015 21:58:00 -0500
parents
children dd6cf2ddaac7
1 # WARNING before you start
2 # Install this tool on a private Galaxy ONLY
3 # Please NEVER on a public or production instance
4 # updated august 2014 by John Chilton adding citation support
5 #
6 # updated august 8 2014 to fix bugs reported by Marius van den Beek
7 # please cite the resource at http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
8 # if you use this tool in your published work.
9
10 *Short Story*
11
12 This is an unusual Galaxy tool capable of generating new Galaxy tools.
13 It works by exposing *unrestricted* and therefore extremely dangerous
14 scripting to all designated administrators of the host Galaxy server, allowing them to run scripts
15 in R, python, sh and perl over multiple selected input data sets, writing a single new data set as output.
16
17 *Automated outputs in named sections*
18
19 If your script writes an arbitrary mix of (eg) pdfs, tabular analysis results and run logs to the current directory,
20 the tool factory can optionally auto-generate a linked Html page with separate sections showing a thumbnail grid
21 for all pdfs and the log text, grouping all artifacts that share a file name prefix with a log file::
22
23 eg: if "foo.log" is emitted then *all* other outputs matching foo_* will all be grouped together - eg
24 foo_baz.pdf
25 foo_bar.pdf and
26 foo_zot.xls
27 would all be displayed and linked in the same section with foo.log's contents - to form the "Foo" section of the Html page.
28 Sections appear in alphabetic order and there are no limits on the number of files or sections.
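
The grouping rule above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not the Tool Factory's own code; the function name `group_by_log_prefix` and the sample file names other than the foo_* ones are made up for the example.

```python
import os
from collections import OrderedDict


def group_by_log_prefix(filenames):
    """Group output files into sections keyed by the prefix of each .log file.

    Hypothetical helper: a file belongs to section "foo" if "foo.log"
    exists and the file name starts with "foo_".
    """
    prefixes = sorted(
        os.path.splitext(name)[0] for name in filenames if name.endswith(".log")
    )
    sections = OrderedDict()
    for prefix in prefixes:  # alphabetic order, as on the generated page
        members = sorted(
            name for name in filenames if name.startswith(prefix + "_")
        )
        sections[prefix] = {"log": prefix + ".log", "files": members}
    return sections


outputs = ["foo_baz.pdf", "foo.log", "foo_zot.xls", "foo_bar.pdf",
           "bar.log", "bar_plot.pdf"]
sections = group_by_log_prefix(outputs)
for prefix, section in sections.items():
    print(prefix.capitalize(), section["log"], section["files"])
```

Files that match no log prefix are simply left out of every section in this sketch; how the real tool treats them is not covered here.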
29
30 *Automated generation of new Galaxy tool shed tools for installation into any Galaxy*
31
32 Once a script is working correctly, this tool optionally generates a new Galaxy tool, effectively
33 freezing the supplied script into a new, ordinary Galaxy tool that runs it over one or more input files
34 selected by the user. Generated tools are installed via a tool shed by an administrator and work exactly like all other Galaxy tools for your users.
35
36 If you use the Html output option, please ensure that sanitize_all_html is set to False and
37 uncommented in universe_wsgi.ini - it should show::
38
39 # By default, all tool output served as 'text/html' will be sanitized
40 sanitize_all_html = False
41
42 Disabling sanitization opens potential security risks and may not be acceptable for public sites. With sanitization
43 left on, the lack of stylesheets may make Html pages damage onlookers' eyeballs, but their content should still be correct.
44
45
46 *More Detail*
47
48 To use the ToolFactory, you should have prepared a script to paste into a text box,
49 and a small test input example ready to select from your history to test your new script.
50 There is an example in each scripting language on the Tool Factory form. You can just
51 cut and paste these to try it out - remember to select the right interpreter please. You'll
52 also need to create a small test data set using the Galaxy history add new data tool.
53
54 If the script fails somehow, use the "redo" button on the tool output in your history to
55 recreate the form complete with broken script. Fix the bug and execute again. Rinse, wash, repeat.
56
57 Once the script runs successfully, a new Galaxy tool that runs your script can be generated.
58 Select the "generate" option and supply some help text and names. The new tool will be
59 generated in the form of a new Galaxy datatype - toolshed.gz - as the name suggests,
60 it's an archive ready to upload to a Galaxy ToolShed as a new tool repository.
61
62 Once it's in a ToolShed, it can be installed into any local Galaxy server from
63 the server administrative interface.
64
65 Once the new tool is installed, local users can run it - each time, the script that was supplied
66 when it was built will be executed with the input chosen from the user's history. In other words,
67 the tools you generate with the ToolFactory run just like any other Galaxy tool,
68 but run your script every time.
69
70 Tool factory tools are perfect for workflow components. One input, one output, no variables.
71
72 *To fully and safely exploit the awesome power* of this tool, Galaxy and the ToolShed,
73 you should be a developer installing this tool on a private/personal/scratch local instance where you
74 are an admin_user. Then, if you break it, you get to keep all the pieces.
75 See https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
76
77 ** Installation **
78 This is a Galaxy tool. You can install it most conveniently using the administrative "Search and browse tool sheds" link.
79 Find the Galaxy Main toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory repository.
80 Open it and review the code and select the option to install it.
81
82 (
83 If you can't get the tool that way, the xml and py files here need to be copied into a new tools
84 subdirectory such as tools/toolfactory. Your tool_conf.xml needs a new entry pointing to the xml
85 file - something like::
86
87 <section name="Tool building tools" id="toolbuilders">
88 <tool file="toolfactory/rgToolFactory.xml"/>
89 </section>
90
91 If not already there (I just added it to datatypes_conf.xml.sample), please add:
92 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
93 to your local datatypes_conf.xml.
94 )
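
To confirm the datatype entry is present, something like the following Python sketch could be used. It assumes the usual `<datatypes><registration>` layout of datatypes_conf.xml; the embedded sample document and the helper name `has_datatype` are illustrative only, so point it at the text of your own file.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the contents of a local datatypes_conf.xml.
SAMPLE = """<datatypes>
  <registration>
    <datatype extension="txt" type="galaxy.datatypes.data:Text"/>
    <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
  </registration>
</datatypes>"""


def has_datatype(xml_text, extension):
    """Return True if any <datatype> element declares the given extension."""
    root = ET.fromstring(xml_text)
    return any(
        node.get("extension") == extension for node in root.iter("datatype")
    )


print(has_datatype(SAMPLE, "toolshed.gz"))
```

For a real file, read its text first (for example with `open(path).read()`) and pass that in place of `SAMPLE`.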
95
96 Of course, R, python, perl etc are needed on your path if you want to test scripts using those interpreters.
97 Adding new ones to this tool code should be easy enough. Please make suggestions as bitbucket issues and code.
98 The HTML file code automatically shrinks R's bloated pdfs, and depends on ghostscript. The thumbnails require imagemagick.
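
A quick way to see which of these dependencies are missing from the path is sketched below. The executable names (`Rscript`, `gs`, `convert` and so on) are assumptions about a typical install and may differ on your system; `find_missing` is a hypothetical helper, not part of the tool.

```python
import shutil

# Label -> executable name. These names are assumptions; adjust as needed.
REQUIRED = {
    "R": "Rscript",
    "python": "python",
    "perl": "perl",
    "ghostscript": "gs",
    "imagemagick": "convert",
}


def find_missing(required, which=shutil.which):
    """Return the sorted labels whose executable cannot be found on PATH."""
    return sorted(
        label for label, exe in required.items() if which(exe) is None
    )


missing = find_missing(REQUIRED)
print("missing:", missing if missing else "nothing")
```

Run it as the same user, and with the same environment, that Galaxy uses to run jobs, since that is the path that matters.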
99
100 * Restricted execution *
101 The tool factory tool itself will then be usable ONLY by admin users - people with IDs in admin_users in universe_wsgi.ini.
102 **Yes, that's right. ONLY admin_users can run this tool.** Think about it for a moment. If allowed to run any
103 arbitrary script on your Galaxy server, the only thing that would impede a miscreant bent on destroying all your
104 Galaxy data would probably be lack of appropriate technical skills.
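
The idea of the admin-only check can be illustrated with a short Python sketch. This is not how Galaxy or the Tool Factory implements it; it only assumes that admin_users is a comma-separated list of user emails in the [app:main] section of universe_wsgi.ini, and the sample addresses, the case-insensitive comparison and the helper name `is_admin` are all choices made for the example.

```python
import configparser

# Illustrative stand-in for the relevant part of universe_wsgi.ini.
SAMPLE_INI = """
[app:main]
admin_users = alice@example.org, bob@example.org
"""


def is_admin(ini_text, email, section="app:main"):
    """Return True if email appears in the comma-separated admin_users list."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    raw = parser.get(section, "admin_users", fallback="")
    admins = {entry.strip().lower() for entry in raw.split(",") if entry.strip()}
    return email.strip().lower() in admins


print(is_admin(SAMPLE_INI, "alice@example.org"))
print(is_admin(SAMPLE_INI, "mallory@example.org"))
```

A missing or empty admin_users setting means nobody passes the check in this sketch, which is the safe default for a tool this dangerous.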
105
106 *What it does* This is a tool factory for simple scripts in python, R and perl currently.
107 Functional tests are automatically generated. How cool is that.
108
109 LIMITED to simple scripts that read one input from the history.
110 A script can optionally write one new history dataset,
111 and optionally collect any number of outputs into links on an autogenerated HTML
112 index page for the user to navigate - useful if the script writes images and output files. Pdf outputs
113 are shown as thumbnails and R's bloated pdfs are shrunk with ghostscript, so ghostscript and imagemagick need to
114 be available.
115
116 Generated tools can be edited and enhanced like any Galaxy tool, so start small and build up since
117 a generated script gets you a serious leg up to a more complex one.
118
119 *What you do* You paste and run your script,
120 fix the syntax errors, and eventually it runs.
121 You can use the redo button and edit the script before
122 trying to rerun it as you debug - it works pretty well.
123
124 Once the script works on some test data, you can
125 generate a toolshed compatible gzip file
126 containing your script ready to run as an ordinary Galaxy tool in a
127 repository on your local toolshed. That means safe and largely automated installation in any
128 production Galaxy configured to use your toolshed.
129
130 *Generated tool Security* Once you install a generated tool, it's just
131 another tool - assuming the script is safe. Generated tools run normally and their users cannot do anything unusually insecure,
132 but please, practice safe toolshed.
133 Read the fucking code before you install any tool.
134 Especially this one - it is really scary.
135
136 If you opt for an HTML output, you get all the script outputs arranged
137 as a single Html history item - all output files are linked, thumbnails for all the pdfs.
138 Ugly but really inexpensive.
139
140 Patches and suggestions welcome as bitbucket issues please?
141
142 copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012
143
144 all rights reserved
145 Licensed under the LGPL if you want to improve it, feel free https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
146
147 Material for our more enthusiastic and voracious readers continues below - we salute you.
148
149 **Motivation** Simple transformation, filtering or reporting scripts get written, run and lost every day in most busy labs
150 - even ours where Galaxy is in use. This 'dark script matter' is pervasive and generally not reproducible.
151
152 **Benefits** For our group, this allows Galaxy to fill that important dark script gap - all those "small" bioinformatics
153 tasks. Once a user has a working R (or python or perl) script that does something Galaxy cannot currently do (eg transpose a
154 tabular file) and takes parameters the way Galaxy supplies them (see example below), they:
155
156 1. Install the tool factory on a personal private instance
157
158 2. Upload a small test data set
159
160 3. Paste the script into the 'script' text box and iteratively run the insecure tool on test data until it works right -
161 there is absolutely no reason to do this anywhere other than on a personal private instance.
162
163 4. Once it works right, set the 'Generate toolshed gzip' option and run it again.
164
165 5. A toolshed style gzip appears ready to upload and install like any other Toolshed entry.
166
167 6. Upload the new tool to the toolshed
168
169 7. Ask the local admin to check the new tool to confirm it's not evil and install it in the local production galaxy
170
171 **Simple examples on the tool form**
172
173 A simple Rscript "filter" showing how the command line parameters can be handled, takes an input file,
174 does something (transpose in this case) and writes the results to a new tabular file::
175
176 # transpose a tabular input file and write as a tabular output file
177 ourargs = commandArgs(TRUE)
178 inf = ourargs[1]
179 outf = ourargs[2]
180 inp = read.table(inf,head=F,row.names=NULL,sep='\t')
181 outp = t(inp)
182 write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=F)
183
184 Calculate a multiple test adjusted p value from a column of p values - for this script to be useful,
185 it needs the right column for the input to be specified in the code for the
186 given input file type(s) specified when the tool is generated ::
187
188 # use p.adjust - assumes a HEADER row and column 1 - please fix for any real use
189 column = 1 # adjust if necessary for some other kind of input
190 fdrmeth = 'BH'
191 ourargs = commandArgs(TRUE)
192 inf = ourargs[1]
193 outf = ourargs[2]
194 inp = read.table(inf,head=T,row.names=NULL,sep='\t')
195 p = inp[,column]
196 q = p.adjust(p,method=fdrmeth)
197 newval = paste(fdrmeth,'p-value',sep='_')
198 q = data.frame(q)
199 names(q) = newval
200 outp = cbind(inp,newval=q)
201 write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=T)
202
203
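
For readers wondering what the 'BH' method in the script above computes, here is a plain Python sketch of the Benjamini-Hochberg adjustment: each p value is multiplied by n/rank, and a running minimum taken from the largest p value downwards keeps the adjusted values monotone. The function name `bh_adjust` is hypothetical and this is an illustration, not R's own implementation.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    n = len(pvalues)
    # Indices of the p values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


for p, q in zip([0.01, 0.04, 0.03, 0.005], bh_adjust([0.01, 0.04, 0.03, 0.005])):
    print(p, round(q, 4))
```

In the R script, `p.adjust(p, method='BH')` does this work, so the sketch is only useful for understanding or for checking a few values by hand.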
204
205 Another Rscript example without any input file - generates a random heatmap pdf - you must make sure the option to create an HTML output file is
206 turned on for this to work. The heatmap will be presented as a thumbnail linked to the pdf in the resulting HTML page::
207
208 # note this script takes NO input or output because it generates random data
209 foo = data.frame(a=runif(100),b=runif(100),c=runif(100),d=runif(100),e=runif(100),f=runif(100))
210 bar = as.matrix(foo)
211 pdf( "heattest.pdf" )
212 heatmap(bar,main='Random Heatmap')
213 dev.off()
214
215 A Python example that reverses each row of a tabular file. You'll need to remove the leading spaces for this to work if cut
216 and pasted into the script box. Note that you can already do this in Galaxy by setting up the cut columns tool with the
217 correct number of columns in reverse order, but this script will work for any number of columns so is completely generic::
218
219 # reverse order of columns in a tabular file
220 import sys
221 inp = sys.argv[1]
222 outp = sys.argv[2]
223 i = open(inp,'r')
224 o = open(outp,'w')
225 for row in i:
226     rs = row.rstrip('\r\n').split('\t')  # strip only the line ending so trailing empty columns survive
227     rs.reverse()
228     o.write('\t'.join(rs))
229     o.write('\n')
230 i.close()
231 o.close()
232
233
234 Galaxy as an IDE for developing API scripts
235 If you need to develop Galaxy API scripts and you like to live dangerously, please read on.
236
237 Galaxy as an IDE?
238 Amazingly enough, blend-lib API scripts run perfectly well *inside* Galaxy when pasted into a Tool Factory form. No need to generate a new tool. Galaxy+Tool_Factory = IDE. I think we need a new t-shirt. Seriously, it is actually quite usable.
239
240 Why bother - what's wrong with Eclipse?
241 Nothing. But compared with developing API scripts in the usual way outside Galaxy, you get persistence and other framework benefits. You also get, at absolutely no extra charge, a ginormous security problem if you share the history or any outputs, because they contain the api script with its key. Development servers only, please!
242
243 Workflow
244 Fire up the Tool Factory in Galaxy.
245
246 Leave the input box empty, set the interpreter to python, paste and run an api script - eg working example (substitute the url and key) below.
247
248 It took me a few iterations to develop the example below because I know almost nothing about the API. I started with very simple code from one of the samples and after each run, the (edited..) api script is conveniently recreated using the redo button on the history output item. So each successive version of the developing api script you run is persisted - ready to be edited and rerun easily. It is *very* handy to be able to add a line of code to the script and run it, then view the output to (eg) inspect dicts returned by API calls and so work progressively deeper.
249
250 Give the below a whirl on a private clone (install the tool factory from the main toolshed) and try adding complexity with a few rerun/edit/rerun cycles.
251
252 Eg tool factory api script
253 import sys
254 from blend.galaxy import GalaxyInstance
255 ourGal = 'http://x.x.x.x:xxxx'
256 ourKey = 'xxx'
257 gi = GalaxyInstance(ourGal, key=ourKey)
258 libs = gi.libraries.get_libraries()
259 res = []
260 # libs looks like
261 # u'url': u'/galaxy/api/libraries/441d8112651dc2f3', u'id': u'441d8112651dc2f3', u'name':.... u'Demonstration sample RNA data',
262 for lib in libs:
263     res.append('%s:\n' % lib['name'])
264     res.append(str(gi.libraries.show_library(lib['id'],contents=True)))
265 outf=open(sys.argv[2],'w')
266 outf.write('\n'.join(res))
267 outf.close()
268
269 **Attribution**
270 Creating re-usable tools from scripts: The Galaxy Tool Factory
271 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
272 Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573
273
274 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
275
276 **Licensing**
277 Copyright Ross Lazarus 2010
278 ross lazarus at g mail period com
279
280 All rights reserved.
281
282 Licensed under the LGPL
283
284 **Obligatory screenshot**
285
286 http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png
287