comparison fubar-galaxytoolfactory-ca7db160878a/README.txt @ 3:8c578211a681 draft

Fixed nasty silly bug - fixed locally but not previously propagated
author fubar
date Fri, 31 Aug 2012 23:04:13 -0400
2:b55b59435fb1 3:8c578211a681
# WARNING before you start
# Install this tool on a private Galaxy ONLY
# Please NEVER on a public or production instance

*Short Story*
This is an unusual Galaxy tool that generates very simple new Galaxy tools that run a
user-supplied script (R, python, perl, bash...) over a single input file.
Whenever you run this tool, the ToolFactory, you should have a script ready to paste into a text box,
and a small test input ready to select from your history to test your new script.

If the script runs successfully, a new Galaxy tool that runs your script can be generated.
The new tool is in the form of a special new Galaxy datatype - toolshed.gz - as the name suggests,
it's an archive ready to upload to a Galaxy ToolShed as a new tool repository.

Once it's in a ToolShed, it can be installed into any local Galaxy server from
the server administrative interface.

Once your new tool is installed, local users can run it - each time, the script that was supplied
when it was built will be executed with the input chosen from the user's history. In other words,
the tools you generate with the ToolFactory run just like any other Galaxy tool,
but run your script every time.

*Reasons to read further*

If you use Galaxy to support your research;

You and fellow users are sometimes forced to take data out of Galaxy, process it with ugly
little perl/awk/sed/R... scripts and put it back;

You do this when you can't do some transformation in Galaxy (the 90/10 rule);

You don't have enough developer resources for wrapping dozens of even relatively simple tools;

Your research and your institution would be far better off if those feral scripts were all tucked safely in
your local toolshed and Galaxy histories.

*The good news* If it can be trivially scripted, it can be running safely in your
local Galaxy via your own local toolshed in a few minutes - with functional tests.

*Value proposition* The ToolFactory allows Galaxy to efficiently take over most of your lab's dark script matter,
making it reproducible in Galaxy and shareable through the ToolShed.

That's what this tool does. You paste a simple script and the tool returns
a new, real Galaxy tool, ready to be installed from the local toolshed to local servers.
Scripts can be wrapped and online literally within minutes.

*To fully and safely exploit the awesome power* of this tool, Galaxy and the ToolShed,
you should be a developer installing this tool on a private/personal/scratch local instance where you are an admin_user.
Then, if you break it, you get to keep all the pieces.
See https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

**Installation**
This is a Galaxy tool. You can install it most conveniently using the administrative "Search and browse tool sheds" link.
Find the Galaxy Test toolshed (not main) and search for the toolfactory repository.
Open it, review the code and select the option to install it.

If you can't get the tool that way, the xml and py files here need to be copied into a new tools subdirectory such as tools/toolfactory.
Your tool_conf.xml needs a new entry pointing to the xml file - something like::

    <section name="Tool building tools" id="toolbuilders">
        <tool file="toolfactory/rgToolFactory.xml"/>
    </section>

If not already there (I just added it to datatypes_conf.xml.sample), please add::

    <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />

to your local datatypes_conf.xml.

Ensure that html sanitization is set to False and uncommented in universe_wsgi.ini

You'll have to restart the server for the new tool to be available.

Of course, R, python, perl etc are needed on your path if you want to test scripts using those interpreters.
Adding new ones to this tool code should be easy enough. Please make suggestions as bitbucket issues and code.
The HTML file code automatically shrinks R's bloated pdfs, and depends on ghostscript. The thumbnails require imagemagick.
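
As a quick pre-flight check, something like the following sketch can report which of the external programs mentioned above are missing from the PATH. The executable names used here (`Rscript`, `python`, `perl`, `gs` for ghostscript, `convert` for imagemagick) are assumptions about a typical install, not names defined by the ToolFactory, so adjust them for your system.

```python
import shutil

# Assumed executable names for the interpreters and helpers mentioned above.
# These are illustrative guesses for a typical install, not ToolFactory settings.
CANDIDATES = {
    "R": "Rscript",
    "python": "python",
    "perl": "perl",
    "ghostscript": "gs",
    "imagemagick": "convert",
}


def find_missing(candidates=CANDIDATES, which=shutil.which):
    """Return the sorted labels whose executable cannot be found on the PATH."""
    return sorted(
        label for label, exe in candidates.items() if which(exe) is None
    )


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All candidate programs were found on PATH")
```

Run it as the same user that runs Galaxy, since that user's PATH is the one that matters.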

*Restricted execution*
The new tool factory tool will then be usable ONLY by admin users - people with IDs in admin_users in universe_wsgi.ini
**Yes, that's right. ONLY admin_users can run this tool** Think about it for a moment. If allowed to run any
arbitrary script on your Galaxy server, the only thing that would impede a miscreant bent on destroying all your
Galaxy data would probably be lack of appropriate technical skills.

*What it does* This is a tool factory for simple scripts in python, R and perl currently.
Functional tests are automatically generated. How cool is that.

LIMITED to simple scripts that read one input from the history.
Optionally can write one new history dataset,
and optionally collect any number of outputs into links on an autogenerated HTML
index page for the user to navigate - useful if the script writes images and output files - pdf outputs
are shown as thumbnails and R's bloated pdfs are shrunk with ghostscript, so both ghostscript and imagemagick need to
be available.
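
To make the HTML option more concrete, here is a minimal sketch of how an index page linking a directory of script outputs could be built. It is an illustration only and not the ToolFactory's own code: the function name `build_index` and the page layout are made up for this example, and thumbnail generation is left out entirely.

```python
import html
import os
from urllib.parse import quote


def build_index(outdir, title="Script outputs"):
    """Return an HTML page with one link per regular file found in outdir."""
    names = sorted(
        name for name in os.listdir(outdir)
        if os.path.isfile(os.path.join(outdir, name))
    )
    # URL-quote the link target and HTML-escape the visible text.
    items = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(quote(name), html.escape(name))
        for name in names
    )
    return "<html><body><h1>{}</h1>\n<ul>\n{}\n</ul></body></html>".format(
        html.escape(title), items
    )
```

Subdirectories are skipped, so only files written directly into the output directory get a link.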

Generated tools can be edited and enhanced like any Galaxy tool, so start small and build up since
a generated script gets you a serious leg up to a more complex one.

*What you do* You paste and run your script,
you fix the syntax errors and eventually it runs.
You can use the redo button and edit the script before
trying to rerun it as you debug - it works pretty well.

Once the script works on some test data, you can
generate a toolshed compatible gzip file
containing your script ready to run as an ordinary Galaxy tool in a
repository on your local toolshed. That means safe and largely automated installation in any
production Galaxy configured to use your toolshed.

*Generated tool Security* Once you install a generated tool, it's just
another tool - assuming the script is safe. Generated tools run normally and their users cannot do anything unusually insecure,
but please, practice safe toolshed.
Read the fucking code before you install any tool.
Especially this one - it is really scary.

If you opt for an HTML output, you get all the script outputs arranged
as a single HTML history item - all output files are linked, with thumbnails for all the pdfs.
Ugly but really inexpensive.

Patches and suggestions welcome as bitbucket issues please.

long route to June 2012 product
derived from an integrated script model
called rgBaseScriptWrapper.py
Note to the unwary:
This tool allows arbitrary scripting on your Galaxy as the Galaxy user
There is nothing stopping a malicious user doing whatever they choose
Extremely dangerous!!
Totally insecure. So, trusted users only

copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012

all rights reserved
Licensed under the LGPL if you want to improve it, feel free https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home

Material for our more enthusiastic and voracious readers continues below - we salute you.

**Motivation** Simple transformation, filtering or reporting scripts get written, run and lost every day in most busy labs -
even ours where Galaxy is in use. This 'dark script matter' is pervasive and generally not reproducible.

**Benefits** For our group, this allows Galaxy to fill that important dark script gap - all those "small" bioinformatics
tasks. Once a user has a working R (or python or perl) script that does something Galaxy cannot currently do (eg transpose a
tabular file) and takes parameters the way Galaxy supplies them (see example below), they:

1. Install the tool factory on a personal private instance

2. Upload a small test data set

3. Paste the script into the 'script' text box and iteratively run the insecure tool on test data until it works right -
   there is absolutely no reason to do this anywhere other than on a personal private instance.

4. Once it works right, set the 'Generate toolshed gzip' option and run it again.

5. A toolshed style gzip appears ready to upload and install like any other Toolshed entry.

6. Upload the new tool to the toolshed

7. Ask the local admin to check the new tool to confirm it's not evil and install it in the local production galaxy
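
The examples on the tool form all read the input path as the first command line argument and the output path as the second. Assuming that convention, a Python script written for pasting into the script box could follow a skeleton like this one. `process_line` is a placeholder name for whatever per-line transformation your script needs, and the upper-casing it does here is only a stand-in.

```python
import sys


def process_line(line):
    """Placeholder transformation (upper-case); replace with your own logic."""
    return line.upper()


def main(argv):
    # Assumed convention, as in the examples on the tool form:
    # argv[1] is the input path and argv[2] is the output path.
    inpath, outpath = argv[1], argv[2]
    with open(inpath) as infile, open(outpath, "w") as outfile:
        for line in infile:
            outfile.write(process_line(line.rstrip("\n")) + "\n")


# Only run when both paths were actually supplied on the command line.
if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

Keeping the file handling in `main` and the logic in one small function makes it easy to try the logic on a few lines before pasting the script into the tool.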

**Simple examples on the tool form**

A simple Rscript "filter" showing how the command line parameters can be handled. It takes an input file,
does something (transpose in this case) and writes the results to a new tabular file::

    # transpose a tabular input file and write as a tabular output file
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf,head=F,row.names=NULL,sep='\t')
    outp = t(inp)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=F)
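
For comparison, the same transpose could be written in Python with the same two-argument convention. This is a sketch rather than part of the ToolFactory. The helper names are made up, and it assumes every row has the same number of tab-separated fields, because `zip` silently truncates to the shortest row otherwise.

```python
import sys


def transpose(rows):
    """Transpose a rectangular list of rows; zip stops at the shortest row."""
    return [list(col) for col in zip(*rows)]


def transpose_file(inpath, outpath):
    """Read a tab-separated file and write its transpose to outpath."""
    with open(inpath) as infile:
        # Strip only the line ending so empty trailing fields are kept.
        rows = [line.rstrip("\r\n").split("\t") for line in infile]
    with open(outpath, "w") as outfile:
        for row in transpose(rows):
            outfile.write("\t".join(row) + "\n")


# Only run when both paths were actually supplied on the command line.
if __name__ == "__main__" and len(sys.argv) >= 3:
    transpose_file(sys.argv[1], sys.argv[2])
```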

Calculate a multiple test adjusted p value from a column of p values - for this script to be useful,
the column variable in the code must be set to the right column for the
input file type(s) specified when the tool is generated::

    # use p.adjust - assumes a HEADER row and column 1 - please fix for any real use
    column = 1 # adjust if necessary for some other kind of input
    fdrmeth = 'BH'
    ourargs = commandArgs(TRUE)
    inf = ourargs[1]
    outf = ourargs[2]
    inp = read.table(inf,head=T,row.names=NULL,sep='\t')
    p = inp[,column]
    q = p.adjust(p,method=fdrmeth)
    newval = paste(fdrmeth,'p-value',sep='_')
    q = data.frame(q)
    names(q) = newval
    outp = cbind(inp,newval=q)
    write.table(outp,outf, quote=FALSE, sep="\t",row.names=F,col.names=T)
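
R's `p.adjust` with `method='BH'` applies the Benjamini-Hochberg step-up adjustment. For readers who want to see what that calculation does, here is a plain Python sketch of it. The function name `bh_adjust` is made up for this example, and unlike `p.adjust` it makes no attempt to handle missing values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p values ordered from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    for position, index in enumerate(order):
        rank = n - position  # rank of this p value in ascending order
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted
```

Walking from the largest p value down while keeping a running minimum is what makes the adjusted values monotone in the original p values.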

Another Rscript example without any input file - it generates a random heatmap pdf. You must make sure the option to create an HTML output file is
turned on for this to work. The heatmap will be presented as a thumbnail linked to the pdf in the resulting HTML page::

    # note this script takes NO input or output because it generates random data
    foo = data.frame(a=runif(100),b=runif(100),c=runif(100),d=runif(100),e=runif(100),f=runif(100))
    bar = as.matrix(foo)
    pdf( "heattest.pdf" )
    heatmap(bar,main='Random Heatmap')
    dev.off()

A Python example that reverses the column order of each row of a tabular file. You'll need to remove the leading spaces for this to work if cut
and pasted into the script box. Note that you can already do this in Galaxy by setting up the cut columns tool with the
correct number of columns in reverse order, but this script will work for any number of columns so is completely generic::

    # reverse order of columns in a tabular file
    import sys
    inp = sys.argv[1]
    outp = sys.argv[2]
    i = open(inp,'r')
    o = open(outp,'w')
    for row in i:
        # strip only the line ending so empty trailing columns are kept
        rs = row.rstrip('\r\n').split('\t')
        rs.reverse()
        o.write('\t'.join(rs))
        o.write('\n')
    i.close()
    o.close()

**Attribution** Copyright Ross Lazarus (ross period lazarus at gmail period com) May 2012

All rights reserved.

Licensed under the LGPL

**Obligatory screenshot**

http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png