diff fubar-galaxytoolfactory-ca7db160878a/README.txt @ 3:8c578211a681 draft

Fixed nasty silly bug - fixed locally but not previously propagated
author fubar
date Fri, 31 Aug 2012 23:04:13 -0400
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/fubar-galaxytoolfactory-ca7db160878a/README.txt	Fri Aug 31 23:04:13 2012 -0400
@@ -0,0 +1,233 @@
+# WARNING before you start
+# Install this tool on a private Galaxy ONLY
+# Please NEVER on a public or production instance
+
+*Short Story*
+This is an unusual Galaxy tool that generates very simple new Galaxy tools that run
+a user-supplied script (R, python, perl, bash...) over a single input file.
+Whenever you run this tool, the ToolFactory, you should have prepared a script to paste into a text box,
+and a small test input example ready to select from your history to test your new script.
+
+If the script runs successfully, a new Galaxy tool that runs your script can be generated.
+The new tool is in the form of a special new Galaxy datatype - toolshed.gz - as the name suggests,
+it's an archive ready to upload to a Galaxy ToolShed as a new tool repository.
+
+Once it's in a ToolShed, it can be installed into any local Galaxy server from
+the server administrative interface.
+
+Once your new tool is installed, local users can run it - each time, the script that was supplied
+when it was built will be executed with the input chosen from the user's history. In other words,
+the tools you generate with the ToolFactory run just like any other Galaxy tool,
+but run your script every time.
+
+*Reasons to read further*
+
+If you use Galaxy to support your research;
+
+You and fellow users are sometimes forced to take data out of Galaxy, process it with ugly
+little perl/awk/sed/R... scripts and put it back;
+
+You do this when you can't do some transformation in Galaxy (the 90/10 rule);
+
+You don't have enough developer resources for wrapping dozens of even relatively simple tools;
+
+Your research and your institution would be far better off if those feral scripts were all tucked safely in
+your local toolshed and Galaxy histories.
+
+*The good news* If it can be trivially scripted, it can be running safely in your
+local Galaxy via your own local toolshed in a few minutes - with functional tests.
+
+
+*Value proposition* The ToolFactory allows Galaxy to efficiently take over most of your lab's dark script matter,
+making it reproducible in Galaxy and shareable through the ToolShed.
+
+That's what this tool does. You paste a simple script and the tool returns 
+a new, real Galaxy tool, ready to be installed from the local toolshed to local servers.
+Scripts can be wrapped and online literally within minutes.
+
+*To fully and safely exploit the awesome power* of this tool, Galaxy and the ToolShed,
+you should be a developer installing this tool on a private/personal/scratch local instance where you are an admin_user.
+Then, if you break it, you get to keep all the pieces.
+See https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
+
+** Installation **
+This is a Galaxy tool. You can install it most conveniently using the administrative "Search and browse tool sheds" link.
+Find the Galaxy Test toolshed (not main) and search for the toolfactory repository.
+Open it and review the code and select the option to install it.
+
+If you can't get the tool that way, the xml and py files here need to be copied into a new tools subdirectory such as tools/toolfactory.
+Your tool_conf.xml needs a new entry pointing to the xml file - something like::
+
+  <section name="Tool building tools" id="toolbuilders">
+    <tool file="toolfactory/rgToolFactory.xml"/>
+  </section>
+
+If not already there (I just added it to datatypes_conf.xml.sample), please add:
+<datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" mimetype="multipart/x-gzip" subclass="True" />
+to your local datatypes_conf.xml.
+
+Ensure that html sanitization is set to False and uncommented in universe_wsgi.ini.
+
+You'll have to restart the server for the new tool to be available.
+
+Of course, R, python, perl etc are needed on your path if you want to test scripts using those interpreters.
+Adding new ones to this tool code should be easy enough. Please make suggestions as bitbucket issues and code.
+The HTML file code automatically shrinks R's bloated pdfs, and depends on ghostscript. The thumbnails require imagemagick.
+
+* Restricted execution *
+The new tool factory tool will then be usable ONLY by admin users - people with IDs in admin_users in universe_wsgi.ini.
+**Yes, that's right. ONLY admin_users can run this tool** Think about it for a moment. If allowed to run any
+arbitrary script on your Galaxy server, the only thing that would impede a miscreant bent on destroying all your
+Galaxy data would probably be lack of appropriate technical skills.
+
+*What it does* This is a tool factory for simple scripts in python, R and perl currently. 
+Functional tests are automatically generated. How cool is that. 
+
+LIMITED to simple scripts that read one input from the history.
+Optionally can write one new history dataset,
+and optionally collect any number of outputs into links on an autogenerated HTML
+index page for the user to navigate - useful if the script writes images and output files - pdf outputs
+are shown as thumbnails and R's bloated pdfs are shrunk with ghostscript, so both ghostscript and imagemagick need to
+be available.
+
+Generated tools can be edited and enhanced like any Galaxy tool, so start small and build up since
+a generated script gets you a serious leg up to a more complex one.
+
+*What you do* You paste and run your script,
+you fix the syntax errors and eventually it runs.
+You can use the redo button and edit the script before
+trying to rerun it as you debug - it works pretty well.
+
+Once the script works on some test data, you can
+generate a toolshed compatible gzip file
+containing your script ready to run as an ordinary Galaxy tool in a
+repository on your local toolshed. That means safe and largely automated installation in any
+production Galaxy configured to use your toolshed.
+
+*Generated tool Security* Once you install a generated tool, it's just
+another tool - assuming the script is safe. Generated tools just run normally and their users cannot do anything unusually insecure,
+but please, practice safe toolshed.
+Read the fucking code before you install any tool. 
+Especially this one - it is really scary.
+
+If you opt for an HTML output, you get all the script outputs arranged
+as a single HTML history item - all output files are linked, with thumbnails for all the pdfs.
+Ugly but really inexpensive.
+
+Patches and suggestions welcome as bitbucket issues please.
+
+Long route to the June 2012 product,
+derived from an integrated script model
+called rgBaseScriptWrapper.py.
+Note to the unwary:
+  This tool allows arbitrary scripting on your Galaxy as the Galaxy user.
+  There is nothing stopping a malicious user from doing whatever they choose.
+  Extremely dangerous!!
+  Totally insecure. So, trusted users only.
+
+
+
+
+copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012
+
+all rights reserved
+Licensed under the LGPL. If you want to improve it, feel free: https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
+
+Material for our more enthusiastic and voracious readers continues below - we salute you.
+
+**Motivation** Simple transformation, filtering or reporting scripts get written, run and lost every day in most busy labs 
+- even ours where Galaxy is in use. This 'dark script matter' is pervasive and generally not reproducible.
+
+**Benefits** For our group, this allows Galaxy to fill that important dark script gap - all those "small" bioinformatics 
+tasks. Once a user has a working R (or python or perl) script that does something Galaxy cannot currently do (e.g. transpose a
+tabular file) and takes parameters the way Galaxy supplies them (see example below), they:
+
+1. Install the tool factory on a personal private instance
+
+2. Upload a small test data set
+
+3. Paste the script into the 'script' text box and iteratively run the insecure tool on test data until it works right - 
+there is absolutely no reason to do this anywhere other than on a personal private instance. 
+
+4. Once it works right, set the 'Generate toolshed gzip' option and run it again. 
+
+5. A toolshed style gzip appears ready to upload and install like any other Toolshed entry. 
+
+6. Upload the new tool to the toolshed
+
+7. Ask the local admin to check the new tool to confirm it's not evil and install it in the local production Galaxy.
+
+**Simple examples on the tool form**
+
+A simple Rscript "filter" showing how the command line parameters can be handled, takes an input file, 
+does something (transpose in this case) and writes the results to a new tabular file::
+
+ # transpose a tabular input file and write as a tabular output file
+ ourargs = commandArgs(TRUE)
+ inf = ourargs[1]
+ outf = ourargs[2]
+ inp = read.table(inf,header=FALSE,row.names=NULL,sep='\t')
+ outp = t(inp)
+ write.table(outp,outf, quote=FALSE, sep="\t",row.names=FALSE,col.names=FALSE)
+
+Calculate a multiple test adjusted p value from a column of p values - for this script to be useful,
+it needs the right column for the input to be specified in the code for the
+given input file type(s) specified when the tool is generated ::
+
+ # use p.adjust - assumes a HEADER row and column 1 - please fix for any real use
+ column = 1 # adjust if necessary for some other kind of input
+ fdrmeth = 'BH'
+ ourargs = commandArgs(TRUE)
+ inf = ourargs[1]
+ outf = ourargs[2]
+ inp = read.table(inf,header=TRUE,row.names=NULL,sep='\t')
+ p = inp[,column]
+ q = p.adjust(p,method=fdrmeth)
+ newval = paste(fdrmeth,'p-value',sep='_')
+ q = data.frame(q)
+ names(q) = newval
+ outp = cbind(inp,newval=q)
+ write.table(outp,outf, quote=FALSE, sep="\t",row.names=FALSE,col.names=TRUE)
+
+
+
+Another Rscript example without any input file - generates a random heatmap pdf - you must make sure the option to create an HTML output file is
+turned on for this to work. The heatmap will be presented as a thumbnail linked to the pdf in the resulting HTML page::
+
+ # note this script takes NO input or output because it generates random data
+ foo = data.frame(a=runif(100),b=runif(100),c=runif(100),d=runif(100),e=runif(100),f=runif(100))
+ bar = as.matrix(foo)
+ pdf( "heattest.pdf" )
+ heatmap(bar,main='Random Heatmap')
+ dev.off()
+
+A Python example that reverses each row of a tabular file. You'll need to remove the leading spaces for this to work if cut
+and pasted into the script box. Note that you can already do this in Galaxy by setting up the cut columns tool with the
+correct number of columns in reverse order, but this script will work for any number of columns so is completely generic::
+
+ # reverse order of columns in a tabular file
+ import sys
+ inp = sys.argv[1]
+ outp = sys.argv[2]
+ i = open(inp,'r')
+ o = open(outp,'w')
+ for row in i:
+     # strip only the line ending so that empty trailing columns are kept
+     rs = row.rstrip('\r\n').split('\t')
+     rs.reverse()
+     o.write('\t'.join(rs) + '\n')
+ i.close()
+ o.close()
+
+
+**Attribution** Copyright Ross Lazarus (ross period lazarus at gmail period com) May 2012
+
+All rights reserved.
+
+Licensed under the LGPL
+
+
+**Obligatory screenshot**
+
+http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png
+