diff toolfactory/README.md @ 49:35a912ce0c83 draft

Can now make the bwa example from planemo :)
author fubar
date Thu, 27 Aug 2020 23:11:01 -0400
parents ad564ab3cf7b
children 68fbdbe35f08
--- a/toolfactory/README.md	Sun Aug 23 21:03:48 2020 -0400
+++ b/toolfactory/README.md	Thu Aug 27 23:11:01 2020 -0400
@@ -1,137 +1,164 @@
-*WARNING before you start*
+**Breaking news! Docker container is recommended as at August 2020**
+
+A Docker container can be built - see the docker directory.
+It is highly recommended for isolation. It also has an integrated toolshed to allow installation of new tools back 
+into the Galaxy being used to generate them. 
+
+The container is built from quay.io/bgruening/galaxy:20.05 but updates the
+Galaxy code to the dev branch. It seems to work fine with bioblend>=0.14,
+planemo and the version of gxformat2 needed by the ToolFactory (TF).
 
- Install this tool on a private Galaxy ONLY
- Please NEVER on a public or production instance
- 
-Updated august 2014 by John Chilton adding citation support
+The runclean.sh script run from the docker subdirectory of your local clone of this repository
+should create a container (eventually) and serve it at localhost:8080 with a toolshed at
+localhost:9009.
 
-Updated august 8 2014 to fix bugs reported by Marius van den Beek
+Once it's up, please restart Galaxy in the container with 
+```docker exec [container name] supervisorctl restart galaxy: ```
+Jobs just do not seem to run properly otherwise and the next steps won't work!
 
-Please cite the resource at
-http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref
-if you use this tool in your published work.
+The generated container includes a workflow and two sample data sets for the workflow.
 
-**Short Story**
+Load the workflow. Adjust the inputs for each as labelled. The perl example counts GC in phiX.fasta. 
+The python scripts use the rgToolFactory.py as their input - any text file will work but I like the
+recursion. The BWA example has some mitochondrial reads and reference. Run the workflow and watch.
+This should fill the history with some sample tools you can rerun and play with.
+Note that each new tool will have been tested using Planemo, inside the workflow, inside Galaxy.
+Extremely cool to watch.
+
+*WARNING* 
+
+ Install this tool on a throw-away private Galaxy or Docker container ONLY
+ Please NEVER on a public or production instance
 
-This is an unusual Galaxy tool capable of generating new Galaxy tools.
-It works by exposing *unrestricted* and therefore extremely dangerous scripting
-to all designated administrators of the host Galaxy server, allowing them to
-run scripts in R, python, sh and perl over multiple selected input data sets,
-writing a single new data set as output.
+*Short Story*
+
+Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as
+a tool to Galaxy requires some special instructions to be written. This is sometimes termed "wrapping" the package
+because the instructions tell Galaxy how to run the package as a new Galaxy tool. Any tool in a Galaxy is 
+readily available to all the users through a consistent and easy to use interface.
 
-*You have a working r/python/perl/bash script or any executable with positional or argparse style parameters*
+Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it 
+automates much of the basic boilerplate and makes the process much easier. The ToolFactory (TF) 
+uses Planemo under the hood for many functions, but hides the command
+line complexities from the TF user. 
 
-It can be turned into an ordinary Galaxy tool in minutes, using a Galaxy tool.
-
+*More Explanation*
 
-**Automated generation of new Galaxy tools for installation into any Galaxy**
+The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools. 
+It appears in Galaxy just like any other tool but outputs include new Galaxy tools generated
+using instructions provided by the user and the results of Planemo lint and tool testing using
+small sample inputs provided by the TF user. The small samples become tests built in to the new tool.
+
+It offers a familiar Galaxy form driven way to define how the user of the new tool will 
+choose input data from their history, and what parameters the new tool user will be able to adjust.
+The TF user must know, or be able to read, enough about the tool to be able to define the details of
+the new Galaxy interface and the ToolFactory offers little guidance on that other than some examples.
 
-A test is generated using small sample test data inputs and parameter settings you supply.
-Once the test case outputs have been produced, they can be used to build a
-new Galaxy tool. The supplied script or executable is baked as a requirement
-into a new, ordinary Galaxy tool, fully workflow compatible out of the box.
-Generated tools are installed via a tool shed by an administrator
-and work exactly like all other Galaxy tools for your users.
+Tools always depend on other things. Most tools in Galaxy depend on third party
+scientific packages, so TF tools usually have one or more dependencies. These can be
+scientific packages such as BWA or scripting languages such as Python and are
+usually managed by Conda. If the new tool relies on a system utility such as bash or awk 
+where the importance of version control on reproducibility is low, these can be used without 
+Conda management - but remember the potential risks of unmanaged dependencies on computational
+reproducibility.
 
-**More Detail**
+The TF user can optionally supply a working script where scripting is
+required and the chosen dependency is a scripting language such as Python or a system
+scripting executable such as bash. Whatever the language, the script must correctly parse the command line
+arguments it receives at tool execution, as they are defined by the TF user. The
+text of that script is "baked in" to the new tool and will be executed each time
+the new tool is run. It is highly recommended that scripts and their command lines be developed
+and tested until proven to work before the TF is invoked. Using Galaxy as a software development
+environment is possible, but not recommended because it is somewhat clumsy and inefficient.
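As a hedged illustration of a script that parses the command line arguments it receives, here is a minimal Python sketch that counts GC content in a FASTA file, similar in spirit to the perl example. The argument names `--input` and `--output` and the function `gc_fraction` are hypothetical; in a real tool they must match whatever names or positions the TF user defines on the form.

```python
import argparse
import sys


def gc_fraction(fasta_lines):
    """Return the fraction of G and C bases across all sequence lines."""
    gc = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        seq = line.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return gc / total if total else 0.0


def main(argv=None):
    # Hypothetical argument names: they must agree with the names
    # defined for the input and output on the TF form.
    parser = argparse.ArgumentParser(description="Count GC content in a FASTA file")
    parser.add_argument("--input", required=True, help="FASTA file chosen from the history")
    parser.add_argument("--output", required=True, help="path for the text report")
    args = parser.parse_args(argv)
    with open(args.input) as handle:
        fraction = gc_fraction(handle)
    with open(args.output, "w") as out:
        out.write(f"GC fraction\t{fraction:.4f}\n")


# Run only when arguments are supplied, so the functions can also be imported and tested.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Testing a script like this outside Galaxy first, for example with `python gc_count.py --input phiX.fasta --output report.txt` (the script file name is an assumption), is the kind of preparation recommended above.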
 
-To use the ToolFactory, you should have prepared a script to paste into a
-text box, or have a package in mind and a small test input example ready to select from your history
-to test your new script.
+Tools nearly always take one or more data sets from the user's history as input. TF tools
+allow the TF user to define what Galaxy datatypes the tool end user will be able to choose and what 
+names or positions will be used to pass them on a command line to the package or script.
 
-```planemo test rgToolFactory2.xml --galaxy_root ~/galaxy --test_data ~/galaxy/tools/tool_makers/toolfactory/test-data``` works for me
+Tools often have various parameter settings. The TF allows the TF user to define how each
+parameter will appear on the tool form to the end user, and what names or positions will be
+used to pass them on the command line to the package. At present, parameters are limited to
+simple text and number fields. Pull requests for other kinds of parameters that galaxyxml
+can handle are welcomed.
 
-There is an example in each scripting language on the Tool Factory form. You
-can just cut and paste these to try it out - remember to select the right
-interpreter please. You'll also need to create a small test data set using
-the Galaxy history add new data tool.
+Best practice Galaxy tools have one or more automated tests. These should use small sample data sets and
+specific parameter settings so when the tool is tested, the outputs can be compared with their expected
+values. The TF will automatically create a test for the new tool. It will use the sample data sets 
+chosen by the TF user when they built the new tool.
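The idea behind such a test can be sketched in a few lines of Python: run the tool on the sample input, then compare the output it produced with the stored expected output. This is only a conceptual sketch and not how Planemo or Galaxy implement tool tests; the function name `outputs_match` and the file paths are hypothetical.

```python
import difflib
from pathlib import Path


def outputs_match(expected_path, actual_path):
    """Compare two text files line by line.

    Returns (ok, diff_lines), where ok is True when the files are identical
    and diff_lines is a unified diff describing any differences.
    """
    expected = Path(expected_path).read_text().splitlines()
    actual = Path(actual_path).read_text().splitlines()
    diff = list(
        difflib.unified_diff(expected, actual, "expected", "actual", lineterm="")
    )
    return (not diff, diff)
```

A caller would pass the expected output saved when the tool was built and the output from a fresh run, and treat any non-empty diff as a test failure.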
 
-If the script fails somehow, use the "redo" button on the tool output in
-your history to recreate the form complete with broken script. Fix the bug
-and execute again. Rinse, wash, repeat.
+The TF works by exposing *unrestricted* and therefore extremely dangerous scripting
+to all designated administrators of the host Galaxy server, allowing them to
+run scripts in R, python, sh and perl. For this reason, a Docker container is
+available to help manage the associated risks.
+
+*Scripting uses*
+
+To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample
+data sets for testing. When the script is working correctly, upload the small sample datasets
+into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form.
+
+*Outputs*
 
 Once the script runs successfully, a new Galaxy tool that runs your script
 can be generated. Select the "generate" option and supply some help text and
 names. The new tool will be generated in the form of a new Galaxy datatype
-*toolshed.gz* - as the name suggests, it's an archive ready to upload to a
+*tgz* - as the name suggests, it's an archive ready to upload to a
 Galaxy ToolShed as a new tool repository.
 
+It is also possible to run a tool to generate test outputs, then test it
+using planemo. A toolshed is built in to the Docker container and configured
+so a tool can be tested, sent to that toolshed, then installed in the Galaxy
+where the TF is running.
+
+If the tool requires a command or test XML override, then planemo is 
+needed to generate test outputs to make a complete tool. It is then rerun to test the tool
+and, if required, to upload it to the local toolshed and install it in the Galaxy 
+where the TF is running.
+
 Once it's in a ToolShed, it can be installed into any local Galaxy server
 from the server administrative interface.
 
-Once the new tool is installed, local users can run it - each time, the script
-that was supplied when it was built will be executed with the input chosen
-from the user's history. In other words, the tools you generate with the
-ToolFactory run just like any other Galaxy tool,but run your script every time.
+Once the new tool is installed, local users can run it - each time, the 
+package and/or script that was supplied when it was built will be executed with the input chosen
+from the user's history, together with user supplied parameters. In other words, the tools you generate with the
+ToolFactory run just like any other Galaxy tool.
 
-Tool factory tools are perfect for workflow components. One input, one output,
-no variables.
+TF generated tools work as normal workflow components.
+
+
+*Limitations*
 
-*To fully and safely exploit the awesome power* of this tool,
-Galaxy and the ToolShed, you should be a developer installing this
-tool on a private/personal/scratch local instance where you are an
-admin_user. Then, if you break it, you get to keep all the pieces see
-https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
+The TF is flexible enough to generate wrappers for many common scientific packages
+but the inbuilt automation will not cope with all possible situations. Users can
+supply overrides for two tool XML segments, tests and command, and the BWA
+example in the supplied samples workflow illustrates their use.
 
-**Installation**
-This is a Galaxy tool. You can install it most conveniently using the
+*Installation*
+
+The Docker container is the best way to use the TF because it is preconfigured
+to automate new tool testing and has a built in local toolshed where each new tool
+is uploaded. It is easy to install without Docker, but you will need to make some 
+configuration changes (TODO write a configuration). You can install it most conveniently using the
 administrative "Search and browse tool sheds" link. Find the Galaxy Main
 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
-repository. Open it and review the code and select the option to install it.
-
-If you can't get the tool that way, the xml and py files here need to be
-copied into a new tools
-subdirectory such as tools/toolfactory Your tool_conf.xml needs a new entry
-pointing to the xml
-file - something like::
+repository in the Tool Maker section. Open it, review the code and select the option to install it.
 
-  <section name="Tool building tools" id="toolbuilders">
-    <tool file="toolfactory/rgToolFactory.xml"/>
-  </section>
-
-If not already there,
+Otherwise, if not already there pending an accepted PR,
 please add:
-<datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary"
+<datatype extension="tgz" type="galaxy.datatypes.binary:Binary"
 mimetype="multipart/x-gzip" subclass="True" />
 to your local data_types_conf.xml.
 
 
-**Restricted execution**
+*Restricted execution*
 
 The tool factory tool itself will then be usable ONLY by admin users -
-people with IDs in admin_users in universe_wsgi.ini **Yes, that's right. ONLY
+people with IDs in admin_users. **Yes, that's right. ONLY
 admin_users can run this tool** Think about it for a moment. If allowed to
 run any arbitrary script on your Galaxy server, the only thing that would
 impede a miscreant bent on destroying all your Galaxy data would probably
 be lack of appropriate technical skills.
 
-**What it does** 
-
-This is a tool factory for simple scripts in python, R and
-perl currently. Functional tests are automatically generated. How cool is that.
-
-LIMITED to simple scripts that read one input from the history. Optionally can
-write one new history dataset, and optionally collect any number of outputs
-into links on an autogenerated HTML index page for the user to navigate -
-useful if the script writes images and output files - pdf outputs are shown
-as thumbnails and R's bloated pdf's are shrunk with ghostscript so that and
-imagemagik need to be available.
-
-Generated tools can be edited and enhanced like any Galaxy tool, so start
-small and build up since a generated script gets you a serious leg up to a
-more complex one.
-
-**What you do**
-
-You paste and run your script, you fix the syntax errors and
-eventually it runs. You can use the redo button and edit the script before
-trying to rerun it as you debug - it works pretty well.
-
-Once the script works on some test data, you can generate a toolshed compatible
-gzip file containing your script ready to run as an ordinary Galaxy tool in
-a repository on your local toolshed. That means safe and largely automated
-installation in any production Galaxy configured to use your toolshed.
-
 **Generated tool Security**
 
 Once you install a generated tool, it's just
@@ -141,7 +168,7 @@
 
 **Send Code**
 
-Patches and suggestions welcome as bitbucket issues please?
+Pull requests and suggestions welcome as git issues please?
 
 **Attribution**
 
@@ -160,7 +187,3 @@
 
 Licensed under the LGPL
 
-**Obligatory screenshot**
-
-http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png
-