comparison toolfactory/README.md @ 49:35a912ce0c83 draft

Can now make the bwa example from planemo :)
author fubar
date Thu, 27 Aug 2020 23:11:01 -0400
parents ad564ab3cf7b
children 68fbdbe35f08
48:5a7a5b06bce0 49:35a912ce0c83
1 *WARNING before you start* 1 **Breaking news! Docker container is recommended as at August 2020**
2 2
3 Install this tool on a private Galaxy ONLY 3 A Docker container can be built - see the docker directory.
4 It is highly recommended for isolation. It also has an integrated toolshed to allow installation of new tools back
5 into the Galaxy being used to generate them.
6
7 Built from quay.io/bgruening/galaxy:20.05 but updates the
8 Galaxy code to the dev branch - it seems to work fine with updated bioblend>=0.14
9 with planemo and the right version of gxformat2 needed by the ToolFactory (TF).
10
11 The runclean.sh script run from the docker subdirectory of your local clone of this repository
12 should create a container (eventually) and serve it at localhost:8080 with a toolshed at
13 localhost:9009.
14
15 Once it's up, please restart Galaxy in the container with
16 ```docker exec [container name] supervisorctl restart galaxy: ```
17 Jobs just do not seem to run properly otherwise and the next steps won't work!
18
19 The generated container includes a workflow and 2 sample data sets for the workflow
20
21 Load the workflow. Adjust the inputs for each as labelled. The perl example counts GC in phiX.fasta.
22 The python scripts use the rgToolFactory.py as their input - any text file will work but I like the
23 recursion. The BWA example has some mitochondrial reads and reference. Run the workflow and watch.
24 This should fill the history with some sample tools you can rerun and play with.
25 Note that each new tool will have been tested using Planemo. In the workflow, in Galaxy.
26 Extremely cool to watch.
27
28 *WARNING*
29
30 Install this tool on a throw-away private Galaxy or Docker container ONLY
4 Please NEVER on a public or production instance 31 Please NEVER on a public or production instance
5
6 Updated august 2014 by John Chilton adding citation support
7 32
8 Updated august 8 2014 to fix bugs reported by Marius van den Beek 33 *Short Story*
9 34
10 Please cite the resource at 35 Galaxy is easily extended to new applications by adding a new tool. Each new scientific computational package added as
11 http://bioinformatics.oxfordjournals.org/cgi/reprint/bts573?ijkey=lczQh1sWrMwdYWJ&keytype=ref 36 a tool to Galaxy requires some special instructions to be written. This is sometimes termed "wrapping" the package
12 if you use this tool in your published work. 37 because the instructions tell Galaxy how to run the package as a new Galaxy tool. Any tool in a Galaxy is
38 readily available to all the users through a consistent and easy to use interface.
13 39
14 **Short Story** 40 Most Galaxy tool wrappers have been manually prepared by skilled programmers, many using Planemo because it
41 automates much of the basic boilerplate and makes the process much easier. The ToolFactory (TF)
42 uses Planemo under the hood for many functions, but hides the command
43 line complexities from the TF user.
15 44
16 This is an unusual Galaxy tool capable of generating new Galaxy tools. 45 *More Explanation*
17 It works by exposing *unrestricted* and therefore extremely dangerous scripting 46
47 The TF is an unusual Galaxy tool, designed to allow a skilled user to make new Galaxy tools.
48 It appears in Galaxy just like any other tool but outputs include new Galaxy tools generated
49 using instructions provided by the user and the results of Planemo lint and tool testing using
50 small sample inputs provided by the TF user. The small samples become tests built in to the new tool.
51
52 It offers a familiar Galaxy form driven way to define how the user of the new tool will
53 choose input data from their history, and what parameters the new tool user will be able to adjust.
54 The TF user must know, or be able to read, enough about the tool to be able to define the details of
55 the new Galaxy interface and the ToolFactory offers little guidance on that other than some examples.
56
57 Tools always depend on other things. Most tools in Galaxy depend on third party
58 scientific packages, so TF tools usually have one or more dependencies. These can be
59 scientific packages such as BWA or scripting languages such as Python and are
60 usually managed by Conda. If the new tool relies on a system utility such as bash or awk
61 where the importance of version control on reproducibility is low, these can be used without
62 Conda management - but remember the potential risks of unmanaged dependencies on computational
63 reproducibility.
64
65 The TF user can optionally supply a working script where scripting is
66 required and the chosen dependency is a scripting language such as Python or a system
67 scripting executable such as bash. Whatever the language, the script must correctly parse the command line
68 arguments it receives at tool execution, as they are defined by the TF user. The
69 text of that script is "baked in" to the new tool and will be executed each time
70 the new tool is run. It is highly recommended that scripts and their command lines be developed
71 and tested until proven to work before the TF is invoked. Galaxy as a software development
72 environment is actually possible, but not recommended being somewhat clumsy and inefficient.
73
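As an illustration only, here is a minimal sketch of the kind of script that could be pasted into the TF form, loosely modelled on the GC counting example mentioned above. The argument names `--input` and `--output` are hypothetical. The script must accept whatever names or positions the TF user defines on the form, so adjust them to match.

```python
import argparse


def gc_fraction(fasta_lines):
    """Return the fraction of G and C bases across all sequence lines."""
    gc = total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        seq = line.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return gc / total if total else 0.0


def main(argv=None):
    # Hypothetical argument names: they must match those defined on the TF form.
    parser = argparse.ArgumentParser(description="Report GC content of a FASTA file")
    parser.add_argument("--input", required=True, help="FASTA file to read")
    parser.add_argument("--output", required=True, help="text file to write")
    args = parser.parse_args(argv)
    with open(args.input) as handle:
        fraction = gc_fraction(handle)
    with open(args.output, "w") as out:
        out.write(f"GC fraction\t{fraction:.4f}\n")
```

A real script would finish with the usual `if __name__ == "__main__": main()` guard. Testing it from a shell with small sample data, before pasting it into the form, is the quickest way to prove the command line works.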
74 Tools nearly always take one or more data sets from the user's history as input. TF tools
75 allow the TF user to define what Galaxy datatypes the tool end user will be able to choose and what
76 names or positions will be used to pass them on a command line to the package or script.
77
78 Tools often have various parameter settings. The TF allows the TF user to define how each
79 parameter will appear on the tool form to the end user, and what names or positions will be
80 used to pass them on the command line to the package. At present, parameters are limited to
81 simple text and number fields. Pull requests for other kinds of parameters that galaxyxml
82 can handle are welcomed.
83
84 Best practice Galaxy tools have one or more automated tests. These should use small sample data sets and
85 specific parameter settings so when the tool is tested, the outputs can be compared with their expected
86 values. The TF will automatically create a test for the new tool. It will use the sample data sets
87 chosen by the TF user when they built the new tool.
88
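The idea of comparing test outputs with expected values can be sketched in a few lines. This is not how Planemo or Galaxy implement tool tests. It is a simplified illustration, and the function name `compare_outputs` is hypothetical.

```python
import difflib


def compare_outputs(expected_text, actual_text):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            expected_text.splitlines(),
            actual_text.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

An empty result means the output produced from the small sample inputs matches the stored expected output. Any returned lines show where the two differ.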
89 The TF works by exposing *unrestricted* and therefore extremely dangerous scripting
18 to all designated administrators of the host Galaxy server, allowing them to 90 to all designated administrators of the host Galaxy server, allowing them to
19 run scripts in R, python, sh and perl over multiple selected input data sets, 91 run scripts in R, python, sh and perl. For this reason, a Docker container is
20 writing a single new data set as output. 92 available to help manage the associated risks.
21 93
22 *You have a working r/python/perl/bash script or any executable with positional or argparse style parameters* 94 *Scripting uses*
23 95
24 It can be turned into an ordinary Galaxy tool in minutes, using a Galaxy tool. 96 To use a scripting language to create a new tool, you must first prepare and properly test a script. Use small sample
97 data sets for testing. When the script is working correctly, upload the small sample datasets
98 into a new history, start configuring a new ToolFactory tool, and paste the script into the script text box on the TF form.
25 99
26 100 *Outputs*
27 **Automated generation of new Galaxy tools for installation into any Galaxy**
28
29 A test is generated using small sample test data inputs and parameter settings you supply.
30 Once the test case outputs have been produced, they can be used to build a
31 new Galaxy tool. The supplied script or executable is baked as a requirement
32 into a new, ordinary Galaxy tool, fully workflow compatible out of the box.
33 Generated tools are installed via a tool shed by an administrator
34 and work exactly like all other Galaxy tools for your users.
35
36 **More Detail**
37
38 To use the ToolFactory, you should have prepared a script to paste into a
39 text box, or have a package in mind and a small test input example ready to select from your history
40 to test your new script.
41
42 ```planemo test rgToolFactory2.xml --galaxy_root ~/galaxy --test_data ~/galaxy/tools/tool_makers/toolfactory/test-data``` works for me
43
44 There is an example in each scripting language on the Tool Factory form. You
45 can just cut and paste these to try it out - remember to select the right
46 interpreter please. You'll also need to create a small test data set using
47 the Galaxy history add new data tool.
48
49 If the script fails somehow, use the "redo" button on the tool output in
50 your history to recreate the form complete with broken script. Fix the bug
51 and execute again. Rinse, wash, repeat.
52 101
53 Once the script runs successfully, a new Galaxy tool that runs your script 102 Once the script runs successfully, a new Galaxy tool that runs your script
54 can be generated. Select the "generate" option and supply some help text and 103 can be generated. Select the "generate" option and supply some help text and
55 names. The new tool will be generated in the form of a new Galaxy datatype 104 names. The new tool will be generated in the form of a new Galaxy datatype
56 *toolshed.gz* - as the name suggests, it's an archive ready to upload to a 105 *tgz* - as the name suggests, it's an archive ready to upload to a
57 Galaxy ToolShed as a new tool repository. 106 Galaxy ToolShed as a new tool repository.
107
108 It is also possible to run a tool to generate test outputs, then test it
109 using planemo. A toolshed is built in to the Docker container and configured
110 so a tool can be tested, sent to that toolshed, then installed in the Galaxy
111 where the TF is running.
112
113 If the tool requires a command or test XML override, then planemo is
114 needed to generate test outputs to make a complete tool, rerun to test
115 and if required upload to the local toolshed and install in the Galaxy
116 where the TF is running.
58 117
59 Once it's in a ToolShed, it can be installed into any local Galaxy server 118 Once it's in a ToolShed, it can be installed into any local Galaxy server
60 from the server administrative interface. 119 from the server administrative interface.
61 120
62 Once the new tool is installed, local users can run it - each time, the script 121 Once the new tool is installed, local users can run it - each time, the
63 that was supplied when it was built will be executed with the input chosen 122 package and/or script that was supplied when it was built will be executed with the input chosen
64 from the user's history. In other words, the tools you generate with the 123 from the user's history, together with user supplied parameters. In other words, the tools you generate with the
65 ToolFactory run just like any other Galaxy tool, but run your script every time. 124 ToolFactory run just like any other Galaxy tool.
66 125
67 Tool factory tools are perfect for workflow components. One input, one output, 126 TF generated tools work as normal workflow components.
68 no variables.
69 127
70 *To fully and safely exploit the awesome power* of this tool,
71 Galaxy and the ToolShed, you should be a developer installing this
72 tool on a private/personal/scratch local instance where you are an
73 admin_user. Then, if you break it, you get to keep all the pieces see
74 https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home
75 128
76 **Installation** 129 *Limitations*
77 This is a Galaxy tool. You can install it most conveniently using the 130
131 The TF is flexible enough to generate wrappers for many common scientific packages
132 but the inbuilt automation will not cope with all possible situations. Users can
133 supply overrides for two tool XML segments - tests and command and the BWA
134 example in the supplied samples workflow illustrates their use.
135
136 *Installation*
137
138 The Docker container is the best way to use the TF because it is preconfigured
139 to automate new tool testing and has a built in local toolshed where each new tool
140 is uploaded. It is easy to install without Docker, but you will need to make some
141 configuration changes (TODO write a configuration). You can install it most conveniently using the
78 administrative "Search and browse tool sheds" link. Find the Galaxy Main 142 administrative "Search and browse tool sheds" link. Find the Galaxy Main
79 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory 143 toolshed at https://toolshed.g2.bx.psu.edu/ and search for the toolfactory
80 repository. Open it and review the code and select the option to install it. 144 repository in the Tool Maker section. Open it and review the code and select the option to install it.
81 145
82 If you can't get the tool that way, the xml and py files here need to be 146 Otherwise, if not already there pending an accepted PR,
83 copied into a new tools
84 subdirectory such as tools/toolfactory Your tool_conf.xml needs a new entry
85 pointing to the xml
86 file - something like::
87
88 <section name="Tool building tools" id="toolbuilders">
89 <tool file="toolfactory/rgToolFactory.xml"/>
90 </section>
91
92 If not already there,
93 please add: 147 please add:
94 <datatype extension="toolshed.gz" type="galaxy.datatypes.binary:Binary" 148 <datatype extension="tgz" type="galaxy.datatypes.binary:Binary"
95 mimetype="multipart/x-gzip" subclass="True" /> 149 mimetype="multipart/x-gzip" subclass="True" />
96 to your local data_types_conf.xml. 150 to your local data_types_conf.xml.
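Before editing the datatypes configuration, it may help to check whether the extension is already registered. The sketch below assumes the `datatype` element sits inside `<datatypes><registration>`, as in the sample string. That layout and the helper name `has_datatype` are assumptions, so check them against your own file.

```python
import xml.etree.ElementTree as ET

# Sample configuration text; the surrounding element layout is an assumption.
SAMPLE_CONF = """<datatypes>
  <registration>
    <datatype extension="tgz" type="galaxy.datatypes.binary:Binary"
              mimetype="multipart/x-gzip" subclass="True" />
  </registration>
</datatypes>"""


def has_datatype(conf_xml, extension):
    """Return True if any datatype element declares the given extension."""
    root = ET.fromstring(conf_xml)
    return any(dt.get("extension") == extension for dt in root.iter("datatype"))
```

To check a real file, read its text and pass it to `has_datatype` with the extension you intend to add.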
97 151
98 152
99 **Restricted execution** 153 *Restricted execution*
100 154
101 The tool factory tool itself will then be usable ONLY by admin users - 155 The tool factory tool itself will then be usable ONLY by admin users -
102 people with IDs in admin_users in universe_wsgi.ini **Yes, that's right. ONLY 156 people with IDs in admin_users. **Yes, that's right. ONLY
103 admin_users can run this tool** Think about it for a moment. If allowed to 157 admin_users can run this tool** Think about it for a moment. If allowed to
104 run any arbitrary script on your Galaxy server, the only thing that would 158 run any arbitrary script on your Galaxy server, the only thing that would
105 impede a miscreant bent on destroying all your Galaxy data would probably 159 impede a miscreant bent on destroying all your Galaxy data would probably
106 be lack of appropriate technical skills. 160 be lack of appropriate technical skills.
107
108 **What it does**
109
110 This is a tool factory for simple scripts in python, R and
111 perl currently. Functional tests are automatically generated. How cool is that.
112
113 LIMITED to simple scripts that read one input from the history. Optionally can
114 write one new history dataset, and optionally collect any number of outputs
115 into links on an autogenerated HTML index page for the user to navigate -
116 useful if the script writes images and output files - pdf outputs are shown
117 as thumbnails and R's bloated pdf's are shrunk with ghostscript so that and
118 ImageMagick need to be available.
119
120 Generated tools can be edited and enhanced like any Galaxy tool, so start
121 small and build up since a generated script gets you a serious leg up to a
122 more complex one.
123
124 **What you do**
125
126 You paste and run your script, you fix the syntax errors and
127 eventually it runs. You can use the redo button and edit the script before
128 trying to rerun it as you debug - it works pretty well.
129
130 Once the script works on some test data, you can generate a toolshed compatible
131 gzip file containing your script ready to run as an ordinary Galaxy tool in
132 a repository on your local toolshed. That means safe and largely automated
133 installation in any production Galaxy configured to use your toolshed.
134 161
135 **Generated tool Security** 162 **Generated tool Security**
136 163
137 Once you install a generated tool, it's just 164 Once you install a generated tool, it's just
138 another tool - assuming the script is safe. They just run normally and their 165 another tool - assuming the script is safe. They just run normally and their
139 user cannot do anything unusually insecure but please, practice safe toolshed. 166 user cannot do anything unusually insecure but please, practice safe toolshed.
140 Read the code before you install any tool. Especially this one - it is really scary. 167 Read the code before you install any tool. Especially this one - it is really scary.
141 168
142 **Send Code** 169 **Send Code**
143 170
144 Patches and suggestions welcome as bitbucket issues please? 171 Pull requests and suggestions welcome as git issues please?
145 172
146 **Attribution** 173 **Attribution**
147 174
148 Creating re-usable tools from scripts: The Galaxy Tool Factory 175 Creating re-usable tools from scripts: The Galaxy Tool Factory
149 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team 176 Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team
158 185
159 All rights reserved. 186 All rights reserved.
160 187
161 Licensed under the LGPL 188 Licensed under the LGPL
162 189
163 **Obligatory screenshot**
164
165 http://bitbucket.org/fubar/galaxytoolmaker/src/fda8032fe989/images/dynamicScriptTool.png
166