diff readme.md @ 11:fc56e75d8b14 draft

"planemo upload for repository https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/openms commit 020906fb54bde7fc143c356f41975c378a741315"
author galaxyp
date Wed, 09 Sep 2020 12:43:02 +0000
parents d50b7bb0f027
children d9ebdc2e55fe
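Note on the "Working" section added by this change: the preprocessing and postprocessing it describes can be sketched in shell roughly as follows. This is only an illustration under assumptions, not the code the wrappers actually generate. The parameter names `in` and `out`, the file names, and the `mzML` extension are hypothetical, and the OpenMS tool call is replaced by a plain `cp` so that the sketch runs without OpenMS installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a Galaxy data set: Galaxy stores files as *.dat.
workdir=$(mktemp -d)
cd "$workdir"
printf 'dummy spectrum data\n' > dataset_1.dat

# Preprocessing: one directory per input/output parameter (named after the
# parameter), plus a link to the input that carries a proper extension.
mkdir -p in out
ln -s "$workdir/dataset_1.dat" in/input.mzML

# Main: a real wrapper would call the OpenMS tool here, with inputs and
# outputs given on the command line; `cp` is only a placeholder for it.
cp in/input.mzML out/output.mzML

# Postprocessing: move the output to its final location (hypothetical name).
mv out/output.mzML dataset_2.dat

ls -l in
cat dataset_2.dat
```

The link is what gives the tool a file name with a meaningful extension while the data itself stays in its original `.dat` location.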
--- a/readme.md	Fri May 17 09:50:16 2019 -0400
+++ b/readme.md	Wed Sep 09 12:43:02 2020 +0000
@@ -8,175 +8,127 @@
 More informations are available at:
 
  * https://github.com/OpenMS/OpenMS
- * http://open-ms.sourceforge.net
+ * https://www.openms.de/
+
+The wrappers for these tools and most of their tests are automatically
+generated using the `generate.sh` script. The generation of the tools is
+based on the CTDConverter (https://github.com/WorkflowConversion/CTDConverter),
+which can be fine-tuned via the `hardcoded_params.json` file. This file can be
+used to blacklist and hardcode parameters and to modify or set arbitrary
+CTD/XML attributes.
+
+Note that, due to its size, the test data is excluded from this repository. To
+generate the test data, call `test-data.sh`.
+
+Manual updates should only be made to
+
+- the `@GALAXY_VERSION@` token in `macros.xml`,
+- the manually contributed tests in `macros_test.xml` (the goal is that all
+  tools that do not have an automatically generated test are covered here),
+- and the `hardcoded_params.json` files.
+
+In a few cases patches may be acceptable.
+
+Installation
+============
+
+The Galaxy OpenMS tools can be installed from the toolshed. While most tools
+will work out of the box, some need attention since their requirements cannot
+be fulfilled via Conda:
+
+Not yet in Conda are:
+
+- SpectraST (http://tools.proteomecenter.org/wiki/index.php?title=SpectraST)
+- MaRaCluster (https://github.com/statisticalbiotechnology/maracluster)
+
+Binaries for these tools can easily be obtained via: 
+
+```
+VERSION=....
+git clone -b release/$VERSION.0 https://github.com/OpenMS/OpenMS.git OpenMS$VERSION.0-git
+git -C OpenMS$VERSION.0-git submodule init
+git -C OpenMS$VERSION.0-git submodule update
+```
+
+They are located in `OpenMS$VERSION.0-git/THIRDPARTY/`.
 
+Not in Conda due to licensing restrictions:
+
+- Mascot http://www.matrixscience.com/
+- MSFragger https://github.com/Nesvilab/MSFragger
+- Novor http://www.rapidnovor.org/novor
+
+There are multiple ways to enable the Galaxy tools to use these binaries. 
+
+- Just copy them to the `bin` path within Galaxy's conda environment
+- Put them in any other path that is included in PATH
+- Edit the corresponding tools: In the command line part, search for the parameters `-executable`, `-maracluster_executable`, or `-mascot_directory` and edit them appropriately.
+
+Working
+=======
+
+The tools work by:
+
+Preprocessing:
+
+- For each input / output data set parameter a directory is created (named
+  after the parameter)
+- For input data set parameters, links to the actual locations of the data
+  sets are created
+
+Main:
+
+- The Galaxy wrapper creates two JSON config files: one containing the
+  parameters and the values chosen by the user, and the other the values of
+  hardcoded parameters.
+- With `OpenMSTool -write_ctd ./` a CTD file (named `OpenMSTool.ctd`) is
+  generated that contains the default values.
+- A call to `fill_ctd.py` fills the values from the JSON config files into
+  the CTD file.
+- The actual tool is called as `OpenMSTool -ini OpenMSTool.ctd`, with all
+  input and output parameters also given on the command line.
+
+Postprocessing:
+
+- Output data sets are moved to their final locations
+
+Note: The reason for handling data sets on the command line (and not specifying
+them in the CTD file) is mainly that all files in Galaxy have the extension
+`.dat` and OpenMS tools require an appropriate extension. But this may change
+in the future.
 
 Generating OpenMS wrappers
 ==========================
 
- * install OpenMS (you can do this automatically through Conda)
- * create a folder called CTD
- * if you installed openms as a binary in a specific directory, execute the following command in the `openms/bin` directory:
-    
-    ```bash
-    for binary in `ls`; do ./$binary -write_ctd /PATH/TO/YOUR/CTD; done;
-    ```
-    
- * if there is no binary release (e.g. as with version 2.2), download and unpack the Conda package, find the `bin` folder and create a list of the tools as follow:
- 
-    ```bash
-    ls >> tools.txt
-    ```
-    
- * search for the `bin` folder of your conda environment containing OpenMS and do:
- 
-    ```bash
-    while read p; do
-        ./PATH/TO/BIN/$p -write_ctd /PATH/TO/YOUR/CTD;
-    done <tools.txt
-    ```
-    
- * You should have all CTD files now. `MetaProSIP.ctd` includes a not supported character: To use it, search for `²` and replace it (e.g. with `^2`).
+1. remove old test data: `rm -rf $(ls -d test-data/* | egrep -v "random|\.loc")`
+2. `./generate.sh`
 
- * clone or install CTDopts
-
-    ```bash
-    git clone https://github.com/genericworkflownodes/CTDopts
-    ```
-
- * add CTDopts to your `$PYTHONPATH`
-
-    ```bash
-    export PYTHONPATH=/home/user/CTDopts/
-    ```
-
- * clone or install CTD2Galaxy
+What's happening:
 
-    ```bash
-    git clone https://github.com/WorkflowConversion/CTDConverter.git
-    ```
-    
- * If you have CTDopts and CTDConverter installed you are ready to generate Galaxy Tools from CTD definitions. Change the following command according to your needs, especially the `/PATH/TO` parts. The default files are provided in this repository. You might have to install `libxslt` and `lxml` to run it. Further information can be found on the CTDConverter page.
+1. The binaries of the OpenMS package can generate a CTD file that describes
+   the parameters. These CTD files are converted to Galaxy XML tool descriptions
+   using the `CTDConverter`.
 
-    ```bash
-    python convert.py galaxy \ 
-    -i /PATH/TO/YOUR/CTD/*.ctd \
-    -o ./PATH/TO/YOUR/WRAPPERS/ -t tool.conf \
-    -d datatypes_conf.xml -g openms \
-    -b version log debug test no_progress threads \
-     in_type executable myrimatch_executable \
-     fido_executable fidocp_executable \
-     omssa_executable pepnovo_e xecutable \
-     xtandem_executable param_model_directory \
-     java_executable java_memory java_permgen \
-     r_executable rt_concat_trafo_out param_id_pool \
-    -f /PATH/TO/filetypes.txt -m /PATH/TO/macros.xml \
-    -s PATH/TO/tools_blacklist.txt
-    ```
-
-
- * As last step you need to change manually the binary names of all external binaries you want to use in OpenMS. Some of these tools might already be deprecated and the files might not exist:
-
-    ```
-    sed -i '13 a\-fido_executable Fido' wrappers/FidoAdapter.xml
-    sed -i '13 a\-fidocp_executable FidoChooseParameters' wrappers/FidoAdapter.xml
-    sed -i '13 a\-myrimatch_executable myrimatch' wrappers/MyriMatchAdapter.xml
-    sed -i '13 a\-omssa_executable omssa' wrappers/OMSSAAdapter.xml
-    sed -i '13 a\-xtandem_executable xtandem' wrappers/XTandemAdapter.xml
-    ```
-    
- * For some tools, additional work has to be done. In `MSGFPlusAdapter.xml` the following is needed in the command section at the beginning (check your file to know what to copy where):
- 
-   ```
-    <command><![CDATA[
-
-    ## check input file type
-    #set $in_type = $param_in.ext
+2. The CI testing framework of OpenMS contains command lines and test data 
+   (https://github.com/OpenMS/OpenMS/tree/develop/src/tests/topp). These tests
+   are described in two CMake files.
 
-    ## create the symlinks to set the proper file extension, since msgf uses them to choose how to handle the input files
-    ln -s '$param_in' 'param_in.${in_type}' &&
-    ln -s '$param_database' param_database.fasta &&
-    ## find location of the MSGFPlus.jar file of the msgf_plus conda package
-    MSGF_JAR=\$(msgf_plus -get_jar_path) &&
+   - From these CMake files Galaxy tests are automatically generated and stored in `macros_autotest.xml`
+   - The command lines are stored in `prepare_test_data.sh` for regeneration of test data
+
+More details can be found in the comments of the shell script.
+
+Open problems
+=============
 
-    MSGFPlusAdapter
-    -executable \$MSGF_JAR
-    #if $param_in:
-      -in 'param_in.${in_type}'
-    #end if
-    #if $param_out:
-      -out $param_out
-    #end if
-    #if $param_mzid_out:
-      -mzid_out $param_mzid_out
-    #end if
-    #if $param_database:
-      -database param_database.fasta
-    #end if
-    
-    [...]
-    ]]>
-    ```
- 
- * In Xtandem Converter and probably in others:
- 
-    ```
-    #if str($param_missed_cleavages) != '':
-    ```
-    This is because integers needs to be compared as string otherwise `0` becomes `false`.
- 
- * In `MetaProSIP.xml` add `R` as a requirement:
- 
-   ```
-   <expand macro="requirements">
-       <requirement type="package" version="3.3.1">r-base</requirement>
-   </expand>
-   ```
-   
- * In `IDFileConverter.xml` the following is needed in the command section at the beginning (check your file to know what to copy where):
- 
-   ```
-    <command><![CDATA[
-   
-      ## check input file type
-      #set $in_type = $param_in.ext
+Some tools stall in CI testing using `--biocontainers`, which is why the OpenMS
+tools are currently listed in `.tt_biocontainer_skip`. The affected tools are:
 
-      ## create the symlinks to set the proper file extension, since IDFileConverter uses them to choose how to handle the input files
-      ln -s '$param_in' 'param_in.${in_type}' &&
-
-      IDFileConverter
-
-      #if $param_in:
-        -in 'param_in.${in_type}'
-      #end if
-
-        [...]
-        ]]>
-    ```
+- AssayGeneratorMetabo and SiriusAdapter (both depend on sirius)
+- OMSSAAdapter
 
- * In `IDFileConverter.xml` and `FileConverter.xml` add `auto_format="true"` to the output, e.g.:
- 
-   - `<data name="param_out" auto_format="true"/>`
-   - `<data name="param_out" metadata_source="param_in" auto_format="true"/>`
-        
- * To add an example test case to `DecoyDatabase.xml` add the following after the output section. If standard settings change you might have to adjust the options and/or the test files.
- 
-    ```
-       <tests>
-        <test>
-            <param name="param_in" value="DecoyDatabase_input.fasta"/>
-            <output name="param_out" file="DecoyDatabase_output.fasta"/>
-        </test>
-    </tests>
-    ```
-    
- * Additionally cause of lacking dependencies, the following adapters have been removed in `SKIP_TOOLS_FILES.txt` as well:
-    * OMSSAAdapter
-    * MyrimatchAdapter
-    
- * Additionally cause of a problematic parameter (-model_directory), the following adapter has been removed:
-    * PepNovoAdapter
-
+Using `docker -t` seems to solve the problem (see
+https://github.com/galaxyproject/galaxy/issues/10153).
 
 Licence (MIT)
 =============