# HG changeset patch
# User iuc
# Date 1670063844 0
# Node ID 27a6a256cd23069dbc690c74a68e6dcba42274ec
# Parent 52b6a4d98009da64b438f459b2b33738e121c8e3
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/data_managers/data_manager_gemini_database_downloader commit 275b7863ff4f8b0dff9cd7ea6c4b635694f0168d

diff -r 52b6a4d98009 -r 27a6a256cd23 data_manager/data_manager_gemini_download.py
--- a/data_manager/data_manager_gemini_download.py Sun Nov 22 12:49:35 2020 +0000
+++ b/data_manager/data_manager_gemini_download.py Sat Dec 03 10:37:24 2022 +0000
@@ -1,4 +1,6 @@
-#!/usr/bin/env python
+#!/usr/bin/env python2
+
+# IMPORTANT: This will run using Python 2 still!
 
 import datetime
 import json
@@ -14,6 +16,11 @@
         yaml.dump(config, fo, allow_unicode=False, default_flow_style=False)
 
 
+def load_gemini_config(config_file):
+    with open(config_file) as fi:
+        return yaml.load(fi)
+
+
 def main():
     today = datetime.date.today()
     with open(sys.argv[1]) as fh:
@@ -21,36 +28,7 @@
     target_directory = params['output_data'][0]['extra_files_path']
     os.mkdir(target_directory)
 
-    # Generate a minimal configuration file for GEMINI update
-    # to instruct the tool to download the annotation data into a
-    # subfolder of the target directory.
-    config_file = os.path.join(target_directory, 'gemini-config.yaml')
-    anno_dir = os.path.join(target_directory, 'gemini/data')
-    gemini_bootstrap_config = {'annotation_dir': anno_dir}
-    write_gemini_config(gemini_bootstrap_config, config_file)
-
-    # Now gemini update can be called to download the data.
-    # The GEMINI_CONFIG environment variable lets the tool discover
-    # the configuration file we prepared for it.
-    # Note that the tool will rewrite the file turning it into a
-    # complete gemini configuration file.
-    gemini_env = os.environ.copy()
-    gemini_env['GEMINI_CONFIG'] = target_directory
-    cmd = "gemini update --dataonly %s %s" % (
-        params['param_dict']['gerp_bp'],
-        params['param_dict']['cadd']
-    )
-    subprocess.check_call(cmd, shell=True, env=gemini_env)
-
-    # GEMINI tool wrappers that need access to the annotation files
-    # are supposed to symlink them into a gemini/data subfolder of
-    # the job working directory. To have GEMINI discover them there,
-    # we need to set this location as the 'annotation_dir' in the
-    # configuration file.
-    with open(config_file) as fi:
-        config = yaml.load(fi)
-    config['annotation_dir'] = 'gemini/data'
-    write_gemini_config(config, config_file)
+    # Prepare the metadata for the new data table record
 
     # The name of the database should reflect whether it was built with or
     # without the optional GERP-bp data, the CADD scores, or both.
@@ -65,7 +43,6 @@
     else:
         anno_desc = ''
 
-    # Finally, we prepare the metadata for the new data table record ...
     data_manager_dict = {
         'data_tables': {
             'gemini_versioned_databases': [
@@ -83,10 +60,49 @@
         }
     }
 
-    # ... and save it to the json results file
+    # Save the data table metadata to the json results file
     with open(sys.argv[1], 'w') as fh:
         json.dump(data_manager_dict, fh, sort_keys=True)
 
+    # Generate a minimal configuration file for GEMINI update
+    # to instruct the tool to download the annotation data into a
+    # subfolder of the target directory.
+    config_file = os.path.join(target_directory, 'gemini-config.yaml')
+    anno_dir = os.path.join(target_directory, 'gemini/data')
+    gemini_bootstrap_config = {'annotation_dir': anno_dir}
+    write_gemini_config(gemini_bootstrap_config, config_file)
+
+    # Verify that we can read the config_file just created as we need to do so
+    # after the data download has finished and it is very annoying to have this
+    # fail after dozens of Gbs of data have been downloaded
+    config = load_gemini_config(config_file)
+
+    # Now gemini update can be called to download the data.
+    # The GEMINI_CONFIG environment variable lets the tool discover
+    # the configuration file we prepared for it.
+    # Note that the tool will rewrite the file turning it into a
+    # complete gemini configuration file.
+    gemini_env = os.environ.copy()
+    gemini_env['GEMINI_CONFIG'] = target_directory
+    cmd = ['gemini', 'update', '--dataonly']
+    if params['param_dict']['gerp_bp']:
+        cmd += ['--extra', 'gerp_bp']
+    if params['param_dict']['cadd']:
+        cmd += ['--extra', 'cadd_score']
+
+    if not params['param_dict']['test_data_manager']:
+        # This is not a test => Going to embark on a massive download now
+        subprocess.check_call(cmd, env=gemini_env)
+
+    # GEMINI tool wrappers that need access to the annotation files
+    # are supposed to symlink them into a gemini/data subfolder of
+    # the job working directory. To have GEMINI discover them there,
+    # we need to set this location as the 'annotation_dir' in the
+    # configuration file.
+    config = load_gemini_config(config_file)
+    config['annotation_dir'] = 'gemini/data'
+    write_gemini_config(config, config_file)
+
 
 if __name__ == "__main__":
     main()
diff -r 52b6a4d98009 -r 27a6a256cd23 data_manager/data_manager_gemini_download.xml
--- a/data_manager/data_manager_gemini_download.xml Sun Nov 22 12:49:35 2020 +0000
+++ b/data_manager/data_manager_gemini_download.xml Sat Dec 03 10:37:24 2022 +0000
@@ -1,4 +1,4 @@
-
+
 the annotation files required by the GEMINI suite of tools
 0.20.1
@@ -11,16 +11,32 @@
 python '$__tool_directory__/data_manager_gemini_download.py' '$out_file'
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 This tool downloads the GEMINI annotation files and makes them available to
diff -r 52b6a4d98009 -r 27a6a256cd23 test-data/gemini_versioned_databases.loc
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/gemini_versioned_databases.loc Sat Dec 03 10:37:24 2022 +0000
@@ -0,0 +1,3 @@
+## GEMINI versioned databases
+#DownloadDate dbkey DBversion Description
+#2018-07-08 hg19 181 GEMINI annotations (2018-07-08 snapshot)
diff -r 52b6a4d98009 -r 27a6a256cd23 test-data/test.json
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/test.json Sat Dec 03 10:37:24 2022 +0000
@@ -0,0 +1,1 @@
+\{"data_tables": \{"gemini_versioned_databases": \[\{"dbkey": "hg19", "name": "GEMINI annotations \(.+ snapshot\)", "path": "./.+", "value": ".+", "version": "200"\}\]\}\}
diff -r 52b6a4d98009 -r 27a6a256cd23 tool_data_table_conf.xml.test
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.test Sat Dec 03 10:37:24 2022 +0000
@@ -0,0 +1,7 @@
+
+
+ value, dbkey, version, name, path
+
+
+
+