Mercurial > repos > galaxy-australia > alphafold2
diff README.rst @ 1:6c92e000d684 draft
"planemo upload for repository https://github.com/usegalaxy-au/galaxy-local-tools commit a510e97ebd604a5e30b1f16e5031f62074f23e86"
:author:   galaxy-australia
:date:     Tue, 01 Mar 2022 02:53:05 +0000
:parents:  (none)
:children: abba603c6ef3
--- /dev/null
+++ b/README.rst

Alphafold compute setup
=======================

Overview
--------

Alphafold requires a customised compute environment to run. The machine
needs a GPU and access to a 2.2 TB reference data store.

This document describes the compute environment required to run
Alphafold, and the Galaxy job destination settings needed to run the
wrapper.

For full details on Alphafold requirements, see
https://github.com/deepmind/alphafold.

HARDWARE
~~~~~~~~

The machine is recommended to have the following specs:

- 12 cores
- 80 GB RAM
- 2.5 TB storage
- a fast Nvidia GPU

As a minimum, the Nvidia GPU must have 8 GB of RAM. It also requires
**unified memory** to be switched on. Unified memory is usually enabled
by default, but some HPC systems disable it so that the GPU can be
shared between multiple jobs concurrently.

ENVIRONMENT
~~~~~~~~~~~

This wrapper runs Alphafold as a singularity container. The following
software is needed:

- `Singularity <https://sylabs.io/guides/3.0/user-guide/installation.html>`_
- `NVIDIA Container Toolkit <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html>`_

As Alphafold uses an Nvidia GPU, the NVIDIA Container Toolkit is needed
to make the GPU available inside the running singularity container.

To check that everything has been set up correctly, run the following:

::

    singularity run --nv docker://nvidia/cuda:11.0-base nvidia-smi

If you see output similar to the following (details depend on your
GPU), everything has been set up correctly.
::

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
    | N/A   49C    P0    28W /  70W |      0MiB / 15109MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

REFERENCE DATA
~~~~~~~~~~~~~~

Alphafold needs reference data to run. The wrapper expects this data to
be present at ``/data/alphafold_databases``. To download the data, run
the following commands in the tool directory:

::

    # make the folder if needed
    mkdir -p /data/alphafold_databases

    # download the reference data
    bash scripts/download_all_data.sh /data/alphafold_databases

This will install the reference data to ``/data/alphafold_databases``.
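The download is large and can take many hours, so it is worth verifying the result. The sketch below is a quick scripted check, not part of the wrapper: the ``DB_ROOT`` variable and the directory names are assumptions based on the layout this document expects.

```shell
# Hypothetical post-download check: verify that each expected database
# directory exists under the reference data root.
DB_ROOT="${DB_ROOT:-/data/alphafold_databases}"
for d in bfd mgnify params pdb70 pdb_mmcif uniclust30 uniref90; do
    if [ -d "$DB_ROOT/$d" ]; then
        echo "OK: $d"
    else
        echo "MISSING: $d"
    fi
done
```

Any ``MISSING`` line means the download script should be re-run for that database.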
To check this has worked, ensure the final folder structure is as
follows:

::

    data/alphafold_databases
    ├── bfd
    │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata
    │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex
    │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata
    │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex
    │   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata
    │   └── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex
    ├── mgnify
    │   └── mgy_clusters_2018_12.fa
    ├── params
    │   ├── LICENSE
    │   ├── params_model_1.npz
    │   ├── params_model_1_ptm.npz
    │   ├── params_model_2.npz
    │   ├── params_model_2_ptm.npz
    │   ├── params_model_3.npz
    │   ├── params_model_3_ptm.npz
    │   ├── params_model_4.npz
    │   ├── params_model_4_ptm.npz
    │   ├── params_model_5.npz
    │   └── params_model_5_ptm.npz
    ├── pdb70
    │   ├── md5sum
    │   ├── pdb70_a3m.ffdata
    │   ├── pdb70_a3m.ffindex
    │   ├── pdb70_clu.tsv
    │   ├── pdb70_cs219.ffdata
    │   ├── pdb70_cs219.ffindex
    │   ├── pdb70_hhm.ffdata
    │   ├── pdb70_hhm.ffindex
    │   └── pdb_filter.dat
    ├── pdb_mmcif
    │   ├── mmcif_files
    │   └── obsolete.dat
    ├── uniclust30
    │   └── uniclust30_2018_08
    └── uniref90
        └── uniref90.fasta

JOB DESTINATION
~~~~~~~~~~~~~~~

Alphafold needs a custom singularity job destination to run. The
destination needs to be configured for singularity, and some extra
singularity params need to be set, as shown below.

Specify the job runner. For example, a local runner:

::

    <plugin id="alphafold_runner" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>

Customise the job destination with the required singularity settings.
The settings below are mandatory, but you may include other settings as
needed.
::

    <destination id="alphafold" runner="alphafold_runner">
        <param id="dependency_resolution">'none'</param>
        <param id="singularity_enabled">true</param>
        <param id="singularity_run_extra_arguments">--nv</param>
        <param id="singularity_volumes">"$job_directory:ro,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw,/data/alphafold_databases:/data:ro"</param>
    </destination>

Closing
~~~~~~~

If you are experiencing technical issues, feel free to write to
help@genome.edu.au. We may be able to provide advice on setting up
Alphafold on your compute environment.
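For reference, the plugin and destination snippets above can be
assembled into a single ``job_conf.xml``. The sketch below is an
illustration only: the surrounding ``<plugins>``, ``<destinations>``
and ``<tools>`` layout follows the standard Galaxy job configuration
format, and the tool id is a placeholder assumption, not taken from the
wrapper.

::

    <?xml version="1.0"?>
    <job_conf>
        <plugins>
            <plugin id="alphafold_runner" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
        </plugins>
        <destinations>
            <destination id="alphafold" runner="alphafold_runner">
                <param id="dependency_resolution">'none'</param>
                <param id="singularity_enabled">true</param>
                <param id="singularity_run_extra_arguments">--nv</param>
                <param id="singularity_volumes">"$job_directory:ro,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw,/data/alphafold_databases:/data:ro"</param>
            </destination>
        </destinations>
        <tools>
            <!-- placeholder tool id: match it to the installed wrapper's id -->
            <tool id="alphafold" destination="alphafold"/>
        </tools>
    </job_conf>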