comparison README.rst @ 14:d00e15139065 draft
planemo upload for repository https://github.com/usegalaxy-au/tools-au commit d490defa32d9c318137d2d781243b392cb14110d-dirty
author | galaxy-australia |
---|---|
date | Tue, 28 Feb 2023 01:15:42 +0000 |
parents | 7fbec959cf2b |
children | f9eb041c518c |
13:c0e71cb2bd1b | 14:d00e15139065 |
---|---|
73 | 73 |
74 REFERENCE DATA | 74 REFERENCE DATA |
75 ~~~~~~~~~~~~~~ | 75 ~~~~~~~~~~~~~~ |
76 | 76 |
77 Alphafold needs reference data to run. The wrapper expects this data to | 77 Alphafold needs reference data to run. The wrapper expects this data to |
78 be present at ``/data/alphafold_databases``. A custom DB root can be read from | 78 be present at ``/data/alphafold_databases``. A custom path will be read from |
79 the ALPHAFOLD_DB environment variable, if set. To download the AlphaFold, | 79 the ``ALPHAFOLD_DB`` environment variable, if set. |
80 reference data, run the following shell script command in the tool directory. | 80 |
81 | 81 To download the AlphaFold reference DBs: |
82 :: | 82 |
83 | 83 :: |
84 # Set databases root | 84 |
85 ALPHAFOLD_DB_ROOT=/data/alphafold_databases | 85 # Set your AlphaFold DB path |
86 | 86 ALPHAFOLD_DB=/data/alphafold_databases |
87 # make folders if needed | 87 |
88 mkdir -p $ALPHAFOLD_DB_ROOT | 88 # Set your target AlphaFold version |
89 | 89 ALPHAFOLD_VERSION= # e.g. 2.1.2 |
90 # download ref data | 90 |
91 bash scripts/download_all_data.sh $ALPHAFOLD_DB_ROOT | 91 # Download repo |
92 | 92 wget https://github.com/deepmind/alphafold/releases/tag/v${ALPHAFOLD_VERSION}.tar.gz |
93 This will install the reference data to ``/data/alphafold_databases``. | 93 tar xzf v${ALPHAFOLD_VERSION}.tar.gz |
94 | |
95 # Ensure dirs | |
96 mkdir -p $ALPHAFOLD_DB | |
97 | |
98 # Download | |
99 bash alphafold*/scripts/download_all_data.sh $ALPHAFOLD_DB | |
100 | |
101 You will most likely want to run this as a background job, as it will take a | |
102 very long time (7+ days in Australia). | |
103 | |
104 This will install the reference data to your ``$ALPHAFOLD_DB``. | |
94 To check this has worked, ensure the final folder structure is as | 105 To check this has worked, ensure the final folder structure is as |
95 follows: | 106 follows: |
96 | 107 |
97 :: | 108 :: |
109 | |
110 # NOTE: this structure will change between minor AlphaFold versions | |
111 # The tree shown below was updated for v2.3.1 | |
98 | 112 |
99 data/alphafold_databases | 113 data/alphafold_databases |
100 ├── bfd | 114 ├── bfd |
101 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata | 115 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata |
102 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex | 116 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex |
103 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata | 117 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata |
104 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex | 118 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex |
105 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata | 119 │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata |
106 │ └── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex | 120 │ └── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex |
107 ├── mgnify | 121 ├── mgnify |
108 │ └── mgy_clusters_2018_12.fa | 122 │ └── mgy_clusters_2022_05.fa |
109 ├── params | 123 ├── params |
110 │ ├── LICENSE | 124 │ ├── LICENSE |
111 │ ├── params_model_1.npz | 125 │ ├── params_model_1.npz |
126 │ ├── params_model_1_multimer_v3.npz | |
112 │ ├── params_model_1_ptm.npz | 127 │ ├── params_model_1_ptm.npz |
113 │ ├── params_model_2.npz | 128 │ ├── params_model_2.npz |
129 │ ├── params_model_2_multimer_v3.npz | |
114 │ ├── params_model_2_ptm.npz | 130 │ ├── params_model_2_ptm.npz |
115 │ ├── params_model_3.npz | 131 │ ├── params_model_3.npz |
132 │ ├── params_model_3_multimer_v3.npz | |
116 │ ├── params_model_3_ptm.npz | 133 │ ├── params_model_3_ptm.npz |
117 │ ├── params_model_4.npz | 134 │ ├── params_model_4.npz |
135 │ ├── params_model_4_multimer_v3.npz | |
118 │ ├── params_model_4_ptm.npz | 136 │ ├── params_model_4_ptm.npz |
119 │ ├── params_model_5.npz | 137 │ ├── params_model_5.npz |
138 │ ├── params_model_5_multimer_v3.npz | |
120 │ └── params_model_5_ptm.npz | 139 │ └── params_model_5_ptm.npz |
121 ├── pdb70 | 140 ├── pdb70 |
122 │ ├── md5sum | 141 │ ├── md5sum |
123 │ ├── pdb70_a3m.ffdata | 142 │ ├── pdb70_a3m.ffdata |
124 │ ├── pdb70_a3m.ffindex | 143 │ ├── pdb70_a3m.ffindex |
129 │ ├── pdb70_hhm.ffindex | 148 │ ├── pdb70_hhm.ffindex |
130 │ └── pdb_filter.dat | 149 │ └── pdb_filter.dat |
131 ├── pdb_mmcif | 150 ├── pdb_mmcif |
132 │ ├── mmcif_files | 151 │ ├── mmcif_files |
133 │ └── obsolete.dat | 152 │ └── obsolete.dat |
134 ├── uniclust30 | 153 ├── pdb_seqres |
135 │ └── uniclust30_2018_08 | 154 │ └── pdb_seqres.txt |
155 ├── uniprot | |
156 │ └── uniprot.fasta | |
157 ├── uniref30 | |
158 │ ├── UniRef30_2021_03.md5sums | |
159 │ ├── UniRef30_2021_03_a3m.ffdata | |
160 │ ├── UniRef30_2021_03_a3m.ffindex | |
161 │ ├── UniRef30_2021_03_cs219.ffdata | |
162 │ ├── UniRef30_2021_03_cs219.ffindex | |
163 │ ├── UniRef30_2021_03_hhm.ffdata | |
164 │ └── UniRef30_2021_03_hhm.ffindex | |
136 └── uniref90 | 165 └── uniref90 |
137 └── uniref90.fasta | 166 └── uniref90.fasta |
138 | 167 |
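The folder-structure check described above can be sketched as a small shell function. This is a minimal sketch, not part of the wrapper: the function name ``check_alphafold_db`` is hypothetical, and the directory list is copied from the v2.3.1 tree shown above, so it would need adjusting for other AlphaFold versions. It only checks that the top-level directories exist, not their contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the wrapper): report which of the expected
# top-level database directories are missing under a given DB root.
# The list below mirrors the v2.3.1 tree; it is an assumption for other versions.
check_alphafold_db() {
    local db_root="$1"
    local status=0
    local d
    for d in bfd mgnify params pdb70 pdb_mmcif pdb_seqres uniprot uniref30 uniref90; do
        if [ ! -d "${db_root}/${d}" ]; then
            echo "missing: ${d}"
            status=1
        fi
    done
    # Exit status 0 means every expected directory was found
    return "$status"
}
```

It could be called as ``check_alphafold_db "${ALPHAFOLD_DB:-/data/alphafold_databases}"``, which falls back to the default path when ``ALPHAFOLD_DB`` is unset.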
139 In more recent releases of the AlphaFold tool, you will need to download an | 168 In more recent releases of the AlphaFold tool, you will need to download an |
140 additional file to allow the ``reduced_dbs`` option: | 169 additional file to allow the ``reduced_dbs`` option: |
141 | 170 |
142 :: | 171 :: |
149 | 178 |
150 data/alphafold_databases | 179 data/alphafold_databases |
151 ├── small_bfd | 180 ├── small_bfd |
152 │ └── bfd-first_non_consensus_sequences.fasta | 181 │ └── bfd-first_non_consensus_sequences.fasta |
153 | 182 |
183 | |
184 **Upgrading database versions** | |
185 | |
186 When upgrading to a new minor version of AlphaFold, you will most likely have to | |
187 upgrade the reference database. This can be a pain, due to the size of the | |
188 databases and the obscurity around what has changed. The simplest way to do | |
189 this is to create a new directory and download the DBs from scratch. | |
190 However, you can save a considerable amount of time by downloading only the | |
191 components that have changed. | |
192 | |
193 If you wish to continue hosting prior versions of the tool, you must maintain | |
194 the reference DBs for each version. The ``ALPHAFOLD_DB`` environment variable | |
195 must then be set respectively for each tool version in your job conf (on Galaxy | |
196 AU this is currently `configured with TPV <https://github.com/usegalaxy-au/infrastructure/blob/master/files/galaxy/dynamic_job_rules/production/total_perspective_vortex/tools.yml#L1515-L1554>`_). | |
197 | |
198 To minimize redundancy between DB versions, we have symlinked the database | |
199 components that are unchanged between versions. In ``v2.1.2 -> v2.3.1`` the BFD | |
200 database is the only component that is persistent, but it is by far the | |
201 largest on disk. | |
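The symlinking approach mentioned above might look something like the following. This is a rough sketch under stated assumptions, not the procedure used on Galaxy AU: the function name ``link_shared_component`` and the per-version directory layout in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reuse an unchanged database component from an older
# DB root by symlinking it into a new DB root, instead of downloading it again.
link_shared_component() {
    local old_root="$1"
    local new_root="$2"
    local component="$3"
    local src="${old_root}/${component}"
    local dest="${new_root}/${component}"

    if [ ! -d "$src" ]; then
        echo "source not found: ${src}" >&2
        return 1
    fi
    # Refuse to touch a real (non-symlink) directory or file at the destination
    if [ -e "$dest" ] && [ ! -L "$dest" ]; then
        echo "destination exists and is not a symlink: ${dest}" >&2
        return 1
    fi
    mkdir -p "$new_root"
    # -s: symbolic link, -f: replace an existing link, -n: do not follow an existing link
    ln -sfn "$src" "$dest"
}
```

For example, assuming one DB root per version (paths are illustrative only): ``link_shared_component /data/alphafold_databases/2.1.2 /data/alphafold_databases/2.3.1 bfd``. The remaining components would still be downloaded into the new root.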
154 | 202 |
155 | 203 |
156 JOB DESTINATION | 204 JOB DESTINATION |
157 ~~~~~~~~~~~~~~~ | 205 ~~~~~~~~~~~~~~~ |
158 | 206 |