Commit e3231de

DeepMind authored and copybara-github committed
Update of README.md
PiperOrigin-RevId: 501279176
Change-Id: I9cf92212322b29691844973ded9e337e81b3a9fd
1 parent 8f1ebd5 commit e3231de

1 file changed

Lines changed: 129 additions & 73 deletions

README.md

Original file line number, diff line number, diff line change
@@ -3,7 +3,7 @@
33
# AlphaFold
44

55
This package provides an implementation of the inference pipeline of AlphaFold
6-
v2.0. For simplicity, we refer to this model as AlphaFold throughout the rest of
6+
v2. For simplicity, we refer to this model as AlphaFold throughout the rest of
77
this document.
88

99
We also provide:
@@ -36,21 +36,51 @@ If you have any questions, please contact the AlphaFold team at
3636

3737
![CASP14 predictions](imgs/casp14_predictions.gif)
3838

39-
## First time setup
39+
## Installation and running your first prediction
4040

4141
You will need a machine running Linux, AlphaFold does not support other
42-
operating systems.
42+
operating systems. Full installation requires up to 3 TB of disk space to keep
43+
genetic databases (SSD storage is recommended) and a modern NVIDIA GPU (GPUs
44+
with more memory can predict larger protein structures).
4345

44-
The following steps are required in order to run AlphaFold:
46+
Please follow these steps:
4547

4648
1. Install [Docker](https://www.docker.com/).
4749
* Install
4850
[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
4951
for GPU support.
5052
* Setup running
5153
[Docker as a non-root user](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user).
52-
1. Download genetic databases (see below).
53-
1. Download model parameters (see below).
54+
55+
1. Clone this repository and `cd` into it.
56+
57+
```bash
58+
git clone https://github.com/deepmind/alphafold.git
59+
cd ./alphafold
60+
```
61+
62+
1. Download genetic databases and model parameters:
63+
64+
* Install `aria2c` (on most Linux distributions it is available via the
65+
package manager).
66+
67+
* Please use the script `scripts/download_all_data.sh` to download
68+
and set up full databases. This may take substantial time (download size is
69+
556 GB), so we recommend running this script in the background:
70+
71+
```bash
72+
scripts/download_all_data.sh <DOWNLOAD_DIR> > download.log 2> download_all.log &
73+
```
74+
75+
* **Note: The download directory `<DOWNLOAD_DIR>` should *not* be a
76+
subdirectory in the AlphaFold repository directory.** If it is, the Docker
77+
build will be slow as the large databases will be copied into the docker
78+
build context.
79+
80+
* It is possible to run AlphaFold with reduced databases; please refer to
81+
the [complete documentation](#genetic-databases).
82+
83+
5484
1. Check that AlphaFold will be able to use a GPU by running:
5585

5686
```bash
@@ -63,10 +93,58 @@ The following steps are required in order to run AlphaFold:
6393
or take a look at the following
6494
[NVIDIA Docker issue](https://github.com/NVIDIA/nvidia-docker/issues/1447#issuecomment-801479573).
6595
66-
If you wish to run AlphaFold using Singularity (a common containerization
67-
platform on HPC systems) we recommend using some of the third party Singularity
68-
setups as linked in https://github.com/deepmind/alphafold/issues/10 or
69-
https://github.com/deepmind/alphafold/issues/24.
96+
If you wish to run AlphaFold using Singularity (a common containerization
97+
platform on HPC systems) we recommend using some of the third party Singularity
98+
setups as linked in https://github.com/deepmind/alphafold/issues/10 or
99+
https://github.com/deepmind/alphafold/issues/24.
100+
101+
1. Build the Docker image:
102+
103+
```bash
104+
docker build -f docker/Dockerfile -t alphafold .
105+
```
106+
107+
If you encounter the following error:
108+
109+
```
110+
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
111+
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
112+
```
113+
114+
use the workaround described in
115+
https://github.com/deepmind/alphafold/issues/463#issuecomment-1124881779.
116+
117+
1. Install the `run_docker.py` dependencies. Note: You may optionally wish to
118+
create a
119+
[Python Virtual Environment](https://docs.python.org/3/tutorial/venv.html)
120+
to prevent conflicts with your system's Python environment.
121+
122+
```bash
123+
pip3 install -r docker/requirements.txt
124+
```
125+
126+
1. Make sure that the output directory exists (the default is `/tmp/alphafold`)
127+
and that you have sufficient permissions to write into it.
128+
129+
1. Run `run_docker.py` pointing to a FASTA file containing the protein
130+
sequence(s) for which you wish to predict the structure (`--fasta_paths`
131+
parameter). AlphaFold will search for the available templates before the
132+
date specified by the `--max_template_date` parameter; this could be used to
133+
avoid certain templates during modeling. `--data_dir` is the directory with
134+
downloaded genetic databases and `--output_dir` is the absolute path to the
135+
output directory.
136+
137+
```bash
138+
python3 docker/run_docker.py \
139+
--fasta_paths=your_protein.fasta \
140+
--max_template_date=2022-01-01 \
141+
--data_dir=$DOWNLOAD_DIR \
142+
--output_dir=/home/user/absolute_path_to_the_output_dir
143+
```
144+
145+
1. Once the run is over, the output directory shall contain predicted
146+
structures of the target protein. Please check the documentation below for
147+
additional options and troubleshooting tips.
70148
71149
### Genetic databases
72150
@@ -86,22 +164,24 @@ AlphaFold needs multiple genetic (sequence) databases to run:
86164
We provide a script `scripts/download_all_data.sh` that can be used to download
87165
and set up all of these databases:
88166
89-
* Default:
167+
* Recommended default:
90168
91169
```bash
92170
scripts/download_all_data.sh <DOWNLOAD_DIR>
93171
```
94172
95173
will download the full databases.
96174
97-
* With `reduced_dbs`:
175+
* With `reduced_dbs` parameter:
98176
99177
```bash
100178
scripts/download_all_data.sh <DOWNLOAD_DIR> reduced_dbs
101179
```
102180
103181
will download a reduced version of the databases to be used with the
104-
`reduced_dbs` database preset.
182+
`reduced_dbs` database preset. This shall be used with the corresponding
183+
AlphaFold parameter `--db_preset=reduced_dbs` later during the AlphaFold run
184+
(please see [AlphaFold parameters](#running-alphafold) section).
105185
106186
:ledger: **Note: The download directory `<DOWNLOAD_DIR>` should *not* be a
107187
subdirectory in the AlphaFold repository directory.** If it is, the Docker build
@@ -111,7 +191,7 @@ We don't provide exactly the database versions used in CASP14 – see the
111191
[note on reproducibility](#note-on-casp14-reproducibility). Some of the
112192
databases are mirrored for speed, see [mirrored databases](#mirrored-databases).
113193
114-
:ledger: **Note: The total download size for the full databases is around 415 GB
194+
:ledger: **Note: The total download size for the full databases is around 556 GB
115195
and the total size when unzipped is 2.62 TB. Please make sure you have a large
116196
enough hard drive space, bandwidth and time to download. We recommend using an
117197
SSD for better genetic search performance.**
@@ -230,58 +310,11 @@ To use the deprecated v2.1.0 AlphaFold-Multimer model weights:
230310
**The simplest way to run AlphaFold is using the provided Docker script.** This
231311
was tested on Google Cloud with a machine using the `nvidia-gpu-cloud-image`
232312
with 12 vCPUs, 85 GB of RAM, a 100 GB boot disk, the databases on an additional
233-
3 TB disk, and an A100 GPU.
234-
235-
1. Clone this repository and `cd` into it.
313+
3 TB disk, and an A100 GPU. For your first run, please follow the instructions
314+
from [Installation and running your first prediction](#installation-and-running-your-first-prediction)
315+
section.
236316
237-
```bash
238-
git clone https://github.com/deepmind/alphafold.git
239-
```
240-
241-
1. Build the Docker image:
242-
243-
```bash
244-
docker build -f docker/Dockerfile -t alphafold .
245-
```
246-
247-
If you encounter the following error:
248-
249-
```
250-
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
251-
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
252-
```
253-
254-
use the workaround described in
255-
https://github.com/deepmind/alphafold/issues/463#issuecomment-1124881779.
256-
257-
1. Install the `run_docker.py` dependencies. Note: You may optionally wish to
258-
create a
259-
[Python Virtual Environment](https://docs.python.org/3/tutorial/venv.html)
260-
to prevent conflicts with your system's Python environment.
261-
262-
```bash
263-
pip3 install -r docker/requirements.txt
264-
```
265-
266-
1. Make sure that the output directory exists (the default is `/tmp/alphafold`)
267-
and that you have sufficient permissions to write into it.
268-
269-
1. Run `run_docker.py` pointing to a FASTA file containing the protein
270-
sequence(s) for which you wish to predict the structure. If you are
271-
predicting the structure of a protein that is already in PDB and you wish to
272-
avoid using it as a template, then `max_template_date` must be set to be
273-
before the release date of the structure. You must also provide the path to
274-
the directory containing the downloaded databases. For example, for the
275-
T1050 CASP14 target:
276-
277-
```bash
278-
python3 docker/run_docker.py \
279-
--fasta_paths=T1050.fasta \
280-
--max_template_date=2020-05-14 \
281-
--data_dir=$DOWNLOAD_DIR
282-
```
283-
284-
By default, Alphafold will attempt to use all visible GPU devices. To use a
317+
1. By default, Alphafold will attempt to use all visible GPU devices. To use a
285318
subset, specify a comma-separated list of GPU UUID(s) or index(es) using the
286319
`--gpu_devices` flag. See
287320
[GPU enumeration](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#gpu-enumeration)
@@ -325,9 +358,24 @@ with 12 vCPUs, 85 GB of RAM, a 100 GB boot disk, the databases on an additional
325358
--max_template_date=2020-05-14 \
326359
--model_preset=monomer \
327360
--db_preset=reduced_dbs \
328-
--data_dir=$DOWNLOAD_DIR
361+
--data_dir=$DOWNLOAD_DIR \
362+
--output_dir=/home/user/absolute_path_to_the_output_dir
329363
```
330364
365+
1. After generating the predicted model, by default AlphaFold runs a relaxation
366+
step to improve geometrical quality. You can control this via `--run_relax=true`
367+
(default) or `--run_relax=false`.
368+
369+
1. The relaxation step can be run on GPU (faster, but could be less stable) or
370+
CPU (slow, but stable). This can be controlled with `--enable_gpu_relax=true`
371+
(default) or `--enable_gpu_relax=false`.
372+
373+
1. AlphaFold can re-use MSAs (multiple sequence alignments) for the same
374+
sequence via `--use_precomputed_msas=true` option; this can be useful for
375+
trying different AlphaFold parameters. This option assumes that the
376+
directory structure generated by the first AlphaFold run in the output
377+
directory exists and that the protein sequence is the same.
378+
331379
### Running AlphaFold-Multimer
332380
333381
All steps are the same as when running the monomer system, but you will have to
@@ -342,7 +390,8 @@ python3 docker/run_docker.py \
342390
--fasta_paths=multimer.fasta \
343391
--max_template_date=2020-05-14 \
344392
--model_preset=multimer \
345-
--data_dir=$DOWNLOAD_DIR
393+
--data_dir=$DOWNLOAD_DIR \
394+
--output_dir=/home/user/absolute_path_to_the_output_dir
346395
```
347396
348397
By default the multimer system will run 5 seeds per model (25 total predictions)
@@ -402,7 +451,8 @@ python3 docker/run_docker.py \
402451
--fasta_paths=monomer.fasta \
403452
--max_template_date=2021-11-01 \
404453
--model_preset=monomer \
405-
--data_dir=$DOWNLOAD_DIR
454+
--data_dir=$DOWNLOAD_DIR \
455+
--output_dir=/home/user/absolute_path_to_the_output_dir
406456
```
407457
408458
#### Folding a homomer
@@ -426,7 +476,8 @@ python3 docker/run_docker.py \
426476
--fasta_paths=homomer.fasta \
427477
--max_template_date=2021-11-01 \
428478
--model_preset=multimer \
429-
--data_dir=$DOWNLOAD_DIR
479+
--data_dir=$DOWNLOAD_DIR \
480+
--output_dir=/home/user/absolute_path_to_the_output_dir
430481
```
431482
432483
#### Folding a heteromer
@@ -454,7 +505,8 @@ python3 docker/run_docker.py \
454505
--fasta_paths=heteromer.fasta \
455506
--max_template_date=2021-11-01 \
456507
--model_preset=multimer \
457-
--data_dir=$DOWNLOAD_DIR
508+
--data_dir=$DOWNLOAD_DIR \
509+
--output_dir=/home/user/absolute_path_to_the_output_dir
458510
```
459511
460512
#### Folding multiple monomers one after another
@@ -468,7 +520,8 @@ python3 docker/run_docker.py \
468520
--fasta_paths=monomer1.fasta,monomer2.fasta \
469521
--max_template_date=2021-11-01 \
470522
--model_preset=monomer \
471-
--data_dir=$DOWNLOAD_DIR
523+
--data_dir=$DOWNLOAD_DIR \
524+
--output_dir=/home/user/absolute_path_to_the_output_dir
472525
```
473526
474527
#### Folding multiple multimers one after another
@@ -482,7 +535,8 @@ python3 docker/run_docker.py \
482535
--fasta_paths=multimer1.fasta,multimer2.fasta \
483536
--max_template_date=2021-11-01 \
484537
--model_preset=multimer \
485-
--data_dir=$DOWNLOAD_DIR
538+
--data_dir=$DOWNLOAD_DIR \
539+
--output_dir=/home/user/absolute_path_to_the_output_dir
486540
```
487541
488542
### AlphaFold output
@@ -633,6 +687,7 @@ If you use the code or data in this package, please cite:
633687
634688
In addition, if you use the AlphaFold-Multimer mode, please cite:
635689
690+
636691
```bibtex
637692
@article {AlphaFold-Multimer2021,
638693
author = {Evans, Richard and O{\textquoteright}Neill, Michael and Pritzel, Alexander and Antropova, Natasha and Senior, Andrew and Green, Tim and {\v{Z}}{\'\i}dek, Augustin and Bates, Russ and Blackwell, Sam and Yim, Jason and Ronneberger, Olaf and Bodenstein, Sebastian and Zielinski, Michal and Bridgland, Alex and Potapenko, Anna and Cowie, Andrew and Tunyasuvunakool, Kathryn and Jain, Rishub and Clancy, Ellen and Kohli, Pushmeet and Jumper, John and Hassabis, Demis},
@@ -754,4 +809,5 @@ reference to the following:
754809
(unmodified), by Mitchell AL et al., available free of all copyright
755810
restrictions and made fully and freely available for both non-commercial and
756811
commercial use under
757-
[CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).
812+
[CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).
813+

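Two of the requirements this commit adds to the README can be checked before a run: `<DOWNLOAD_DIR>` should not be a subdirectory of the repository directory, and the output directory must exist and be writable. The following is a minimal sketch of such a pre-flight check, not something the repository provides. The function names `is_inside` and `ensure_writable_dir` are hypothetical, and `realpath -m` assumes GNU coreutils, which is typical on the Linux machines AlphaFold requires.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the AlphaFold repository.

# Succeeds if path $1 is the same as, or located inside, directory $2.
# Assumes GNU coreutils `realpath`; -m resolves paths that may not exist yet.
is_inside() {
  local child parent
  child="$(realpath -m -- "$1")"
  parent="$(realpath -m -- "$2")"
  # Drop a trailing slash so that "/" is handled like any other directory.
  child="${child%/}"
  parent="${parent%/}"
  case "${child}/" in
    "${parent}/"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Creates directory $1 if needed and succeeds only if it is writable.
ensure_writable_dir() {
  mkdir -p -- "$1" && [ -d "$1" ] && [ -w "$1" ]
}
```

From the repository root, one might run `is_inside "$DOWNLOAD_DIR" "$PWD" && echo "choose a download directory outside the repository"` before `docker build`, and `ensure_writable_dir /home/user/absolute_path_to_the_output_dir` before `run_docker.py`. Comparing resolved paths with a trailing slash avoids treating a sibling such as `alphafold-data` as if it were inside `alphafold`.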