Running AlphaFold 3 on Engaging
AlphaFold 3 is an AI model developed by Google DeepMind for predicting protein structures. This page briefly describes how to run it on the Engaging computing cluster.
Note
These instructions assume that you have access to a partition with a GPU on Engaging. If you do not have such access, then you may be able to run this on a CPU, but this would require editing the code distribution provided by Google DeepMind.
Getting Started
For simplicity, this example stores everything except the AlphaFold dataset in a folder in our home directory on Engaging (~/af3, the WORKDIR value in the scripts below). We will use this folder as our working directory:
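A minimal sketch of setting up the working directory. The `~/af3` location is taken from the `WORKDIR` value in the run scripts on this page; any other folder works as long as the scripts are updated to match.

```shell
# Working directory for this example; ~/af3 matches WORKDIR in the run scripts.
WORKDIR="$HOME/af3"
mkdir -p "$WORKDIR"
cd "$WORKDIR"
pwd
```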
To run AlphaFold 3, we need to obtain a few files that we will store in our working directory:
Model weights
These can be obtained by submitting a request to Google DeepMind. Usually, requests are granted within a few days. To make a request, follow the instructions on the AlphaFold 3 GitHub Repository.
When you get access, you will receive a link to download the parameters. After you download them, you can upload them to Engaging using scp on your local machine (you will receive a Duo push notification - see Transferring Files):
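A sketch of the upload, run on your local machine rather than on Engaging. Every value here is a placeholder: the archive name `af3.bin.zst` and its download location are assumptions, and `USERNAME@LOGIN_NODE` must be replaced with your own username and the login node hostname given in Transferring Files. The destination `af3/` is relative to your home directory on Engaging and must already exist.

```shell
# Run on your LOCAL machine. All values below are placeholders.
LOCAL_WEIGHTS="$HOME/Downloads/af3.bin.zst"  # assumed file name and location
REMOTE="USERNAME@LOGIN_NODE"                 # placeholder: your username and login node

if [ -f "$LOCAL_WEIGHTS" ]; then
    # Copies the archive into ~/af3 on Engaging; expect a Duo push notification.
    scp "$LOCAL_WEIGHTS" "$REMOTE:af3/"
else
    echo "Weights archive not found at $LOCAL_WEIGHTS" >&2
fi
```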
On Engaging, decompress the file and move it to a models directory:
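A sketch of this step, assuming the uploaded archive is named `af3.bin.zst`, sits in `~/af3`, and that the `zstd` command is available on the node. If your download has a different name or compression format, adjust accordingly.

```shell
WORKDIR="$HOME/af3"
ARCHIVE="$WORKDIR/af3.bin.zst"   # assumed archive name
MODELS_DIR="$WORKDIR/models"     # bound to /root/models in the run scripts

mkdir -p "$MODELS_DIR"
if [ -f "$ARCHIVE" ]; then
    # Decompress directly into the models directory.
    zstd -d "$ARCHIVE" -o "$MODELS_DIR/af3.bin"
else
    echo "Archive not found: $ARCHIVE" >&2
fi
```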
AlphaFold 3 code
Clone the GitHub repository:
Cloning over SSH requires an SSH key registered with GitHub on Engaging. If you have not set one up, you can clone using the web URL instead:
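Both options can be sketched in one step: try the SSH URL first and fall back to the web URL. This assumes the working directory is `~/af3`, so the code lands in `~/af3/alphafold3`, which is the path the run scripts bind into the container.

```shell
REPO_SSH="git@github.com:google-deepmind/alphafold3.git"
REPO_HTTPS="https://github.com/google-deepmind/alphafold3.git"

mkdir -p "$HOME/af3" && cd "$HOME/af3"

# Try SSH first (needs a GitHub SSH key); otherwise use the web URL.
git clone "$REPO_SSH" \
    || git clone "$REPO_HTTPS" \
    || echo "Clone failed; check network access and GitHub credentials" >&2
```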
Container image
Google DeepMind provides instructions in their repository on running AlphaFold 3 with Docker. Docker is not compatible with most HPC environments, so we need to run a pre-built container using Apptainer instead. Fortunately, a pre-built image is already available on Docker Hub. We have converted it to an Apptainer image and saved it globally on Engaging; its location is the IMAGE_PATH value in the run scripts below.
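As a quick check, the image path used in the run scripts on this page can be listed from a node that mounts the shared software area. This is only a sanity check and is not required for the run.

```shell
# Path copied from IMAGE_PATH in the run scripts on this page.
IMAGE_PATH=/orcd/software/community/001/container_images/alphafold3/20250311/alphafold3.sif

ls -lh "$IMAGE_PATH" || echo "Image not visible from this machine" >&2
```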
Running AlphaFold 3
The last thing you will need to run AlphaFold 3 is the AlphaFold dataset. Because it is quite large, we have saved it globally on Engaging for all users at /orcd/datasets/001/alphafold3.
Once you have everything you need, you will be ready to run AlphaFold 3. We will now go through a test case adapted from the AlphaFold 3 GitHub Repository. From the working directory, create an output directory and a test input file:
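A sketch of this step, assuming the working directory is `~/af3`. The directory names `af_input` and `af_output` and the file name `fold_input.json` are the ones the run scripts on this page expect.

```shell
WORKDIR="$HOME/af3"

# Input and output directories bound into the container by the run scripts.
mkdir -p "$WORKDIR/af_input" "$WORKDIR/af_output"

# Empty input file, to be filled in with the JSON test case.
touch "$WORKDIR/af_input/fold_input.json"
```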
Copy the following into af_input/fold_input.json (using vim, emacs, or nano):
{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
You can run this either in an interactive session or as a batch job. The examples below use the mit_normal_gpu partition; if you have access to a different GPU partition, replace the partition name as necessary:
Request an interactive session with a GPU:
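A sketch of the request, assuming the same resources as the batch script on this page (1 node, 16 tasks, 1 GPU on `mit_normal_gpu`). The guard only makes the snippet harmless on machines without Slurm; on Engaging the `salloc` line alone is sufficient.

```shell
PARTITION=mit_normal_gpu   # replace with your own GPU partition if needed

if command -v salloc >/dev/null 2>&1; then
    # Resources mirror the #SBATCH lines in the batch script.
    salloc -N 1 -n 16 -p "$PARTITION" --gres=gpu:1
else
    echo "salloc not found: run this from an Engaging login node" >&2
fi
```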
Run this script (sh run_alphafold.sh):
#!/bin/bash
module load apptainer
# Enter the path to the AF3 dataset and container image:
DATABASES_DIR=/orcd/datasets/001/alphafold3
IMAGE_PATH=/orcd/software/community/001/container_images/alphafold3/20250311/alphafold3.sif
# Enter the directory of the AF3 material:
WORKDIR=~/af3
apptainer exec \
    --bind "$WORKDIR/af_input:/root/af_input" \
    --bind "$WORKDIR/af_output:/root/af_output" \
    --bind "$WORKDIR/models:/root/models" \
    --bind "$WORKDIR/alphafold3:/root/alphafold3" \
    --bind "$DATABASES_DIR:/root/public_databases" \
    --nv \
    "$IMAGE_PATH" \
    python /root/alphafold3/run_alphafold.py \
        --json_path=/root/af_input/fold_input.json \
        --model_dir=/root/models \
        --output_dir=/root/af_output \
        --db_dir=/root/public_databases
Create your batch job script:
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 16
#SBATCH -p mit_normal_gpu
#SBATCH --gres=gpu:1
module load apptainer
# Enter the path to the AF3 dataset and container image:
DATABASES_DIR=/orcd/datasets/001/alphafold3
IMAGE_PATH=/orcd/software/community/001/container_images/alphafold3/20250311/alphafold3.sif
# Enter the directory of the AF3 material:
WORKDIR=~/af3
apptainer exec \
    --bind "$WORKDIR/af_input:/root/af_input" \
    --bind "$WORKDIR/af_output:/root/af_output" \
    --bind "$WORKDIR/models:/root/models" \
    --bind "$WORKDIR/alphafold3:/root/alphafold3" \
    --bind "$DATABASES_DIR:/root/public_databases" \
    --nv \
    "$IMAGE_PATH" \
    python /root/alphafold3/run_alphafold.py \
        --json_path=/root/af_input/fold_input.json \
        --model_dir=/root/models \
        --output_dir=/root/af_output \
        --db_dir=/root/public_databases
Submit the batch job:
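A sketch of the submission, assuming the batch script was saved as `af3_job.sh` in the current directory (a hypothetical file name; use whatever name you chose). The `squeue` call afterwards shows the state of your queued and running jobs.

```shell
JOB_SCRIPT="af3_job.sh"   # hypothetical file name for the batch script

if command -v sbatch >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    sbatch "$JOB_SCRIPT"
    # List your jobs to confirm the submission was accepted.
    squeue -u "$USER"
else
    echo "sbatch or $JOB_SCRIPT not found; run this on Engaging from the script's directory" >&2
fi
```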
Output is saved to the af_output directory.
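Once the run finishes, the results can be inspected by listing the output directory. This sketch assumes the working directory is `~/af3` and makes no assumption about the exact file names AlphaFold 3 writes.

```shell
OUTPUT_DIR="$HOME/af3/af_output"

if [ -d "$OUTPUT_DIR" ]; then
    # Show everything written up to two levels below the output directory.
    find "$OUTPUT_DIR" -maxdepth 2 | sort
else
    echo "No output directory at $OUTPUT_DIR yet" >&2
fi
```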