Megablocks on ROCm#
2025-07-31
4 min read time
Megablocks is a lightweight library for mixture-of-experts (MoE) training. The core of the system is its efficient “dropless-MoE” (dMoE) and standard MoE layers. Megablocks is integrated with stanford-futuredata/Megatron-LM, which supports data and pipeline parallel training of MoEs.
For hardware, software, and third-party framework compatibility between ROCm and Megablocks, see the following resources:
Note
Megablocks is supported on ROCm 6.3.0.
Install Megablocks#
To install Megablocks on ROCm, you have the following options:
Using a prebuilt Docker image with Megablocks pre-installed#
Docker is the recommended method to set up a Megablocks environment, and it avoids potential installation issues. The tested, prebuilt image includes Megablocks, PyTorch, ROCm, and other dependencies.
Pull the Docker image
docker pull rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
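Optionally, confirm that the image was pulled before launching it:
docker images rocm/megablocks   # the tag megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0 should be listed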
Launch and connect to the container
docker run -it --rm \
  --privileged -v ./:/app \
  --network=host --device=/dev/kfd \
  --device=/dev/dri --group-add video \
  --name=my_megablocks --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host --shm-size 16G \
  rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
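Once inside the container, you can optionally run a quick sanity check to confirm the GPUs and the preinstalled stack are visible. This is a minimal sketch; depending on the image, the interpreter may be python or python3:
rocm-smi   # list the AMD GPUs visible to the container
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"   # should print the PyTorch version and True if the GPUs are accessible
python -c "import megablocks; print('megablocks imported successfully')"   # confirms the Megablocks package is importable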
Build your own Docker image#
Megablocks supports the ROCm platform, and you can run it by building your own Docker container from scratch. A Dockerfile is provided in the ROCm/megablocks repository to help you get started.
Clone the ROCm/megablocks repository
git clone https://github.com/ROCm/megablocks.git
Enter the directory and build the Docker image
cd megablocks
docker build -t rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0 .
Run the Docker container
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
Set up your datasets#
You can use the gpt2_125m_8gpu.sh script to run the Megablocks training process. If you are working with other model sizes, the process is similar, but you will need to use a different script.
Once you are inside the Megablocks directory, you can download the BooksCorpus or Oscar dataset using the helper scripts stored in the dataset directory:
Oscar dataset
gpt2-vocab.json
gpt2-merges.txt
cd third_party/Stanford-Megatron-LM/tools
wget https://huggingface.co/bigscience/misc-test-data/resolve/main/stas/oscar-1GB.jsonl.xz
xz -d oscar-1GB.jsonl.xz
wget -O gpt2-vocab.json https://huggingface.co/openai-community/gpt2/resolve/main/vocab.json
wget -O gpt2-merges.txt https://huggingface.co/mkhalifa/gpt2-biographies/resolve/main/merges.txt
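Optionally, verify the downloads before preprocessing; this is a simple sanity check using standard shell tools:
ls -lh oscar-1GB.jsonl gpt2-vocab.json gpt2-merges.txt   # all three files should be present and non-empty
head -c 300 oscar-1GB.jsonl && echo   # peek at the start of the first JSON record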
Pre-process the datasets
export BASE_SRC_PATH=$(dirname $(pwd))
export BASE_DATA_PATH=${BASE_SRC_PATH}/tools
python preprocess_data.py \
  --input ${BASE_DATA_PATH}/oscar-1GB.jsonl \
  --output-prefix ${BASE_DATA_PATH}/my-gpt2 \
  --vocab-file ${BASE_DATA_PATH}/gpt2-vocab.json \
  --dataset-impl mmap \
  --tokenizer-type GPT2BPETokenizer \
  --merge-file ${BASE_DATA_PATH}/gpt2-merges.txt \
  --append-eod \
  --workers 8 \
  --chunk-size 10
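When preprocessing finishes, it writes Megatron-style binary and index files next to the output prefix; these are the files the training script points to as DATA_PATH. A quick optional check:
ls -lh ${BASE_DATA_PATH}/my-gpt2_text_document.*   # expect my-gpt2_text_document.bin and my-gpt2_text_document.idx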
Set up the pre-training script
Edit the exp/gpt2/gpt2_125m_8gpu.sh script inside the project root directory:
DATA_PATH=${BASE_DATA_PATH}/my-gpt2_text_document
CHECKPOINT_PATH=${BASE_DATA_PATH}/checkpoints
VOCAB_FILE=${BASE_DATA_PATH}/gpt2-vocab.json
MERGE_FILE=${BASE_DATA_PATH}/gpt2-merges.txt
Run the pre-training script
From the project root directory, execute the following command:
./exp/gpt2/gpt2_125m_8gpu.sh | tee train.log
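Because the output is piped through tee, you can follow progress from a second shell. The grep pattern below assumes the usual Megatron-LM iteration log format and may need adjusting for your version:
tail -f train.log   # follow training output live
grep -i "lm loss" train.log | tail -n 5   # show the most recently reported loss values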
Test the Megablocks installation#
If you used a prebuilt Docker image from AMD ROCm Docker Hub, running the Megablocks unit tests to validate your installation is optional.
To run unit tests manually and validate your installation fully, follow these steps:
Note
Run the following from the Stanford Megatron-LM root directory. Running the unit tests requires 8 GPUs because they test distributed functionality.
pytest -m gpu /root/megablocks/tests/ 2>&1 | tee "megablocks_test.log"
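If you want to preview which tests the gpu marker selects before occupying all 8 GPUs, you can have pytest collect them without running; this uses only standard pytest options:
pytest -m gpu --collect-only -q /root/megablocks/tests/   # list the selected tests without executing them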
Run a Megablocks example#
Recommended example: pretraining_gpt.sh.
For detailed steps, refer to the Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs blog.