Megablocks on ROCm#
2025-07-31
4 min read time
Megablocks is a lightweight library for mixture-of-experts (MoE) training. The core of the system is its efficient “dropless-MoE” (dMoE) and standard MoE layers. Megablocks is integrated with stanford-futuredata/Megatron-LM, which supports data and pipeline parallel training of MoEs.
For hardware, software, and third-party framework compatibility between ROCm and Megablocks, see the following resources:
Note
Megablocks is supported on ROCm 6.3.0.
Install Megablocks#
To install Megablocks on ROCm, you have the following options:
Using a prebuilt Docker image with Megablocks pre-installed#
Docker is the recommended method to set up a Megablocks environment, and it avoids potential installation issues. The tested, prebuilt image includes Megablocks, PyTorch, ROCm, and other dependencies.
Pull the Docker image
docker pull rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
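Optionally, confirm that the image was pulled before launching it:
docker images rocm/megablocks   # the tag megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0 should be listed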
Launch and connect to the container
docker run -it --rm \
  --privileged -v ./:/app \
  --network=host --device=/dev/kfd \
  --device=/dev/dri --group-add video \
  --name=my_megablocks --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host --shm-size 16G \
  rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
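Once inside the container, you can optionally run a quick sanity check to confirm the GPUs and the preinstalled stack are visible. This is a minimal sketch; depending on the image, the interpreter may be python or python3:
rocm-smi   # list the AMD GPUs visible to the container
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"   # should print the PyTorch version and True if the GPUs are accessible
python -c "import megablocks; print('megablocks imported successfully')"   # confirms the Megablocks package is importable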
Build your own Docker image#
Megablocks supports the ROCm platform, and you can run it by building your own Docker container from scratch. A Dockerfile is provided in the ROCm/megablocks repository to help you get started.
Clone the ROCm/megablocks repository
git clone https://github.com/ROCm/megablocks.git
Enter the directory and build the Docker image
cd megablocks
docker build -t rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0 .
Run the Docker container
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
Set up your datasets#
You can use the gpt2_125m_8gpu.sh script to run the Megablocks training process. If you are working with other model sizes, the process is similar, but you will need to use a different script.
Once you are inside the Megablocks directory, you can download the BooksCorpus or Oscar dataset using the helper scripts stored in the dataset directory:
Oscar dataset
gpt2-vocab.json
gpt2-merges.txt
cd third_party/Stanford-Megatron-LM/tools
wget https://huggingface.co/bigscience/misc-test-data/resolve/main/stas/oscar-1GB.jsonl.xz
xz -d oscar-1GB.jsonl.xz
wget -O gpt2-vocab.json https://huggingface.co/openai-community/gpt2/resolve/main/vocab.json
wget -O gpt2-merges.txt https://huggingface.co/mkhalifa/gpt2-biographies/resolve/main/merges.txt
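Optionally, verify the downloads before preprocessing; this is a simple sanity check using standard shell tools:
ls -lh oscar-1GB.jsonl gpt2-vocab.json gpt2-merges.txt   # all three files should be present and non-empty
head -c 300 oscar-1GB.jsonl && echo   # peek at the start of the first JSON record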
Pre-process the datasets
export BASE_SRC_PATH=$(dirname $(pwd))
export BASE_DATA_PATH=${BASE_SRC_PATH}/tools
python preprocess_data.py \
  --input ${BASE_DATA_PATH}/oscar-1GB.jsonl \
  --output-prefix ${BASE_DATA_PATH}/my-gpt2 \
  --vocab-file ${BASE_DATA_PATH}/gpt2-vocab.json \
  --dataset-impl mmap \
  --tokenizer-type GPT2BPETokenizer \
  --merge-file ${BASE_DATA_PATH}/gpt2-merges.txt \
  --append-eod \
  --workers 8 \
  --chunk-size 10
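When preprocessing finishes, it writes Megatron-style binary and index files next to the output prefix; these are the files the training script points to as DATA_PATH. A quick optional check:
ls -lh ${BASE_DATA_PATH}/my-gpt2_text_document.*   # expect my-gpt2_text_document.bin and my-gpt2_text_document.idx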
Set up the pre-training script
Edit the exp/gpt2/gpt2_125m_8gpu.sh script inside the project root directory:
DATA_PATH=${BASE_DATA_PATH}/my-gpt2_text_document
CHECKPOINT_PATH=${BASE_DATA_PATH}/checkpoints
VOCAB_FILE=${BASE_DATA_PATH}/gpt2-vocab.json
MERGE_FILE=${BASE_DATA_PATH}/gpt2-merges.txt
Run the pre-training script
From the project root directory, execute the following command:
./exp/gpt2/gpt2_125m_8gpu.sh | tee train.log
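Because the output is piped through tee, you can follow progress from a second shell. The grep pattern below assumes the usual Megatron-LM iteration log format and may need adjusting for your version:
tail -f train.log   # follow training output live
grep -i "lm loss" train.log | tail -n 5   # show the most recently reported loss values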
Test the Megablocks installation#
If you used a prebuilt Docker image from AMD ROCm Docker Hub, running the Megablocks unit tests to validate your installation is optional.
To run unit tests manually and validate your installation fully, follow these steps:
Note
Run the following from the Stanford Megatron-LM root directory. Running the unit tests requires 8 GPUs because they test distributed functionality.
pytest -m gpu /root/megablocks/tests/ 2>&1 | tee "megablocks_test.log"
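If you want to preview which tests the gpu marker selects before occupying all 8 GPUs, you can have pytest collect them without running; this uses only standard pytest options:
pytest -m gpu --collect-only -q /root/megablocks/tests/   # list the selected tests without executing them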
Run a Megablocks example#
Recommended example: pretraining_gpt.sh.
For detailed steps, refer to the Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs blog.