Megablocks on ROCm installation#
2025-10-21
Megablocks is a lightweight library for mixture-of-experts (MoE) training.
This topic covers setup instructions and the necessary files to build, test, and run Megablocks with ROCm support in a Docker environment. To learn more about Megablocks on ROCm, including its use cases, recommendations, and hardware and software compatibility, see Megablocks compatibility.
Note
Megablocks is supported on ROCm 6.3.0.
Install Megablocks#
To install Megablocks on ROCm, you have the following options:
Use a prebuilt Docker image with Megablocks pre-installed#
Docker is the recommended method to set up a Megablocks environment, as it avoids potential installation issues. The tested, prebuilt image includes Megablocks, PyTorch, ROCm, and other dependencies.
Pull the Docker image
docker pull rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
Launch and connect to the container
docker run -it --rm \
  --privileged -v ./:/app \
  --network=host --device=/dev/kfd \
  --device=/dev/dri --group-add video \
  --name=my_megablocks --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host --shm-size 16G \
  rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
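Optionally, verify the environment inside the container. This is a quick sanity check rather than part of the official setup; it assumes python points to the container's Python 3.12 interpreter and that rocm-smi is available in the image.
# Confirm the GPUs are visible and the key packages import
rocm-smi
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import megablocks; print('Megablocks import OK')"
rocm-smi should list your AMD GPUs, and torch.cuda.is_available() should return True on the ROCm build of PyTorch.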
Build your own Docker image#
Megablocks supports the ROCm platform and can also be run in a Docker container that you build from scratch. A Dockerfile is provided in the ROCm/megablocks repository to help you get started.
Clone the ROCm/megablocks repository
git clone https://github.com/ROCm/megablocks.git
Enter the directory and build the Docker image
cd megablocks
docker build -t rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0 .
Run the Docker container
docker run -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
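As with the prebuilt image, you can optionally confirm inside the container that the build succeeded and the GPUs are visible. The device count shown depends on your system; python is assumed to be the container's interpreter.
# Optional check: the package imports and PyTorch sees the GPUs
python -c "import megablocks, torch; print('visible GPUs:', torch.cuda.device_count())"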
Set up your datasets#
You can use the gpt2_125m_8gpu.sh script to run the Megablocks training process. If you are working with other model sizes, the process is similar, but you will need to use a different script.
Once you are inside the Megablocks directory, you can download the BooksCorpus or Oscar dataset using the helper scripts stored in the dataset directory. The following example downloads the Oscar dataset along with the GPT-2 vocabulary and merges files:
Oscar dataset
gpt2-vocab.json
gpt2-merges.txt
cd third_party/Stanford-Megatron-LM/tools
wget https://huggingface.co/bigscience/misc-test-data/resolve/main/stas/oscar-1GB.jsonl.xz
xz -d oscar-1GB.jsonl.xz
wget -O gpt2-vocab.json https://huggingface.co/openai-community/gpt2/resolve/main/vocab.json
wget -O gpt2-merges.txt https://huggingface.co/mkhalifa/gpt2-biographies/resolve/main/merges.txt
Pre-process the datasets
export BASE_SRC_PATH=$(dirname $(pwd))
export BASE_DATA_PATH=${BASE_SRC_PATH}/tools
python preprocess_data.py \
  --input ${BASE_DATA_PATH}/oscar-1GB.jsonl \
  --output-prefix ${BASE_DATA_PATH}/my-gpt2 \
  --vocab-file ${BASE_DATA_PATH}/gpt2-vocab.json \
  --dataset-impl mmap \
  --tokenizer-type GPT2BPETokenizer \
  --merge-file ${BASE_DATA_PATH}/gpt2-merges.txt \
  --append-eod \
  --workers 8 \
  --chunk-size 10
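Preprocessing writes a binary data file and an index file based on the --output-prefix. As an optional check, confirm that the files referenced by the pre-training script in the next step were created; this assumes the default text/document naming produced by the Megatron preprocessing tool, which matches the DATA_PATH used below.
ls -lh ${BASE_DATA_PATH}/my-gpt2_text_document.bin ${BASE_DATA_PATH}/my-gpt2_text_document.idx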
Set up the pre-training script
Edit the exp/gpt2/gpt2_125m_8gpu.sh script inside the project root directory:
DATA_PATH=${BASE_DATA_PATH}/my-gpt2_text_document
CHECKPOINT_PATH=${BASE_DATA_PATH}/checkpoints
VOCAB_FILE=${BASE_DATA_PATH}/gpt2-vocab.json
MERGE_FILE=${BASE_DATA_PATH}/gpt2-merges.txt
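If you keep the ${BASE_DATA_PATH} references in the script instead of hard-coding absolute paths, export the variable in the shell you will launch the script from. The path below assumes the dataset was prepared under third_party/Stanford-Megatron-LM/tools as in the previous step; adjust it if your data lives elsewhere.
# Run from the project root so $(pwd) resolves to the Megablocks directory
export BASE_DATA_PATH=$(pwd)/third_party/Stanford-Megatron-LM/tools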
Run the pre-training script
From the project root directory, execute the following command:
./exp/gpt2/gpt2_125m_8gpu.sh | tee train.log
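The tee command streams training output to the console while also saving it to train.log. To follow a running job or review the periodic loss values afterwards, a pattern like the following works; the exact wording of the loss lines depends on the Megatron-LM version, so treat the grep pattern as an assumption to adapt.
# Follow the log live, or pull out recent loss lines after the run
tail -f train.log
grep -i "lm loss" train.log | tail -n 5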
Test the Megablocks installation#
If you installed Megablocks from a prebuilt Docker image on AMD ROCm Docker Hub, running the unit tests to validate your installation is optional.
To run unit tests manually and validate your installation fully, follow these steps:
Note
Run the following from the Stanford Megatron-LM root directory. Running the unit tests requires 8 GPUs because they test distributed functionality.
pytest -m gpu /root/megablocks/tests/ 2>&1 | tee "megablocks_test.log"
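If the tests report missing devices or fewer devices than expected, confirm that all 8 GPUs are visible inside the container. This is an optional check and assumes python points to the container's interpreter.
python -c "import torch; print('visible GPUs:', torch.cuda.device_count())"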
Run a Megablocks example#
Recommended example: pretraining_gpt.sh.
For detailed steps, refer to the Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs blog.