Megablocks compatibility#
2025-10-21
Megablocks is a lightweight library for mixture-of-experts (MoE) training. At its core are efficient “dropless-MoE” and standard MoE layers. Megablocks is integrated with stanford-futuredata/Megatron-LM, which supports data-parallel and pipeline-parallel training of MoE models.
Support overview#
The ROCm-supported version of Megablocks is maintained in the official ROCm/megablocks repository, which differs from the stanford-futuredata/Megatron-LM upstream repository.
To get started and install Megablocks on ROCm, use the prebuilt Docker image, which includes ROCm, Megablocks, and all required dependencies.
See the ROCm Megablocks installation guide for installation and setup instructions.
You can also consult the upstream Installation guide for additional context.
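Once inside the container, you can quickly sanity-check the environment from Python. The snippet below is a minimal sketch: it assumes the prebuilt image ships a ROCm build of PyTorch and that the library is installed under the package name megablocks.

```python
# Sanity check inside the Megablocks ROCm container (assumed setup).
from importlib.metadata import version

import torch
import megablocks  # noqa: F401 -- fails immediately if the library is missing

print("PyTorch:", torch.__version__)
print("HIP/ROCm:", torch.version.hip)        # version string on ROCm builds, None on CUDA builds
print("GPU visible:", torch.cuda.is_available())
print("Megablocks:", version("megablocks"))  # assumes the distribution is named "megablocks"
```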
Version support#
Megablocks is supported on ROCm 6.3.0.
Supported devices#
Officially Supported: AMD Instinct™ MI300X
Partially Supported (functionality or performance limitations): AMD Instinct™ MI250X, MI210
Supported models and features#
This section summarizes the Megablocks features supported by ROCm.
Distributed Pre-training
Activation Checkpointing and Recomputation
Distributed Optimizer
Mixture-of-Experts
dropless-Mixture-of-Experts
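To make the MoE features above concrete, the following is a minimal top-k routing sketch in plain PyTorch. It only illustrates what an MoE layer computes; it does not use the Megablocks API, and Megablocks replaces the per-expert gather/scatter loop below with block-sparse kernels so that no tokens are dropped.

```python
# Conceptual top-k mixture-of-experts layer (illustrative only, not the Megablocks API).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    def __init__(self, hidden_size: int, ffn_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, ffn_size), nn.GELU(), nn.Linear(ffn_size, hidden_size))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [tokens, hidden_size]
        scores = F.softmax(self.router(x), dim=-1)           # routing probabilities per expert
        weights, experts = scores.topk(self.top_k, dim=-1)   # top-k experts for each token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = experts[:, k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out


tokens = torch.randn(16, 64)                                 # 16 tokens with hidden size 64
print(TinyMoE(hidden_size=64, ffn_size=256, num_experts=4)(tokens).shape)
```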
Use cases and recommendations#
The Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs blog post shows how to use the ROCm platform for pre-training with the Megablocks framework. It introduces a streamlined approach to training Mixture-of-Experts (MoE) models with the Megablocks library on AMD hardware and, focusing on GPT-2, demonstrates how block-sparse computations improve the scalability and efficiency of MoE training. The guide provides step-by-step instructions for setting up the environment, including cloning the repository, building the Docker image, and running the training container. It also explains how to use the oscar-1GB.json dataset for pre-training language models. By leveraging Megablocks and the ROCm platform, you can optimize MoE training workflows for large-scale transformer models.
It shows how to preprocess datasets and begin pre-training on AMD GPUs through:
Single-GPU pre-training
Multi-GPU pre-training
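The blog drives multi-GPU runs through Megatron-LM's launch scripts. As a simplified illustration of the multi-GPU side only, the sketch below shows a generic torch.distributed data-parallel setup launched with torchrun; the script name and the stand-in model are hypothetical and are not the blog's scripts.

```python
# Generic data-parallel training sketch (hypothetical, not the blog's Megatron-LM scripts).
# Launch with, for example: torchrun --nproc_per_node=8 train_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    dist.init_process_group(backend="nccl")      # "nccl" maps to RCCL on ROCm builds of PyTorch
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()   # stand-in for a GPT-2 model with MoE layers
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                          # toy training steps
        x = torch.randn(8, 1024, device=f"cuda:{local_rank}")
        loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```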
Docker image compatibility#
AMD validates and publishes Megablocks images with ROCm backends on Docker Hub. The following Docker image tag and associated inventory represent the latest Megablocks version available on the official Docker Hub.
| Docker image | ROCm | Megablocks | PyTorch | Ubuntu | Python |
|---|---|---|---|---|---|
| rocm/megablocks | 6.3.0 | | | 24.04 | |