Megablocks compatibility#
2025-10-21
Megablocks is a lightweight library for mixture-of-experts (MoE) training. At its core are efficient “dropless-MoE” and standard MoE layers. Megablocks is integrated with stanford-futuredata/Megatron-LM, which supports data-parallel and pipeline-parallel training of MoE models.
Support overview#
The ROCm-supported version of Megablocks is maintained in the official ROCm/megablocks repository, which differs from the stanford-futuredata/Megatron-LM upstream repository.
To get started and install Megablocks on ROCm, use the prebuilt Docker image, which includes ROCm, Megablocks, and all required dependencies.
See the ROCm Megablocks installation guide for installation and setup instructions.
You can also consult the upstream Installation guide for additional context.
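Once inside the container, you can quickly sanity-check the environment from Python. The snippet below is a minimal sketch: it assumes the prebuilt image ships a ROCm build of PyTorch and that the library is installed under the package name megablocks.

```python
# Sanity check inside the Megablocks ROCm container (assumed setup).
from importlib.metadata import version

import torch
import megablocks  # noqa: F401 -- fails immediately if the library is missing

print("PyTorch:", torch.__version__)
print("HIP/ROCm:", torch.version.hip)        # version string on ROCm builds, None on CUDA builds
print("GPU visible:", torch.cuda.is_available())
print("Megablocks:", version("megablocks"))  # assumes the distribution is named "megablocks"
```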
Version support#
Megablocks is supported on ROCm 6.3.0.
Supported devices#
Officially Supported: AMD Instinct™ MI300X
Partially Supported (functionality or performance limitations): AMD Instinct™ MI250X, MI210
Supported models and features#
This section summarizes the Megablocks features supported by ROCm.
Distributed Pre-training
Activation Checkpointing and Recomputation
Distributed Optimizer
Mixture-of-Experts
dropless-Mixture-of-Experts
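To make the MoE features above concrete, the following is a minimal top-k routing sketch in plain PyTorch. It only illustrates what an MoE layer computes; it does not use the Megablocks API, and Megablocks replaces the per-expert gather/scatter loop below with block-sparse kernels so that no tokens are dropped.

```python
# Conceptual top-k mixture-of-experts layer (illustrative only, not the Megablocks API).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    def __init__(self, hidden_size: int, ffn_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, ffn_size), nn.GELU(), nn.Linear(ffn_size, hidden_size))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [tokens, hidden_size]
        scores = F.softmax(self.router(x), dim=-1)           # routing probabilities per expert
        weights, experts = scores.topk(self.top_k, dim=-1)   # top-k experts for each token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = experts[:, k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out


tokens = torch.randn(16, 64)                                 # 16 tokens with hidden size 64
print(TinyMoE(hidden_size=64, ffn_size=256, num_experts=4)(tokens).shape)
```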
Use cases and recommendations#
The Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs blog post shows how to use the ROCm platform for pre-training with the Megablocks framework. It introduces a streamlined approach to training Mixture-of-Experts (MoE) models with the Megablocks library on AMD hardware and, focusing on GPT-2, demonstrates how block-sparse computations improve the scalability and efficiency of MoE training. The guide provides step-by-step instructions for setting up the environment, including cloning the repository, building the Docker image, and running the training container. It also explains how to use the oscar-1GB.json dataset for pre-training language models. By leveraging Megablocks and the ROCm platform, you can optimize MoE training workflows for large-scale transformer models.
It shows how to preprocess datasets and begin pre-training on AMD GPUs through:
Single-GPU pre-training
Multi-GPU pre-training
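The blog drives multi-GPU runs through Megatron-LM's launch scripts. As a simplified illustration of the multi-GPU side only, the sketch below shows a generic torch.distributed data-parallel setup launched with torchrun; the script name and the stand-in model are hypothetical and are not the blog's scripts.

```python
# Generic data-parallel training sketch (hypothetical, not the blog's Megatron-LM scripts).
# Launch with, for example: torchrun --nproc_per_node=8 train_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    dist.init_process_group(backend="nccl")      # "nccl" maps to RCCL on ROCm builds of PyTorch
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()   # stand-in for a GPT-2 model with MoE layers
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                          # toy training steps
        x = torch.randn(8, 1024, device=f"cuda:{local_rank}")
        loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```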
Docker image compatibility#
AMD validates and publishes Megablocks images with ROCm backends on Docker Hub. The following Docker image tag and associated inventory represent the latest Megablocks version available on the official Docker Hub.
| Docker image | ROCm | Megablocks | PyTorch | Ubuntu | Python |
|---|---|---|---|---|---|
| rocm/megablocks | 6.3.0 | | | 24.04 | |