Stanford Megatron-LM on ROCm documentation#
2026-02-26
With Stanford Megatron-LM on ROCm, you can train massive transformer LLMs with data, tensor, and pipeline parallelism on AMD Instinct GPUs, enabling scale-out to hundreds of billions of parameters for enterprise multilingual pretraining and domain adaptation.
Stanford Megatron-LM is a large-scale language model training framework derived from NVIDIA's Megatron-LM (NVIDIA/Megatron-LM). It is designed to train massive transformer-based language models efficiently through model and data parallelism.
Stanford Megatron-LM on ROCm provides efficient tensor, pipeline, and sequence-based model parallelism for pre-training transformer-based language models, including GPT (decoder-only), BERT (encoder-only), T5 (encoder-decoder), and ICT. It also offers distributed pre-training, activation checkpointing and recomputation, a distributed optimizer, and mixture-of-experts support.
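To illustrate the tensor-parallelism idea named above, the sketch below shards a linear layer's weight matrix column-wise across two simulated ranks, has each rank compute its slice of the output, and gathers the slices. This is a minimal, framework-free illustration of the pattern, not Megatron-LM's actual API; all function names here are hypothetical.

```python
# Illustrative sketch (not the Megatron-LM API): column-parallel linear layer,
# the tensor-parallelism pattern applied inside transformer blocks.

def matmul(x, w):
    """Multiply row vector x (list) by weight matrix w (list of rows)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, parts):
    """Shard a weight matrix column-wise across `parts` tensor-parallel ranks."""
    n = len(w[0]) // parts
    return [[row[r * n:(r + 1) * n] for row in w] for r in range(parts)]

# Toy 2x4 weight matrix and a length-2 input activation.
w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x = [1.0, 2.0]

shards = split_columns(w, parts=2)         # one shard per simulated GPU
partials = [matmul(x, s) for s in shards]  # each rank computes its output slice
gathered = partials[0] + partials[1]       # all-gather along the column dimension

assert gathered == matmul(x, w)            # matches the unsharded computation
```

In real training, each shard lives on a different GPU and the final concatenation is a collective all-gather, so no single device ever holds the full weight matrix.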
Stanford Megatron-LM is part of the ROCm-LLMExt toolkit.
The Stanford Megatron-LM public repository is located at ROCm/Stanford-Megatron-LM.
To contribute to the documentation, refer to Contributing to ROCm.
You can find licensing information on the Licensing page.