Stanford Megatron-LM on ROCm documentation

2026-02-26

1 min read time

Applies to Linux

With Stanford Megatron-LM on ROCm, you can train massive transformer LLMs with data, tensor, and pipeline parallelism on AMD Instinct GPUs, enabling scale-out to hundreds of billions of parameters for enterprise multilingual pretraining and domain adaptation.

Stanford Megatron-LM is a large-scale language model training framework developed by NVIDIA and hosted at NVIDIA/Megatron-LM. It is designed to train massive transformer-based language models efficiently using model and data parallelism.

Stanford Megatron-LM on ROCm supports pre-training transformer-based language models such as GPT (decoder-only), BERT (encoder-only), T5 (encoder-decoder), and ICT, providing efficient tensor, pipeline, and sequence-based model parallelism. It also offers distributed pre-training, activation checkpointing and recomputation, a distributed optimizer, and mixture-of-experts support.
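To build intuition for the tensor (model) parallelism mentioned above, the following is a minimal NumPy sketch of a Megatron-style column-parallel linear layer. It is purely illustrative: the matrix sizes and the simulated 2-way split are assumptions for the example, not values from the framework, and real training shards the weights across GPU ranks with collective communication rather than `np.split`.

```python
import numpy as np

# Toy illustration of Megatron-style tensor (model) parallelism:
# a linear layer Y = X @ A is split column-wise across "devices",
# so each device holds only a shard of the weight matrix A.
# (Shapes and the simulated 2-way split are illustrative assumptions.)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))   # batch of activations, replicated on all ranks
A = rng.standard_normal((8, 6))   # full weight matrix (never materialized on one rank in practice)

world_size = 2                            # pretend we have 2 tensor-parallel ranks
shards = np.split(A, world_size, axis=1)  # column-parallel split of the weights

# Each rank computes its partial output independently...
partials = [X @ shard for shard in shards]

# ...and gathering the partial outputs along the column axis
# recovers the result of the unsharded layer.
Y_parallel = np.concatenate(partials, axis=1)

assert np.allclose(Y_parallel, X @ A)
```

Because each rank multiplies by only a slice of the weights, the layer's parameters and compute are divided across devices, which is what lets models grow past a single GPU's memory.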

Stanford Megatron-LM is part of the ROCm-LLMExt toolkit.

The Stanford Megatron-LM public repository is located at ROCm/Stanford-Megatron-LM.

To contribute to the documentation, refer to Contributing to ROCm.

You can find licensing information on the Licensing page.