JAX compatibility#

2025-01-09

16 min read time

Applies to Linux

JAX provides a NumPy-like API, which combines automatic differentiation and the Accelerated Linear Algebra (XLA) compiler to achieve high-performance machine learning at scale.

JAX uses composable transformations of Python and NumPy through just-in-time (JIT) compilation, automatic vectorization, and parallelization. To learn about JAX, including profiling and optimizations, see the official JAX documentation.
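These transformations compose freely. The following minimal sketch (the loss function and shapes are illustrative only) compiles a gradient computation with jit and vectorizes it over a batch with vmap:

```python
# Minimal sketch of JAX's composable transformations: jit compiles with XLA,
# grad differentiates, and vmap vectorizes over a leading batch axis.
import jax
import jax.numpy as jnp

def loss(w, x):
    return jnp.sum((x @ w) ** 2)           # illustrative quadratic loss

grad_fn = jax.jit(jax.grad(loss))          # compiled gradient with respect to w
batched_grad = jax.vmap(grad_fn, in_axes=(None, 0))  # map over a batch of x

w = jnp.ones((3,))
x = jnp.ones((8, 4, 3))                    # batch of 8 inputs, each 4 x 3
print(batched_grad(w, x).shape)            # (8, 3)
```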

ROCm support for JAX is upstreamed, and users can build the official source code with ROCm support.

Note

AMD releases official ROCm JAX Docker images quarterly alongside new ROCm releases. These images undergo full AMD testing. Community ROCm JAX Docker images follow upstream JAX releases and use the latest available ROCm version.

Docker image compatibility#

AMD validates and publishes ready-made JAX images with ROCm backends on Docker Hub. The following Docker image tags and associated inventories are validated for ROCm 6.3.1.

JAX Docker image components#

| Docker image | JAX | Linux | Python |
|---|---|---|---|
| rocm/jax | 0.4.31 | Ubuntu 24.04 | 3.12.7 |
| rocm/jax | 0.4.31 | Ubuntu 22.04 | 3.10.14 |

AMD publishes community JAX images with ROCm backends on Docker Hub. The following Docker image tags and associated inventories are tested for ROCm 6.2.4.

JAX community Docker image components#

| Docker image | JAX | Linux | Python |
|---|---|---|---|
| rocm/jax-community | 0.4.35 | Ubuntu 22.04 | 3.12.7 |
| rocm/jax-community | 0.4.35 | Ubuntu 22.04 | 3.11.10 |
| rocm/jax-community | 0.4.35 | Ubuntu 22.04 | 3.10.15 |
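Inside any of these containers, a quick check confirms that JAX is using the ROCm-enabled GPU backend. This is a minimal sketch; the exact platform string and device names vary across JAX versions and GPU models:

```python
# Verify that the ROCm-enabled GPU backend is active inside a rocm/jax container.
import jax

print(jax.default_backend())   # typically reports the GPU platform when ROCm is active
print(jax.devices())           # lists the visible AMD GPUs
print(jax.device_count())      # number of devices JAX can use
```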

Critical ROCm libraries for JAX#

The functionality of JAX with ROCm is determined by its underlying library dependencies. These critical ROCm components affect the capabilities, performance, and feature set available to developers.

| ROCm library | Version | Purpose | Used in |
|---|---|---|---|
| hipBLAS | 2.3.0 | Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for matrix and vector operations. | Matrix multiplication in jax.numpy.matmul, jax.lax.dot, and jax.lax.dot_general; vector and matrix computations in jax.numpy.dot; batched matrix multiplication; and jax.numpy.einsum contractions with matrix-multiplication patterns. |
| hipBLASLt | 0.10.0 | An extension of hipBLAS, providing additional features such as epilogues fused into the matrix multiplication kernel and the use of integer tensor cores. | Matrix multiplication in jax.numpy.matmul and jax.lax.dot; XLA (Accelerated Linear Algebra) uses hipBLASLt for optimized matrix operations, mixed-precision support, and hardware-specific optimizations. |
| hipCUB | 3.3.0 | Provides a C++ template library for parallel algorithms such as reduction, scan, sort, and select. | Reduction functions (jax.numpy.sum, jax.numpy.mean, jax.numpy.prod, jax.numpy.max, and jax.numpy.min), prefix sums (jax.numpy.cumsum, jax.numpy.cumprod), and sorting (jax.numpy.sort, jax.numpy.argsort). |
| hipFFT | 1.0.17 | Provides GPU-accelerated Fast Fourier Transform (FFT) operations. | Functions in jax.numpy.fft. |
| hipRAND | 2.11.0 | Provides fast random number generation for GPUs. | Functions such as jax.random.uniform, jax.random.normal, jax.random.randint, and jax.random.split. |
| hipSOLVER | 2.3.0 | Provides GPU-accelerated solvers for linear systems, eigenvalues, and singular value decompositions (SVD). | Solving linear systems (jax.numpy.linalg.solve), matrix factorizations, SVD (jax.numpy.linalg.svd), and eigenvalue problems (jax.numpy.linalg.eig). |
| hipSPARSE | 3.1.2 | Accelerates operations on sparse matrices, such as sparse matrix-vector or matrix-matrix products. | Sparse matrix multiplication (jax.numpy.matmul), sparse matrix-vector and matrix-matrix products (jax.experimental.sparse.dot), sparse linear system solvers, and sparse data handling. |
| hipSPARSELt | 0.2.2 | Accelerates operations on sparse matrices, such as sparse matrix-vector or matrix-matrix products. | Sparse matrix multiplication (jax.numpy.matmul), sparse matrix-vector and matrix-matrix products (jax.experimental.sparse.dot), and sparse linear system solvers. |
| MIOpen | 3.3.0 | Optimized for deep learning primitives such as convolutions, pooling, normalization, and activation functions. | Speeds up convolutional neural networks (CNNs), recurrent neural networks (RNNs), and other layers. Used in operations like jax.lax.conv and jax.nn.relu. |
| RCCL | 2.21.5 | Optimized for multi-GPU communication for operations like all-reduce, broadcast, and scatter. | Distributes computations across multiple GPUs with pmap and jax.distributed. XLA automatically uses RCCL when executing operations across multiple GPUs on AMD hardware. |
| rocThrust | 3.3.0 | Provides a C++ template library for parallel algorithms like sorting, reduction, and scanning. | Reduction operations such as jax.numpy.sum, prefix sums such as jax.numpy.cumsum, and the parallel reductions involved in distributed training with jax.pmap. |
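The following sketch shows common JAX operations whose GPU execution XLA lowers to the libraries in the table above; the shapes are arbitrary, and the library mapping in the comments follows the table rather than anything enforced in the code:

```python
# Illustrative operations that map to the ROCm libraries listed above:
# random generation -> hipRAND, matrix multiplication -> hipBLAS/hipBLASLt,
# FFTs -> hipFFT, reductions and sorting -> hipCUB/rocThrust.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (1024, 1024))   # random number generation

b = a @ a.T                                # dense matrix multiplication
spectrum = jnp.fft.rfft(a, axis=-1)        # fast Fourier transform
total = jnp.sum(b)                         # reduction
order = jnp.argsort(a[0])                  # sorting

print(total, spectrum.shape, order[:5])
```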

Supported and unsupported features#

The following table maps GPU-accelerated JAX modules to their supported ROCm and JAX versions.

| Module | Description | Since JAX | Since ROCm |
|---|---|---|---|
| jax.numpy | Implements the NumPy API, using the primitives in jax.lax. | 0.1.56 | 5.0.0 |
| jax.scipy | Provides GPU-accelerated and differentiable implementations of many functions from the SciPy library, leveraging JAX's transformations (e.g., grad, jit, vmap). | 0.1.56 | 5.0.0 |
| jax.lax | A library of primitive operations that underpins libraries such as jax.numpy. Transformation rules, such as Jacobian-vector product (JVP) and batching rules, are typically defined as transformations on jax.lax primitives. | 0.1.57 | 5.0.0 |
| jax.random | Provides a number of routines for deterministic generation of sequences of pseudorandom numbers. | 0.1.58 | 5.0.0 |
| jax.sharding | Defines how arrays are partitioned and distributed across multiple devices. | 0.3.20 | 5.1.0 |
| jax.dlpack | Exchanges tensor data between JAX and other libraries that support the DLPack standard. | 0.1.57 | 5.0.0 |
| jax.distributed | Enables the scaling of computations across multiple devices on a single machine or across multiple machines. | 0.1.74 | 5.0.0 |
| jax.dtypes | Provides utilities for working with and managing data types in JAX arrays and computations. | 0.1.66 | 5.0.0 |
| jax.image | Contains image manipulation functions such as resize, scale, and translation. | 0.1.57 | 5.0.0 |
| jax.nn | Contains common functions for neural network libraries. | 0.1.56 | 5.0.0 |
| jax.ops | Computes the minimum, maximum, sum, or product within segments of an array. | 0.1.57 | 5.0.0 |
| jax.profiler | Contains JAX's tracing and time profiling features. | 0.1.57 | 5.0.0 |
| jax.stages | Contains interfaces to stages of the compiled execution process. | 0.3.4 | 5.0.0 |
| jax.tree | Provides utilities for working with tree-like container data structures. | 0.4.26 | 5.6.0 |
| jax.tree_util | Provides utilities for working with nested data structures, or pytrees. | 0.1.65 | 5.0.0 |
| jax.typing | Provides JAX-specific static type annotations. | 0.3.18 | 5.1.0 |
| jax.extend | Provides access to JAX's internal machinery; the jax.extend module defines a library view of some of JAX's internal components. | 0.4.15 | 5.5.0 |
| jax.example_libraries | Serves as a collection of example code and libraries that demonstrate various capabilities of JAX. | 0.1.74 | 5.0.0 |
| jax.experimental | Namespace for experimental features and APIs that are in development or are not yet fully stable for production use. | 0.1.56 | 5.0.0 |
| jax.lib | Set of internal tools and types for bridging between JAX's Python frontend and its XLA backend. | 0.4.6 | 5.3.0 |
| jax_triton | Library that integrates the Triton deep learning compiler with JAX. | jax_triton 0.2.0 | 6.2.4 |
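As a brief illustration of a few of the modules listed above, the following sketch combines jax.random for PRNG keys, jax.nn for activations, and jax.tree_util for pytree handling; the parameter names and shapes are made up for the example:

```python
# A small forward pass using jax.random, jax.nn, and jax.tree_util together.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(42)
k_params, k_input = jax.random.split(key)

params = {
    "w": jax.random.normal(k_params, (4, 4)),   # illustrative weight matrix
    "b": jnp.zeros((4,)),                       # illustrative bias
}

def forward(params, x):
    return jax.nn.relu(x @ params["w"] + params["b"])

x = jax.random.normal(k_input, (8, 4))
y = jax.jit(forward)(params, x)

# jax.tree_util treats the params dict as a pytree of arrays.
print(y.shape, jax.tree_util.tree_map(jnp.shape, params))
```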

jax.scipy module#

A SciPy-like API for scientific computing.

| Module | Since JAX | Since ROCm |
|---|---|---|
| jax.scipy.cluster | 0.3.11 | 5.1.0 |
| jax.scipy.fft | 0.1.71 | 5.0.0 |
| jax.scipy.integrate | 0.4.15 | 5.5.0 |
| jax.scipy.interpolate | 0.1.76 | 5.0.0 |
| jax.scipy.linalg | 0.1.56 | 5.0.0 |
| jax.scipy.ndimage | 0.1.56 | 5.0.0 |
| jax.scipy.optimize | 0.1.57 | 5.0.0 |
| jax.scipy.signal | 0.1.56 | 5.0.0 |
| jax.scipy.spatial.transform | 0.4.12 | 5.4.0 |
| jax.scipy.sparse.linalg | 0.1.56 | 5.0.0 |
| jax.scipy.special | 0.1.56 | 5.0.0 |
| jax.scipy.stats | 0.1.56 | 5.0.0 |
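As an example of the jax.scipy.linalg routines listed above, the following sketch factorizes and solves a small symmetric positive-definite system on the GPU; the matrix is generated purely for illustration:

```python
# Solve an SPD linear system with jax.scipy.linalg's Cholesky helpers.
import jax
import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve

key = jax.random.PRNGKey(0)
m = jax.random.normal(key, (64, 64))
spd = m @ m.T + 64.0 * jnp.eye(64)      # symmetric positive-definite matrix
rhs = jnp.ones((64,))

c, lower = cho_factor(spd)              # Cholesky factorization
x = cho_solve((c, lower), rhs)          # solve spd @ x = rhs
print(jnp.max(jnp.abs(spd @ x - rhs)))  # residual should be near zero
```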

jax.scipy.stats module#

| Module | Since JAX | Since ROCm |
|---|---|---|
| jax.scipy.stats.bernoulli | 0.1.56 | 5.0.0 |
| jax.scipy.stats.beta | 0.1.56 | 5.0.0 |
| jax.scipy.stats.betabinom | 0.1.61 | 5.0.0 |
| jax.scipy.stats.binom | 0.4.14 | 5.4.0 |
| jax.scipy.stats.cauchy | 0.1.56 | 5.0.0 |
| jax.scipy.stats.chi2 | 0.1.61 | 5.0.0 |
| jax.scipy.stats.dirichlet | 0.1.56 | 5.0.0 |
| jax.scipy.stats.expon | 0.1.56 | 5.0.0 |
| jax.scipy.stats.gamma | 0.1.56 | 5.0.0 |
| jax.scipy.stats.gennorm | 0.3.15 | 5.2.0 |
| jax.scipy.stats.geom | 0.1.56 | 5.0.0 |
| jax.scipy.stats.laplace | 0.1.56 | 5.0.0 |
| jax.scipy.stats.logistic | 0.1.56 | 5.0.0 |
| jax.scipy.stats.multinomial | 0.3.18 | 5.1.0 |
| jax.scipy.stats.multivariate_normal | 0.1.56 | 5.0.0 |
| jax.scipy.stats.nbinom | 0.1.72 | 5.0.0 |
| jax.scipy.stats.norm | 0.1.56 | 5.0.0 |
| jax.scipy.stats.pareto | 0.1.56 | 5.0.0 |
| jax.scipy.stats.poisson | 0.1.56 | 5.0.0 |
| jax.scipy.stats.t | 0.1.56 | 5.0.0 |
| jax.scipy.stats.truncnorm | 0.4.0 | 5.3.0 |
| jax.scipy.stats.uniform | 0.1.56 | 5.0.0 |
| jax.scipy.stats.vonmises | 0.4.2 | 5.3.0 |
| jax.scipy.stats.wrapcauchy | 0.4.20 | 5.6.0 |
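The distributions above follow the SciPy interface and remain differentiable under JAX transformations. A minimal sketch with jax.scipy.stats.norm:

```python
# Evaluate a log-density from jax.scipy.stats and differentiate it.
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

x = jnp.linspace(-3.0, 3.0, 7)
print(norm.logpdf(x, loc=0.0, scale=1.0))

# Gradient of the log-density with respect to its input, vectorized over x.
grad_logpdf = jax.vmap(jax.grad(lambda v: norm.logpdf(v, loc=0.0, scale=1.0)))
print(grad_logpdf(x))
```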

jax.extend module#

Modules for JAX extensions.

| Module | Since JAX | Since ROCm |
|---|---|---|
| jax.extend.ffi | 0.4.30 | 6.0.0 |
| jax.extend.linear_util | 0.4.17 | 5.6.0 |
| jax.extend.mlir | 0.4.26 | 5.6.0 |
| jax.extend.random | 0.4.15 | 5.5.0 |

jax.experimental module#

Experimental modules and APIs.

| Module | Since JAX | Since ROCm |
|---|---|---|
| jax.experimental.checkify | 0.1.75 | 5.0.0 |
| jax.experimental.compilation_cache.compilation_cache | 0.1.68 | 5.0.0 |
| jax.experimental.custom_partitioning | 0.4.0 | 5.3.0 |
| jax.experimental.jet | 0.1.56 | 5.0.0 |
| jax.experimental.key_reuse | 0.4.26 | 5.6.0 |
| jax.experimental.mesh_utils | 0.1.76 | 5.0.0 |
| jax.experimental.multihost_utils | 0.3.2 | 5.0.0 |
| jax.experimental.pallas | 0.4.15 | 5.5.0 |
| jax.experimental.pjit | 0.1.61 | 5.0.0 |
| jax.experimental.serialize_executable | 0.4.0 | 5.3.0 |
| jax.experimental.shard_map | 0.4.3 | 5.3.0 |
| jax.experimental.sparse | 0.1.75 | 5.0.0 |

| API | Since JAX | Since ROCm |
|---|---|---|
| jax.experimental.enable_x64 | 0.1.60 | 5.0.0 |
| jax.experimental.disable_x64 | 0.1.60 | 5.0.0 |
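For instance, jax.experimental.enable_x64 can be used as a context manager to temporarily switch on 64-bit types, which JAX disables by default; a minimal sketch:

```python
# Temporarily enable 64-bit types with jax.experimental.enable_x64.
import jax
import jax.numpy as jnp

print(jnp.arange(3.0).dtype)          # float32 by default

with jax.experimental.enable_x64():
    print(jnp.arange(3.0).dtype)      # float64 inside the context

print(jnp.arange(3.0).dtype)          # back to float32 outside the context
```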

jax.experimental.pallas module#

Module for Pallas, a JAX extension for custom kernels.

| Module | Since JAX | Since ROCm |
|---|---|---|
| jax.experimental.pallas.mosaic_gpu | 0.4.31 | 6.1.3 |
| jax.experimental.pallas.tpu | 0.4.15 | 5.5.0 |
| jax.experimental.pallas.triton | 0.4.32 | 6.1.3 |
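A minimal Pallas kernel, adapted from the common pallas_call pattern (an element-wise add; block and grid tuning for AMD GPUs is omitted in this sketch):

```python
# Element-wise addition written as a Pallas kernel and invoked via pallas_call.
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Refs expose the operands; reads and writes go through indexing.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.arange(8.0)
print(add(x, x))
```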

jax.experimental.sparse module#

Experimental support for sparse matrix operations.

| Module | Since JAX | Since ROCm |
|---|---|---|
| jax.experimental.sparse.linalg | 0.3.15 | 5.2.0 |
| jax.experimental.sparse.sparsify | 0.3.25 | |

| Sparse data structure API | Since JAX | Since ROCm |
|---|---|---|
| jax.experimental.sparse.BCOO | 0.1.72 | 5.0.0 |
| jax.experimental.sparse.BCSR | 0.3.20 | 5.1.0 |
| jax.experimental.sparse.CSR | 0.1.75 | 5.0.0 |
| jax.experimental.sparse.NM | 0.4.27 | 5.6.0 |
| jax.experimental.sparse.COO | 0.1.75 | 5.0.0 |
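A minimal sketch of the BCOO format listed above, converting a dense array to sparse form and multiplying it by a vector:

```python
# Build a BCOO sparse matrix from a dense array and use it in a matvec.
import jax.numpy as jnp
from jax.experimental import sparse

dense = jnp.array([[0.0, 1.0, 0.0],
                   [2.0, 0.0, 3.0]])

m = sparse.BCOO.fromdense(dense)   # batched-COO representation
v = jnp.array([1.0, 2.0, 3.0])

print(m @ v)                       # sparse matrix-vector product
print(m.todense())                 # convert back to a dense array
```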

Unsupported JAX features#

The following are GPU-accelerated JAX features not currently supported by ROCm.

| Feature | Description | Since JAX |
|---|---|---|
| Mixed precision with TF32 | Mixed precision with TF32 is used for matrix multiplications, convolutions, and other linear algebra operations, particularly in deep learning workloads like CNNs and transformers. | 0.2.25 |
| RNN support | Currently only LSTM with double bias is supported with float32 input and weight. | 0.3.25 |
| XLA int4 support | 4-bit integer (int4) precision in the XLA compiler. | 0.4.0 |
| jax.experimental.sparsify | Converts a dense matrix to a sparse matrix representation. | Experimental |

Use cases and recommendations#

  • The nanoGPT in JAX blog post explores the implementation and training of a Generative Pre-trained Transformer (GPT) model in JAX, inspired by Andrej Karpathy's PyTorch-based nanoGPT. By comparing how essential GPT components, such as self-attention mechanisms and optimizers, are implemented in PyTorch and JAX, the post also highlights JAX's unique features.

  • The Optimize GPT Training: Enabling Mixed Precision Training in JAX using ROCm on AMD GPUs blog post provides a comprehensive guide on enhancing the training efficiency of GPT models by implementing mixed precision techniques in JAX, specifically tailored for AMD GPUs utilizing the ROCm platform.

  • The Supercharging JAX with Triton Kernels on AMD GPUs blog demonstrates how to develop a custom fused dropout-activation kernel for matrices using Triton, integrate it with JAX, and benchmark its performance using ROCm.

  • The Distributed fine-tuning with JAX on AMD GPUs blog post outlines the process of fine-tuning a Bidirectional Encoder Representations from Transformers (BERT)-based large language model (LLM) using JAX for a text classification task. It discusses techniques for parallelizing the fine-tuning across multiple AMD GPUs and assesses the model's performance on a holdout dataset. The fine-tuning used a BERT-base-cased transformer model and the General Language Understanding Evaluation (GLUE) benchmark dataset on a multi-GPU setup.

  • The MI300X workload optimization guide provides detailed guidance on optimizing workloads for the AMD Instinct MI300X accelerator using ROCm. The page is aimed at helping users achieve optimal performance for deep learning and other high-performance computing tasks on the MI300X GPU.

For more use cases and recommendations, see ROCm JAX blog posts.