Using hipDF cudf.pandas with HIP managed memory#
2025-11-04
hipDF ports cudf.pandas to provide a pandas-compatible API backed by hipDF so that existing pandas code
can run on AMD GPUs using accelerated DataFrame operations. cudf.pandas allocates unified (managed) memory
by default via hipMallocManaged. On Linux kernels with Heterogeneous Memory Management (HMM) support and on
supported AMD GPUs, managed memory pages can be transparently migrated to the device on GPU page faults.
This topic describes how the cudf.pandas acceleration layer uses HIP unified managed memory,
and how to configure your environment for best performance on AMD GPUs depending on your use case. For more information,
see HIP memory management.
Recommended: Enable page migration with HSA_XNACK=1#
GPU page-fault retry is enabled by running the workload with the environment variable HSA_XNACK=1. This activates
page migration and typically provides significant performance gains for cudf.pandas-accelerated workloads whose datasets
fit into GPU VRAM and do not cause heavy CPU-to-GPU paging. Running export HSA_XNACK=1 before launching the workload is
therefore the recommended and supported default configuration.
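When a workload is started from a Python launcher rather than an interactive shell, the same setting can be passed to the child process explicitly. The following is a minimal sketch, assuming the variable needs to be in the environment before the ROCm runtime initializes in the process that runs the workload. The helper names `env_with_xnack` and `run_with_xnack` are hypothetical and not part of hipDF; the child command is a stand-in that only echoes the variable.

```python
import os
import subprocess
import sys


def env_with_xnack(base_env=None):
    """Return a copy of the environment with HSA_XNACK=1 set.

    Hypothetical helper for illustration; not part of hipDF.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_XNACK"] = "1"
    return env


def run_with_xnack(argv):
    """Run a command in a child process that sees HSA_XNACK=1."""
    return subprocess.run(
        argv,
        env=env_with_xnack(),
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for a real workload: the child only prints the variable.
    # A real call might pass [sys.executable, "my_script.py"] instead,
    # where my_script.py is a placeholder for your own script.
    result = run_with_xnack(
        [sys.executable, "-c", "import os; print(os.environ['HSA_XNACK'])"]
    )
    print(result.stdout.strip())
```

Copying the parent environment and overriding a single key keeps the rest of the configuration (PATH, library paths) intact for the child process.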
Experimental: When to use HSA_XNACK=0#
Warning
HSA_XNACK=0 is not officially supported as it might cause stability issues with some AMD GPU drivers.
This configuration is provided for experimental use only and is not recommended for production workloads.
If HSA_XNACK is unset or set to 0, cudf.pandas will raise an error by default.
To bypass this check for experimental purposes, set:
export CUDF_PANDAS_BYPASS_XNACK_CHECK=1
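The check described above can be mirrored in a small preflight function that a launcher script runs before importing the accelerated library. This is a sketch of the documented behavior only, not the check that cudf.pandas itself performs. The names `xnack_mode` and `XnackConfigError` are hypothetical. Treating only the value `1` as enabling the bypass is an assumption, as is treating any HSA_XNACK value other than `1` as disabled.

```python
import os


class XnackConfigError(RuntimeError):
    """Raised when the environment does not match the supported setup."""


def xnack_mode(env=None):
    """Classify the memory mode implied by the environment.

    Hypothetical preflight helper; it only reads environment variables
    and does not query the GPU or the driver.
    """
    env = os.environ if env is None else env
    xnack = env.get("HSA_XNACK", "")
    # Assumption: only the literal value "1" enables the bypass.
    bypass = env.get("CUDF_PANDAS_BYPASS_XNACK_CHECK", "") == "1"

    if xnack == "1":
        return "page-migration"
    # Assumption: unset, "0", and any other value count as disabled.
    if bypass:
        return "zero-copy-experimental"
    raise XnackConfigError(
        "HSA_XNACK is unset or 0; set HSA_XNACK=1, or set "
        "CUDF_PANDAS_BYPASS_XNACK_CHECK=1 for experimental use"
    )


if __name__ == "__main__":
    try:
        print(xnack_mode())
    except XnackConfigError as exc:
        print(f"configuration error: {exc}")
```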
With HSA_XNACK=0, managed memory resides on the host (DRAM) and is accessed by the GPU via zero-copy.
While page migration often improves performance, disabling it may be preferable in the following cases:
Page thrashing: If your workload frequently moves memory pages back and forth between CPU and GPU, migration overhead might degrade performance. In such cases, using HSA_XNACK=0 keeps pages resident in host memory and avoids page thrashing.
Datasets larger than GPU VRAM: For oversized datasets, performance degradations have been observed with HSA_XNACK=1 due to excessive migration pressure. In such cases, setting HSA_XNACK=0 can yield better performance by keeping data in host memory and leveraging zero-copy access.
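The two cases above can be expressed as a rough decision rule. The sketch below is illustrative only: `suggest_xnack` is a hypothetical helper, the sizes must be supplied by the caller, and comparing dataset size with VRAM is a coarse proxy because DataFrame operations may need working memory beyond the dataset itself. It is a starting point for experiments, not a substitute for measuring both settings on your workload.

```python
def suggest_xnack(dataset_bytes, vram_bytes, expect_thrashing=False):
    """Suggest an HSA_XNACK value from the two cases described above.

    Hypothetical helper; the caller provides the in-memory dataset size
    and the available device memory, both in bytes.
    """
    if dataset_bytes < 0 or vram_bytes <= 0:
        raise ValueError("sizes must be non-negative, and VRAM must be positive")
    # Case 1: pages are expected to bounce between CPU and GPU.
    # Case 2: the dataset does not fit into device memory.
    if expect_thrashing or dataset_bytes > vram_bytes:
        return "0"  # experimental: host-resident pages, zero-copy access
    return "1"  # recommended: page migration enabled


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Example figures are made up for illustration.
    print(suggest_xnack(8 * GiB, 64 * GiB))
    print(suggest_xnack(128 * GiB, 64 * GiB))
```

A result of `0` still requires CUDF_PANDAS_BYPASS_XNACK_CHECK=1 and remains subject to the warning in this section.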
Summary#
HSA_XNACK=1 (Officially Supported):
Enables GPU page-fault retry and HMM-based page migration.
Managed memory pages can move to GPU on demand.
Typically fastest for cudf.pandas acceleration when data fits in device memory and is mainly used on the GPUs.
HSA_XNACK=0 or unset (Experimental):
Disabled by default, can be enabled experimentally with CUDF_PANDAS_BYPASS_XNACK_CHECK=1.
Disables page-fault retry; pages remain resident in host memory.
GPU accesses host memory via zero-copy.
Can avoid thrashing and may be used for datasets exceeding the available device memory.
Instabilities have been observed with some recent ROCm driver versions.
Not recommended for any production workloads.