CUDA® Python Interoperability#
Advanced Micro Devices, Inc.
2023-06-23
20 min read
This chapter discusses HIP Python’s CUDA® Python interoperability layer, which is shipped in a separate package with the name hip-python-as-cuda.
In particular, we discuss how to run existing CUDA Python code on AMD GPUs and, if localized modifications are required, how to detect HIP Python and how to fall back to the underlying HIP Python Python and Cython modules.
Moreover, we present a technique named “enum constant hallucination” that allows HIP Python to “invent” enum constants and their non-conflicting values on the fly for enum error types.
Note
All examples in this chapter have been tested with ROCm™ 5.4.3 on Ubuntu 22. The License applies to all examples in this chapter.
Installation#
HIP Python’s CUDA interoperability layer comes in a separate Python 3 package with the name hip-python-as-cuda.
Its sole dependency is the hip-python package with the exact same version number.
After having identified the correct package for your ROCm™ installation, type:
python3 -m pip install hip-python-as-cuda==<hip_version>.<hip_python_version>
or, if you have a HIP Python wheel somewhere in your filesystem, type:
python3 -m pip install <path/to/hip_python_as_cuda>.whl
Note
The first option will only be available after the public release on PyPI.
Note
See HIP Python Versioning for more details on the hip-python and hip-python-as-cuda version numbers.
Basic Usage (Python)#
What will I learn?
How I can use HIP Python’s CUDA Python interoperability modules in my Python code.
Note
Most links in this tutorial to the CUDA Python interoperability layer API are broken. Until we find a way to index the respective Python modules, you must unfortunately use the search function for CUDA Python interoperability layer symbols.
After installing the HIP Python package hip-python-as-cuda, you can import the individual modules that you need as shown below:
from cuda import cuda
from cuda import cudart
from cuda import nvrtc
Note
When writing this documentation, only Python and Cython modules for the libraries cuda (CUDA Driver), cudart (CUDA runtime), and nvrtc (NVRTC) were shipped by CUDA Python. Therefore, HIP Python only provides interoperability modules for them and no other CUDA library.
Python Example#
What will I learn?
How I can run simple CUDA Python applications directly on AMD GPUs via HIP Python.
After installing the HIP Python package hip-python-as-cuda, you can run the example below directly on AMD GPUs. There is nothing else to do.
This works because all CUDA Python functions, types and even enum constants are aliases
of HIP objects.
See cudaError_t, cudaStreamCreate, cudaMemcpyAsync, cudaMemsetAsync, cudaStreamSynchronize, cudaStreamDestroy, cudaFree
import ctypes
import random
import array

from cuda import cuda

def cuda_check(call_result):
    err = call_result[0]
    result = call_result[1:]
    if len(result) == 1:
        result = result[0]
    if isinstance(err, cuda.cudaError_t) and err != cuda.cudaError_t.cudaSuccess:
        raise RuntimeError(str(err))
    return result

# inputs
n = 100
x_h = array.array("i",[int(random.random()*10) for i in range(0,n)])
num_bytes = x_h.itemsize * len(x_h)
x_d = cuda_check(cuda.cudaMalloc(num_bytes))

stream = cuda_check(cuda.cudaStreamCreate())
cuda_check(cuda.cudaMemcpyAsync(x_d,x_h,num_bytes,cuda.cudaMemcpyKind.cudaMemcpyHostToDevice,stream))
cuda_check(cuda.cudaMemsetAsync(x_d,0,num_bytes,stream))
cuda_check(cuda.cudaMemcpyAsync(x_h,x_d,num_bytes,cuda.cudaMemcpyKind.cudaMemcpyDeviceToHost,stream))
cuda_check(cuda.cudaStreamSynchronize(stream))
cuda_check(cuda.cudaStreamDestroy(stream))

# deallocate device data
cuda_check(cuda.cudaFree(x_d))

for i,x in enumerate(x_h):
    if x != 0:
        raise ValueError(f"expected '0' for element {i}, is: '{x}'")
print("ok")
What is happening?
See HIP Streams for an explanation of a similar HIP program’s steps.
Enum Constant Hallucination#
What will I learn?
How I can let HIP Python’s enum error types in the CUDA Python interoperability layer “invent” values for undefined enum constants (that do not conflict with the values of the defined constants).
We use the example below to demonstrate how you can deal with scenarios where a CUDA Python program,
which we want to run on AMD GPUs, performs an error check that involves enum constants that are not relevant for HIP programs and/or AMD GPUs.
As HIP Python’s routines will never return these enum constants, it is safe to generate values for them on the fly.
Such behavior can be enabled selectively for CUDA Python interoperability layer enums, either via the respective environment variable HIP_PYTHON_{myenumtype}_HALLUCINATE and/or at runtime via the module variable with the same name in cuda, cudart, or nvrtc.
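Conceptually, the mechanism can be pictured in plain Python as a metaclass that fabricates a fresh, non-conflicting value the first time an undefined constant is looked up. The sketch below is illustrative only; the class and metaclass names are made up, and HIP Python’s actual implementation differs:

```python
# Conceptual sketch (not HIP Python's actual implementation): an enum-like
# class whose metaclass "invents" a fresh, non-conflicting integer value
# whenever an undefined constant is looked up.
class HallucinatingEnumMeta(type):
    def __getattr__(cls, name):
        # Only called when `name` is not already defined on the class.
        used = {v for v in vars(cls).values() if isinstance(v, int)}
        value = max(used, default=-1) + 1  # larger than all defined values
        setattr(cls, name, value)          # cache for repeated lookups
        return value

class cudaError_t(metaclass=HallucinatingEnumMeta):
    cudaSuccess = 0
    cudaErrorUnknown = 999

invented = cudaError_t.cudaErrorStartupFailure  # "hallucinated" on the fly
print(invented != cudaError_t.cudaSuccess)  # True
```

Because the invented value is strictly larger than every defined constant, an error check comparing against it can never accidentally match a real error code returned by the library.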
The example below fails because there are no HIP analogues to the following constants:
cudaError_t.cudaErrorStartupFailure
cudaError_t.cudaErrorNotPermitted
cudaError_t.cudaErrorSystemNotReady
cudaError_t.cudaErrorSystemDriverMismatch
cudaError_t.cudaErrorCompatNotSupportedOnDevice
cudaError_t.cudaErrorTimeout
cudaError_t.cudaErrorApiFailureBase
However, the example will run successfully if you set the environment variable HIP_PYTHON_cudaError_t_HALLUCINATE to 1, yes, y, or true (case does not matter). Alternatively, you could set the module variable cuda.cudart.HIP_PYTHON_cudaError_t_HALLUCINATE to True;
see HIP Python-Specific Code Modifications for different ways to detect HIP Python in order to introduce such a modification to your code.
from cuda.cudart import cudaError_t

error_kinds = ( # some of those do not exist in HIP
    cudaError_t.cudaErrorInitializationError,
    cudaError_t.cudaErrorInsufficientDriver,
    cudaError_t.cudaErrorInvalidDeviceFunction,
    cudaError_t.cudaErrorInvalidDevice,
    cudaError_t.cudaErrorStartupFailure, # no HIP equivalent
    cudaError_t.cudaErrorInvalidKernelImage,
    cudaError_t.cudaErrorAlreadyAcquired,
    cudaError_t.cudaErrorOperatingSystem,
    cudaError_t.cudaErrorNotPermitted, # no HIP equivalent
    cudaError_t.cudaErrorNotSupported,
    cudaError_t.cudaErrorSystemNotReady, # no HIP equivalent
    cudaError_t.cudaErrorSystemDriverMismatch, # no HIP equivalent
    cudaError_t.cudaErrorCompatNotSupportedOnDevice, # no HIP equivalent
    cudaError_t.cudaErrorDeviceUninitialized,
    cudaError_t.cudaErrorTimeout, # no HIP equivalent
    cudaError_t.cudaErrorUnknown,
    cudaError_t.cudaErrorApiFailureBase, # no HIP equivalent
)

for err in error_kinds:
    assert isinstance(err,cudaError_t)
    assert (err != cudaError_t.cudaSuccess)
print("ok")
Caution
Enum constant hallucination should only be used for error return values, not for enum constants that are passed as arguments to one of the CUDA Python interoperability layer’s functions.
Basic Usage (Cython)#
What will I learn?
How I can use the CUDA Python interoperability layer’s Cython and Python modules in my code.
You can import the Python objects that you need into your *.pyx file as shown below:
from cuda import cuda # enum types, enum aliases, fields
from cuda import nvrtc
# ...
In the same file, you can also or alternatively cimport the cdef entities as shown below:
from cuda cimport ccuda # direct access to C interfaces and lazy function loaders
from cuda cimport ccudart
from cuda cimport cnvrtc
# ...

from cuda cimport cuda # access to `cdef class` and `ctypedef` types
                       # that have been created per C struct/union/typedef
from cuda cimport cudart
from cuda cimport nvrtc
# ...
Cython Example#
What will I learn?
That I can port CUDA Python Cython code to AMD GPUs with minor modifications.
How I can introduce different compilation paths for HIP Python’s CUDA interoperability layer and CUDA Python.
The example below shows a CUDA Python example that can be compiled for and run on AMD GPUs.
To do so, it is necessary to define the compiler flag HIP_PYTHON from within the setup.py script.
(We will discuss how to do so shortly.)
This will replace the qualified C++-like enum constant expression ccudart.cudaError_t.cudaSuccess by the C-like expression ccudart.cudaSuccess.
In the example, the DEF statement and the IF and ELSE statements are Cython compile-time definitions and conditional statements, respectively.
cimport cuda.ccudart as ccudart

cdef ccudart.cudaError_t err
cdef ccudart.cudaStream_t stream
DEF num_bytes = 4*100
cdef char[num_bytes] x_h
cdef void* x_d
cdef int x

def cuda_check(ccudart.cudaError_t err):
    IF HIP_PYTHON: # HIP Python CUDA interop layer Cython interfaces are used like C API
        success_status = ccudart.cudaSuccess
    ELSE:
        success_status = ccudart.cudaError_t.cudaSuccess
    if err != success_status:
        raise RuntimeError(f"reason: {err}")

IF HIP_PYTHON:
    print("using HIP Python wrapper for CUDA Python")

cuda_check(ccudart.cudaStreamCreate(&stream))
cuda_check(ccudart.cudaMalloc(&x_d, num_bytes))
cuda_check(ccudart.cudaMemcpyAsync(x_d, x_h, num_bytes, ccudart.cudaMemcpyHostToDevice, stream))
cuda_check(ccudart.cudaMemsetAsync(x_d, 0, num_bytes, stream))
cuda_check(ccudart.cudaMemcpyAsync(x_h, x_d, num_bytes, ccudart.cudaMemcpyDeviceToHost, stream))
cuda_check(ccudart.cudaStreamSynchronize(stream))
cuda_check(ccudart.cudaStreamDestroy(stream))

# deallocate device data
cuda_check(ccudart.cudaFree(x_d))

for i in range(0, round(num_bytes/4)):
    x = (<int*>&x_h[4*i])[0]
    if x != 0:
        raise ValueError(f"expected '0' for element {i}, is: '{x}'")
print("ok")
What is happening?
See HIP Streams for an explanation of a similar HIP Python program’s steps.
The example can be compiled for AMD GPUs via the following setup.py script, which specifies compile_time_env=dict(HIP_PYTHON=True) as a keyword parameter of the cythonize call:
import os

from setuptools import Extension, setup
from Cython.Build import cythonize

ROCM_PATH=os.environ.get("ROCM_PATH", "/opt/rocm")
HIP_PLATFORM = os.environ.get("HIP_PLATFORM", "amd")

if HIP_PLATFORM not in ("amd", "hcc"):
    raise RuntimeError("Currently only HIP_PLATFORM=amd is supported")

def create_extension(name, sources):
    global ROCM_PATH
    global HIP_PLATFORM
    rocm_inc = os.path.join(ROCM_PATH,"include")
    rocm_lib_dir = os.path.join(ROCM_PATH,"lib")
    platform = HIP_PLATFORM.upper()
    cflags = ["-D", f"__HIP_PLATFORM_{platform}__"]

    return Extension(
        name,
        sources=sources,
        include_dirs=[rocm_inc],
        library_dirs=[rocm_lib_dir],
        language="c",
        extra_compile_args=cflags,
    )

setup(
    ext_modules = cythonize(
        [create_extension("ccuda_stream", ["ccuda_stream.pyx"]),],
        compiler_directives=dict(language_level=3),
        compile_time_env=dict(HIP_PYTHON=True),
    )
)
For your convenience, you can use the Makefile below to build a Cython module in-place (via make build) and run the code (by importing the module via make run).
PYTHON ?= python3

.PHONY: build run clean

build:
	$(PYTHON) setup.py build_ext --inplace
run: build
	$(PYTHON) -c "import ccuda_stream"
clean:
	rm -rf *.so *.c build/
HIP Python-Specific Code Modifications#
What will I learn?
That I can use HIP objects (via member variables) when importing the CUDA Python interoperability layer’s Python modules.
That I can access HIP enum constants also via their CUDA interoperability layer type.
That I can directly use HIP definitions too when cimporting the CUDA Python interoperability layer’s Cython modules.
In scenarios where HIP Python’s Python or Cython code will need to diverge from the original CUDA Python code, e.g., due to differences in a signature, we can directly access the underlying HIP Python modules from the CUDA interoperability layer’s Python modules as shown in the example below.
from cuda import cuda # or cudart, or nvrtc
# [...]
if hasattr(cuda, "HIP_PYTHON"): # attribute only defined by HIP Python
    pass # do something (with cuda.hip.<...> or cuda.hip_python_mod.<...>)
if hasattr(cuda, "hip"): # or "hiprtc" for nvrtc
    pass # do something with cuda.hip.<...> (or cuda.hip_python_mod.<...>)
if hasattr(cuda, "hip_python_mod"):
    pass # do something with cuda.hip_python_mod.<...> (or cuda.hip.<...>)
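Building on such attribute checks, a small portable helper could dispatch between the two backends. The function below is a hypothetical sketch (not part of either library), exercised with stand-in module objects so that no GPU stack is required:

```python
import types

def detect_backend(cuda_mod):
    """Return "hip" if `cuda_mod` is HIP Python's CUDA interoperability
    layer, else "cuda". Hypothetical helper: it assumes the HIP_PYTHON
    attribute is only defined by HIP Python's modules."""
    return "hip" if hasattr(cuda_mod, "HIP_PYTHON") else "cuda"

# Simulate both cases with stand-in modules (no GPU libraries needed):
hip_python_like = types.ModuleType("cuda")
hip_python_like.HIP_PYTHON = True        # as set by HIP Python's layer
cuda_python_like = types.ModuleType("cuda")  # plain CUDA Python module

print(detect_backend(hip_python_like))   # hip
print(detect_backend(cuda_python_like))  # cuda
```

Using hasattr (rather than membership tests) is important here, because Python modules do not support the `in` operator.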
Moreover, the interoperability layer’s Python enum types also contain all the enum constants of their HIP analogue as shown in the snippet below.
# [...]
class CUmemorytype(hip._hipMemoryType__Base,metaclass=_CUmemorytype_EnumMeta):
    hipMemoryTypeHost = hip.chip.hipMemoryTypeHost
    CU_MEMORYTYPE_HOST = hip.chip.hipMemoryTypeHost
    cudaMemoryTypeHost = hip.chip.hipMemoryTypeHost
    hipMemoryTypeDevice = hip.chip.hipMemoryTypeDevice
    CU_MEMORYTYPE_DEVICE = hip.chip.hipMemoryTypeDevice
    cudaMemoryTypeDevice = hip.chip.hipMemoryTypeDevice
    hipMemoryTypeArray = hip.chip.hipMemoryTypeArray
    CU_MEMORYTYPE_ARRAY = hip.chip.hipMemoryTypeArray
    hipMemoryTypeUnified = hip.chip.hipMemoryTypeUnified
    CU_MEMORYTYPE_UNIFIED = hip.chip.hipMemoryTypeUnified
    hipMemoryTypeManaged = hip.chip.hipMemoryTypeManaged
    cudaMemoryTypeManaged = hip.chip.hipMemoryTypeManaged
# [...]
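HIP Python’s generated enum types are not standard library enums, but the aliasing behavior can be pictured with Python’s own IntEnum, where members defined with an already-used value become aliases of the first member. The values below are illustrative, not HIP’s actual numbers:

```python
from enum import IntEnum

# Sketch: one underlying value reachable under its HIP name, its CUDA
# Driver API name, and its CUDA Runtime API name. In an IntEnum, members
# with a repeated value are aliases of the canonical (first) member.
class CUmemorytype(IntEnum):
    hipMemoryTypeHost = 1
    CU_MEMORYTYPE_HOST = 1    # alias of hipMemoryTypeHost
    cudaMemoryTypeHost = 1    # alias of hipMemoryTypeHost
    hipMemoryTypeDevice = 2
    CU_MEMORYTYPE_DEVICE = 2  # alias of hipMemoryTypeDevice
    cudaMemoryTypeDevice = 2  # alias of hipMemoryTypeDevice

print(CUmemorytype.cudaMemoryTypeHost is CUmemorytype.hipMemoryTypeHost)  # True
```

Because all three names resolve to the same member, code written against any of the three APIs compares and hashes identically.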
In the c-prefixed Cython declaration files (cuda.ccuda.pxd, cuda.ccudart.pxd, and cuda.cnvrtc.pxd), you will further find that the HIP functions and union/struct types are directly included too:
# [...]
from hip.chip cimport hipDeviceProp_t
from hip.chip cimport hipDeviceProp_t as cudaDeviceProp
# [...]
from hip.chip cimport hipMemcpy
from hip.chip cimport hipMemcpy as cudaMemcpy
# [...]
In the Cython declaration files without c-prefix (cuda.cuda.pxd, cuda.cudart.pxd, and cuda.nvrtc.pxd), you will discover that the original HIP types (only those derived from unions and structs) are cimported too and that the CUDA interoperability layer types are made subclasses of the respective HIP type; see the example below. This allows passing them to the CUDA interoperability layer’s Python functions, i.e., the aliased HIP Python functions.
# [...]
from hip.hip cimport hipKernelNodeParams # here
cdef class CUDA_KERNEL_NODE_PARAMS(hip.hip.hipKernelNodeParams):
    pass
cdef class CUDA_KERNEL_NODE_PARAMS_st(hip.hip.hipKernelNodeParams):
    pass
cdef class CUDA_KERNEL_NODE_PARAMS_v1(hip.hip.hipKernelNodeParams):
    pass
cdef class cudaKernelNodeParams(hip.hip.hipKernelNodeParams):
    pass
# [...]