CUDA® Python Interoperability#

2023-06-23

20 min read

Applies to Linux

Advanced Micro Devices, Inc.

This chapter discusses HIP Python’s CUDA® Python interoperability layer that is shipped in a separate package with the name hip-python-as-cuda. In particular, we discuss how to run existing CUDA Python code on AMD GPUs, and if localized modifications are required, how to detect HIP Python and how to fall back to the underlying HIP Python Python and Cython modules. Moreover, a technique named “enum constant hallucination” is presented that allows HIP Python “invent” enum constants and their non-conflicting value on-the-fly for enum error types.

Note

All examples in this chapter have been tested with ROCm™ 5.4.3 on Ubuntu 22. The License applies to all examples in this chapter.

Installation#

HIP Python’s CUDA interoperability layer comes in a separate Python 3 package with the name hip-python-as-cuda. Its sole dependency is the hip-python package with the exact same version number.

After having identified the correct package for your ROCm™ installation, type:

python3 -m pip install hip-python-as-cuda-<hip_version>.<hip_python_version>

or, if you have a HIP Python wheel somewhere in your filesystem, type:

python3 -m pip install <path/to/hip_python_as_cuda>.whl

Note

The first option will only be available after the public release on PyPI.

Note

See HIP Python Versioning for more details on the hip-python and hip-python-as-cuda version number.

Basic Usage (Python)#

What will I learn?

  • How I can use HIP Python’s CUDA Python interoperability modules in my Python code.

Note

Most links in this tutorial to the CUDA Python interoperability layer API are broken. Until we find a way to index the respective Python modules, you must unfortunately use the search function for CUDA Python interoperability layer symbols.

After installing the HIP Python package hip-python-as-cuda, you can import the individual modules that you need as shown below:

Listing 15 Importing HIP Python CUDA Interop Modules#
1from cuda import cuda
2from cuda import cudart
3from cuda import nvrtc

Note

When writing this documentation, only Python and Cython modules for the libraries cuda (CUDA Driver), cudart (CUDA runtime), and nvrtc (NVRTC) were shipped by CUDA Python. Therefore, HIP Python only provides interoperability modules for them and no other CUDA library.

Python Example#

What will I learn?

How I can run simple CUDA Python applications directly on AMD GPUs via HIP Python.

After installing the HIP Python package hip-python-as-cuda, you can run the below example directly on AMD GPUs. There is nothing else to do. This works because all CUDA Python functions, types and even enum constants are aliases of HIP objects.

See

cudaError_t, cudaError_t, cudaStreamCreate, cudaMemcpyAsync, cudaMemsetAsync,cudaStreamSynchronize,cudaStreamDestroy,cudaFree

Listing 16 CUDA Python Example#
 1import ctypes
 2import random
 3import array
 4
 5from cuda import cuda
 6
 7def cuda_check(call_result):
 8    err = call_result[0]
 9    result = call_result[1:]
10    if len(result) == 1:
11        result = result[0]
12    if isinstance(err, cuda.cudaError_t) and err != cuda.cudaError_t.cudaSuccess:
13        raise RuntimeError(str(err))
14    return result
15
16# inputs
17n = 100
18x_h = array.array("i",[int(random.random()*10) for i in range(0,n)])
19num_bytes = x_h.itemsize * len(x_h)
20x_d = cuda_check(cuda.cudaMalloc(num_bytes))
21
22stream = cuda_check(cuda.cudaStreamCreate())
23cuda_check(cuda.cudaMemcpyAsync(x_d,x_h,num_bytes,cuda.cudaMemcpyKind.cudaMemcpyHostToDevice,stream))
24cuda_check(cuda.cudaMemsetAsync(x_d,0,num_bytes,stream))
25cuda_check(cuda.cudaMemcpyAsync(x_h,x_d,num_bytes,cuda.cudaMemcpyKind.cudaMemcpyDeviceToHost,stream))
26cuda_check(cuda.cudaStreamSynchronize(stream))
27cuda_check(cuda.cudaStreamDestroy(stream))
28
29# deallocate device data 
30cuda_check(cuda.cudaFree(x_d))
31
32for i,x in enumerate(x_h):
33    if x != 0:
34        raise ValueError(f"expected '0' for element {i}, is: '{x}'")
35print("ok")

What is happening?

See HIP Streams for an explanation of a similar HIP program’s steps.

Enum Constant Hallucination#

What will I learn?

  • How I can let HIP Python’s enum error types in the CUDA Python interoperability layer “invent” values for undefined enum constants (that do not conflict with the values of the defined constants).

We use the example below to demonstrate how you can deal with scenarios where a CUDA Python program, which we want to run on AMD GPUs, performs an error check that involves enum constants that are not relevant for HIP programs and/or AMD GPUs. As HIP Python’s routines will never return these enum constants, it is safe to generate values for them on the fly. Such behavior can be enabled selectively for CUDA Python interoperability layer enums — either via the respective environment variable HIP_PYTHON_{myenumtype}_HALLUCINATE and/or at runtime via the module variable with the same name in cuda, cudart, or nvtrc.

The example below fails because there are no HIP analogues to the following constants:

  • cudaError_t.cudaErrorStartupFailure

  • cudaError_t.cudaError_t.cudaErrorNotPermitted

  • cudaError_t.cudaErrorSystemNotReady

  • cudaError_t.cudaErrorSystemDriverMismatch

  • cudaError_t.cudaErrorCompatNotSupportedOnDevice

  • cudaError_t.cudaErrorTimeout

  • cudaError_t.cudaErrorApiFailureBase

However, the example will run successfully if you set the environment variable HIP_PYTHON_cudaError_t_HALLUCINATE to 1, yes, y, or true (case does not matter). Alternatively, you could set the module variable cuda.cudart.HIP_PYTHON_cudaError_t_HALLUCINATE to True; see HIP Python-Specific Code Modifications on different ways to detect HIP Python in order to introduce such a modification to your code.

Listing 17 CUDA Python Enum Constant Hallucination#
 1from cuda.cudart import cudaError_t
 2
 3error_kinds = ( # some of those do not exist in HIP
 4    cudaError_t.cudaErrorInitializationError,
 5    cudaError_t.cudaErrorInsufficientDriver,
 6    cudaError_t.cudaErrorInvalidDeviceFunction,
 7    cudaError_t.cudaErrorInvalidDevice,
 8    cudaError_t.cudaErrorStartupFailure, # no HIP equivalent
 9    cudaError_t.cudaErrorInvalidKernelImage,
10    cudaError_t.cudaErrorAlreadyAcquired,
11    cudaError_t.cudaErrorOperatingSystem,
12    cudaError_t.cudaErrorNotPermitted, # no HIP equivalent
13    cudaError_t.cudaErrorNotSupported,
14    cudaError_t.cudaErrorSystemNotReady, # no HIP equivalent
15    cudaError_t.cudaErrorSystemDriverMismatch, # no HIP equivalent
16    cudaError_t.cudaErrorCompatNotSupportedOnDevice, # no HIP equivalent
17    cudaError_t.cudaErrorDeviceUninitialized,
18    cudaError_t.cudaErrorTimeout, # no HIP equivalent
19    cudaError_t.cudaErrorUnknown,
20    cudaError_t.cudaErrorApiFailureBase, # no HIP equivalent
21)
22
23for err in error_kinds:
24    assert isinstance(err,cudaError_t)
25    assert (err != cudaError_t.cudaSuccess)
26print("ok")

Caution

Enum constant hallucination should only be used for error return values and not for enum constants that are passed as argument to one of the CUDA Python interoperability layer’s functions.

Basic Usage (Cython)#

What will I learn?

  • How I can use the CUDA Python interoperability layer’s Cython and Python modules in my code.

You can import the Python objects that you need into your *.pyx file as shown below:

Listing 18 Importing HIP Python Modules into Cython *.pyx file#
1from cuda import cuda # enum types, enum aliases, fields
2from cuda import nvrtc
3# ...

In the same file, you can also or alternatively cimport the cdef entities as shown below:

Listing 19 Importing HIP Python Cython declaration files (*.pxd) into a Cython *.pxd or *.pyx file#
 1from cuda cimport ccuda   # direct access to C interfaces and lazy function loaders
 2from cuda cimport ccudart
 3from cuda cimport cnvrtc
 4...
 5
 6from cuda cimport cuda # access to `cdef class` and `ctypedef` types
 7                       # that have been created per C struct/union/typedef
 8from cuda cimport cudart
 9from cuda cimport nvrtc
10 # ...

Cython Example#

What will I learn?

  • That I can port CUDA Python Cython code to AMD GPUs with minor modifications.

  • How I can introduce different compilation paths for HIP Python’s CUDA interoperability layer and CUDA Python.

The example below shows a CUDA Python example that can be compiled for and run on AMD GPUs. To do so, it is necessary to define the compiler flag HIP_Python from within the setup.py script. (We will discuss how to do so in short.) This will replace the qualified C++-like enum constant expression ccudart.cudaError_t.cudaSuccess by the C-like expression ccudart.cudaSuccess.

In the example, the DEF statement and the IF and ELSE statements are Cython compile time definitions and conditional statements, respectively.

Listing 20 CUDA Python Cython Program#
 1cimport cuda.ccudart as ccudart
 2
 3cdef ccudart.cudaError_t err
 4cdef ccudart.cudaStream_t stream
 5DEF num_bytes = 4*100
 6cdef char[num_bytes] x_h
 7cdef void* x_d
 8cdef int x
 9
10def cuda_check(ccudart.cudaError_t err):
11    IF HIP_PYTHON: # HIP Python CUDA interop layer Cython interfaces are used like C API
12        success_status = ccudart.cudaSuccess
13    ELSE:
14        success_status = ccudart.cudaError_t.cudaSuccess
15    if err != success_status:
16        raise RuntimeError(f"reason: {err}")
17
18IF HIP_PYTHON:
19    print("using HIP Python wrapper for CUDA Python")
20
21cuda_check(ccudart.cudaStreamCreate(&stream))
22cuda_check(ccudart.cudaMalloc(&x_d, num_bytes))
23cuda_check(ccudart.cudaMemcpyAsync(x_d,x_h, num_bytes, ccudart.cudaMemcpyHostToDevice, stream))
24cuda_check(ccudart.cudaMemsetAsync(x_d, 0, num_bytes, stream))
25cuda_check(ccudart.cudaMemcpyAsync(x_h, x_d, num_bytes, ccudart.cudaMemcpyDeviceToHost, stream))
26cuda_check(ccudart.cudaStreamSynchronize(stream))
27cuda_check(ccudart.cudaStreamDestroy(stream))
28
29# deallocate device data
30cuda_check(ccudart.cudaFree(x_d))
31
32for i in range(0,round(num_bytes/4)):
33    x = (<int*>&x_h[4*i])[0]
34    if x != 0:
35        raise ValueError(f"expected '0' for element {i}, is: '{x}'")
36print("ok")

What is happening?

See HIP Streams for an explanation of a similar HIP Python program’s steps.

The example can be compiled for AMD GPUs via the following setup.py script, which specifies compile_time_env=dict(HIP_PYTHON=True) as keyword parameter of the cythonize call in line

Listing 21 Setup Script#
 1import os
 2
 3from setuptools import Extension, setup
 4from Cython.Build import cythonize
 5
 6ROCM_PATH=os.environ.get("ROCM_PATH", "/opt/rocm")
 7HIP_PLATFORM = os.environ.get("HIP_PLATFORM", "amd")
 8
 9if HIP_PLATFORM not in ("amd", "hcc"):
10    raise RuntimeError("Currently only HIP_PLATFORM=amd is supported")
11
12def create_extension(name, sources):
13    global ROCM_PATH
14    global HIP_PLATFORM
15    rocm_inc = os.path.join(ROCM_PATH,"include")
16    rocm_lib_dir = os.path.join(ROCM_PATH,"lib")
17    platform = HIP_PLATFORM.upper()
18    cflags = ["-D", f"__HIP_PLATFORM_{platform}__"]
19 
20    return Extension(
21        name,
22        sources=sources,
23        include_dirs=[rocm_inc],
24        library_dirs=[rocm_lib_dir],
25        language="c",
26        extra_compile_args=cflags,
27    )
28
29setup(
30  ext_modules = cythonize(
31    [create_extension("ccuda_stream", ["ccuda_stream.pyx"]),],
32    compiler_directives=dict(language_level=3),
33    compile_time_env=dict(HIP_PYTHON=True),
34  )
35)

For your convenience, you can use the Makefile below to build a Cython module in-place (via make build) and run the code (by importing the module via make run).

Listing 22 Makefile#
 1PYTHON ?= python3
 2
 3.PHONY: build run clean
 4
 5build:
 6	$(PYTHON) setup.py build_ext --inplace
 7run: build
 8	$(PYTHON) -c "import ccuda_stream"
 9clean:
10	rm -rf *.so *.c build/

HIP Python-Specific Code Modifications#

What will I learn?

  • That I can use HIP objects (via member variables) when importing the CUDA Python interoperability layer’s Python modules.

  • That I can access HIP enum constants also via their CUDA interoperability layer type.

  • That I can directly use HIP definitions too when cimporting the CUDA Python interoperability layer’s Cython modules.

In scenarios where the HIP Python Python or Cython code will need to diverge from the original CUDA Python code, e.g. due to differences in a signature, we can directly access the underlying HIP Python Python modules from the CUDA interoperability layer’s Python modules as shown in the example below.

Listing 23 Various ways to determine if we are working with HIP Python’s CUDA Python interoperability layer in Python code.#
 1from cuda import cuda # or cudart, or nvrtc
 2# [...]
 3if "HIP_PYTHON" in cuda:
 4   # do something (with cuda.hip.<...> or cuda.hip_python_mod.<...>)
 5if "hip" in cuda: # or "hiprtc" for nvrtc
 6   # do something with cuda.hip.<...> (or cuda.hip_python_mod.<...>)
 7if hasattr(cuda,"hip"): # or "hiprtc" for nvrtc
 8   # do something with cuda.hip.<...> (or cuda.hip_python_mod.<...>)
 9if "hip_python_mod" in cuda:
10   # do something with cuda.hip_python_mod.<...> (or cuda.hip.<...>) # or nvrtc.<...> for nvrtc
11if hasattr(cuda,"hip_python_mod"):
12   # do something with cuda.hip_python_mod.<...> (or cuda.hip.<...>) # or nvrtc.<...> for nvrtc

Moreover, the interoperability layer’s Python enum types also contain all the enum constants of their HIP analogue as shown in the snippet below.

Listing 24 Python enum class in cuda.pyx#
 1# [...]
 2class CUmemorytype(hip._hipMemoryType__Base,metaclass=_CUmemorytype_EnumMeta):
 3   hipMemoryTypeHost = hip.chip.hipMemoryTypeHost
 4   CU_MEMORYTYPE_HOST = hip.chip.hipMemoryTypeHost
 5   cudaMemoryTypeHost = hip.chip.hipMemoryTypeHost
 6   hipMemoryTypeDevice = hip.chip.hipMemoryTypeDevice
 7   CU_MEMORYTYPE_DEVICE = hip.chip.hipMemoryTypeDevice
 8   cudaMemoryTypeDevice = hip.chip.hipMemoryTypeDevice
 9   hipMemoryTypeArray = hip.chip.hipMemoryTypeArray
10   CU_MEMORYTYPE_ARRAY = hip.chip.hipMemoryTypeArray
11   hipMemoryTypeUnified = hip.chip.hipMemoryTypeUnified
12   CU_MEMORYTYPE_UNIFIED = hip.chip.hipMemoryTypeUnified
13   hipMemoryTypeManaged = hip.chip.hipMemoryTypeManaged
14   cudaMemoryTypeManaged = hip.chip.hipMemoryTypeManaged
15# [...]

In the c-prefixed Cython declaration files (cuda.ccuda.pxd, cuda.ccudart.pxd, and cuda.cnvrtc.pxd), you will further find that the HIP functions and union/struct types are directly included too:

Listing 25 Excerpt from ccuda.pxd#
1# [...]
2from hip.chip cimport hipDeviceProp_t
3from hip.chip cimport hipDeviceProp_t as cudaDeviceProp
4# [...]
5from hip.chip cimport hipMemcpy
6from hip.chip cimport hipMemcpy as cudaMemcpy
7# [...]

In the Cython declaration files without c-prefix (cuda.cuda.pxd, cuda.cudart.pxd, and cuda.nvrtc.pxd), you will discover that the original HIP types (only those derived from unions and structs) are c-imported too and that the CUDA interoperability layer types are made subclasses of the respective HIP type; see the example below. This allows to pass them to the CUDA interoperability layer’s Python functions, i.e., the aliased HIP Python functions.

Listing 26 Excerpt from cuda.pxd#
 1# [...]
 2from hip.hip cimport hipKernelNodeParams # here
 3cdef class CUDA_KERNEL_NODE_PARAMS(hip.hip.hipKernelNodeParams):
 4   pass
 5cdef class CUDA_KERNEL_NODE_PARAMS_st(hip.hip.hipKernelNodeParams):
 6   pass
 7cdef class CUDA_KERNEL_NODE_PARAMS_v1(hip.hip.hipKernelNodeParams):
 8   pass
 9cdef class cudaKernelNodeParams(hip.hip.hipKernelNodeParams):
10   pass
11# [...]