CUDA® Python Interoperability#
2023-06-23
20 min read
Advanced Micro Devices, Inc.
This chapter discusses HIP Python’s CUDA® Python interoperability layer that is shipped
in a separate package with the name hip-python-as-cuda
.
In particular, we discuss how to run existing CUDA Python code on AMD GPUs, and
if localized modifications are required, how to detect HIP Python and how to fall
back to the underlying HIP Python Python and Cython modules.
Moreover, a technique named “enum constant hallucination” is presented
that allows HIP Python “invent” enum constants and their non-conflicting value
on-the-fly for enum error types.
Note
All examples in this chapter have been tested with ROCm™ 5.4.3 on Ubuntu 22. The License applies to all examples in this chapter.
Installation#
HIP Python’s CUDA interoperability layer comes in a separate Python 3 package with the name hip-python-as-cuda
.
Its sole dependency is the hip-python
package with the exact same version number.
After having identified the correct package for your ROCm™ installation, type:
python3 -m pip install hip-python-as-cuda-<hip_version>.<hip_python_version>
or, if you have a HIP Python wheel somewhere in your filesystem, type:
python3 -m pip install <path/to/hip_python_as_cuda>.whl
Note
The first option will only be available after the public release on PyPI.
Note
See HIP Python Versioning for more details on the hip-python
and hip-python-as-cuda
version number.
Basic Usage (Python)#
What will I learn?
How I can use HIP Python’s CUDA Python interoperability modules in my Python code.
Note
Most links in this tutorial to the CUDA Python interoperability layer API are broken. Until we find a way to index the respective Python modules, you must unfortunately use the search function for CUDA Python interoperability layer symbols.
After installing the HIP Python package hip-python-as-cuda
, you can import the individual
modules that you need as shown below:
1from cuda import cuda
2from cuda import cudart
3from cuda import nvrtc
Note
When writing this documentation, only Python and Cython modules for the libraries cuda
(CUDA Driver), cudart
(CUDA runtime), and nvrtc
(NVRTC) were shipped by CUDA Python. Therefore, HIP Python only provides interoperability modules for them and no other CUDA library.
Python Example#
What will I learn?
How I can run simple CUDA Python applications directly on AMD GPUs via HIP Python.
After installing the HIP Python package hip-python-as-cuda
, you can run the below example
directly on AMD GPUs. There is nothing else to do.
This works because all CUDA Python functions, types and even enum constants are aliases
of HIP objects.
See
cudaError_t
, cudaError_t
, cudaStreamCreate
, cudaMemcpyAsync
,
cudaMemsetAsync
,cudaStreamSynchronize
,cudaStreamDestroy
,cudaFree
1import ctypes
2import random
3import array
4
5from cuda import cuda
6
7def cuda_check(call_result):
8 err = call_result[0]
9 result = call_result[1:]
10 if len(result) == 1:
11 result = result[0]
12 if isinstance(err, cuda.cudaError_t) and err != cuda.cudaError_t.cudaSuccess:
13 raise RuntimeError(str(err))
14 return result
15
16# inputs
17n = 100
18x_h = array.array("i",[int(random.random()*10) for i in range(0,n)])
19num_bytes = x_h.itemsize * len(x_h)
20x_d = cuda_check(cuda.cudaMalloc(num_bytes))
21
22stream = cuda_check(cuda.cudaStreamCreate())
23cuda_check(cuda.cudaMemcpyAsync(x_d,x_h,num_bytes,cuda.cudaMemcpyKind.cudaMemcpyHostToDevice,stream))
24cuda_check(cuda.cudaMemsetAsync(x_d,0,num_bytes,stream))
25cuda_check(cuda.cudaMemcpyAsync(x_h,x_d,num_bytes,cuda.cudaMemcpyKind.cudaMemcpyDeviceToHost,stream))
26cuda_check(cuda.cudaStreamSynchronize(stream))
27cuda_check(cuda.cudaStreamDestroy(stream))
28
29# deallocate device data
30cuda_check(cuda.cudaFree(x_d))
31
32for i,x in enumerate(x_h):
33 if x != 0:
34 raise ValueError(f"expected '0' for element {i}, is: '{x}'")
35print("ok")
What is happening?
See HIP Streams for an explanation of a similar HIP program’s steps.
Enum Constant Hallucination#
What will I learn?
How I can let HIP Python’s enum error types in the CUDA Python interoperability layer “invent” values for undefined enum constants (that do not conflict with the values of the defined constants).
We use the example below to demonstrate how you can deal with scenarios where a CUDA Python program,
which we want to run on AMD GPUs, performs an error check that involves enum constants that are not relevant for HIP programs and/or AMD GPUs.
As HIP Python’s routines will never return these enum constants, it is safe to generate values for them on the fly.
Such behavior can be enabled selectively for CUDA Python interoperability layer enums — either via the
respective environment variable HIP_PYTHON_{myenumtype}_HALLUCINATE
and/or at runtime
via the module variable with the same name in cuda
, cudart
, or nvtrc
.
The example below fails because there are no HIP analogues to the following constants:
cudaError_t.cudaErrorStartupFailure
cudaError_t.cudaError_t.cudaErrorNotPermitted
cudaError_t.cudaErrorSystemNotReady
cudaError_t.cudaErrorSystemDriverMismatch
cudaError_t.cudaErrorCompatNotSupportedOnDevice
cudaError_t.cudaErrorTimeout
cudaError_t.cudaErrorApiFailureBase
However, the example will run successfully if you set the environment variable HIP_PYTHON_cudaError_t_HALLUCINATE
to 1
, yes
, y
, or true
(case does not matter). Alternatively, you could set the module variable cuda.cudart.HIP_PYTHON_cudaError_t_HALLUCINATE
to True
;
see HIP Python-Specific Code Modifications on different ways to detect HIP Python in
order to introduce such a modification to your code.
1from cuda.cudart import cudaError_t
2
3error_kinds = ( # some of those do not exist in HIP
4 cudaError_t.cudaErrorInitializationError,
5 cudaError_t.cudaErrorInsufficientDriver,
6 cudaError_t.cudaErrorInvalidDeviceFunction,
7 cudaError_t.cudaErrorInvalidDevice,
8 cudaError_t.cudaErrorStartupFailure, # no HIP equivalent
9 cudaError_t.cudaErrorInvalidKernelImage,
10 cudaError_t.cudaErrorAlreadyAcquired,
11 cudaError_t.cudaErrorOperatingSystem,
12 cudaError_t.cudaErrorNotPermitted, # no HIP equivalent
13 cudaError_t.cudaErrorNotSupported,
14 cudaError_t.cudaErrorSystemNotReady, # no HIP equivalent
15 cudaError_t.cudaErrorSystemDriverMismatch, # no HIP equivalent
16 cudaError_t.cudaErrorCompatNotSupportedOnDevice, # no HIP equivalent
17 cudaError_t.cudaErrorDeviceUninitialized,
18 cudaError_t.cudaErrorTimeout, # no HIP equivalent
19 cudaError_t.cudaErrorUnknown,
20 cudaError_t.cudaErrorApiFailureBase, # no HIP equivalent
21)
22
23for err in error_kinds:
24 assert isinstance(err,cudaError_t)
25 assert (err != cudaError_t.cudaSuccess)
26print("ok")
Caution
Enum constant hallucination should only be used for error return values and not for enum constants that are passed as argument to one of the CUDA Python interoperability layer’s functions.
Basic Usage (Cython)#
What will I learn?
How I can use the CUDA Python interoperability layer’s Cython and Python modules in my code.
You can import the Python objects that you need into your *.pyx
file as shown below:
1from cuda import cuda # enum types, enum aliases, fields
2from cuda import nvrtc
3# ...
In the same file, you can also or alternatively cimport
the cdef
entities
as shown below:
1from cuda cimport ccuda # direct access to C interfaces and lazy function loaders
2from cuda cimport ccudart
3from cuda cimport cnvrtc
4...
5
6from cuda cimport cuda # access to `cdef class` and `ctypedef` types
7 # that have been created per C struct/union/typedef
8from cuda cimport cudart
9from cuda cimport nvrtc
10 # ...
Cython Example#
What will I learn?
That I can port CUDA Python Cython code to AMD GPUs with minor modifications.
How I can introduce different compilation paths for HIP Python’s CUDA interoperability layer and CUDA Python.
The example below shows a CUDA Python example that can be compiled for and run on AMD GPUs.
To do so, it is necessary to define the compiler flag HIP_Python
from within the setup.py
script.
(We will discuss how to do so in short.)
This will replace the qualified C++
-like enum constant expression
ccudart.cudaError_t.cudaSuccess
by the C
-like expression
ccudart.cudaSuccess
.
In the example, the DEF
statement and the IF
and ELSE
statements are Cython
compile time definitions
and conditional statements,
respectively.
1cimport cuda.ccudart as ccudart
2
3cdef ccudart.cudaError_t err
4cdef ccudart.cudaStream_t stream
5DEF num_bytes = 4*100
6cdef char[num_bytes] x_h
7cdef void* x_d
8cdef int x
9
10def cuda_check(ccudart.cudaError_t err):
11 IF HIP_PYTHON: # HIP Python CUDA interop layer Cython interfaces are used like C API
12 success_status = ccudart.cudaSuccess
13 ELSE:
14 success_status = ccudart.cudaError_t.cudaSuccess
15 if err != success_status:
16 raise RuntimeError(f"reason: {err}")
17
18IF HIP_PYTHON:
19 print("using HIP Python wrapper for CUDA Python")
20
21cuda_check(ccudart.cudaStreamCreate(&stream))
22cuda_check(ccudart.cudaMalloc(&x_d, num_bytes))
23cuda_check(ccudart.cudaMemcpyAsync(x_d,x_h, num_bytes, ccudart.cudaMemcpyHostToDevice, stream))
24cuda_check(ccudart.cudaMemsetAsync(x_d, 0, num_bytes, stream))
25cuda_check(ccudart.cudaMemcpyAsync(x_h, x_d, num_bytes, ccudart.cudaMemcpyDeviceToHost, stream))
26cuda_check(ccudart.cudaStreamSynchronize(stream))
27cuda_check(ccudart.cudaStreamDestroy(stream))
28
29# deallocate device data
30cuda_check(ccudart.cudaFree(x_d))
31
32for i in range(0,round(num_bytes/4)):
33 x = (<int*>&x_h[4*i])[0]
34 if x != 0:
35 raise ValueError(f"expected '0' for element {i}, is: '{x}'")
36print("ok")
What is happening?
See HIP Streams for an explanation of a similar HIP Python program’s steps.
The example can be compiled for AMD GPUs via the following setup.py script,
which specifies compile_time_env=dict(HIP_PYTHON=True)
as keyword parameter
of the cythonize
call in line
1import os
2
3from setuptools import Extension, setup
4from Cython.Build import cythonize
5
6ROCM_PATH=os.environ.get("ROCM_PATH", "/opt/rocm")
7HIP_PLATFORM = os.environ.get("HIP_PLATFORM", "amd")
8
9if HIP_PLATFORM not in ("amd", "hcc"):
10 raise RuntimeError("Currently only HIP_PLATFORM=amd is supported")
11
12def create_extension(name, sources):
13 global ROCM_PATH
14 global HIP_PLATFORM
15 rocm_inc = os.path.join(ROCM_PATH,"include")
16 rocm_lib_dir = os.path.join(ROCM_PATH,"lib")
17 platform = HIP_PLATFORM.upper()
18 cflags = ["-D", f"__HIP_PLATFORM_{platform}__"]
19
20 return Extension(
21 name,
22 sources=sources,
23 include_dirs=[rocm_inc],
24 library_dirs=[rocm_lib_dir],
25 language="c",
26 extra_compile_args=cflags,
27 )
28
29setup(
30 ext_modules = cythonize(
31 [create_extension("ccuda_stream", ["ccuda_stream.pyx"]),],
32 compiler_directives=dict(language_level=3),
33 compile_time_env=dict(HIP_PYTHON=True),
34 )
35)
For your convenience, you can use the Makefile below
to build a Cython module in-place (via make build
) and run the code (by importing the module via make run
).
1PYTHON ?= python3
2
3.PHONY: build run clean
4
5build:
6 $(PYTHON) setup.py build_ext --inplace
7run: build
8 $(PYTHON) -c "import ccuda_stream"
9clean:
10 rm -rf *.so *.c build/
HIP Python-Specific Code Modifications#
What will I learn?
That I can use HIP objects (via member variables) when
import
ing the CUDA Python interoperability layer’s Python modules.That I can access HIP enum constants also via their CUDA interoperability layer type.
That I can directly use HIP definitions too when
cimport
ing the CUDA Python interoperability layer’s Cython modules.
In scenarios where the HIP Python Python or Cython code will need to diverge from the original CUDA Python code, e.g. due to differences in a signature, we can directly access the underlying HIP Python Python modules from the CUDA interoperability layer’s Python modules as shown in the example below.
1from cuda import cuda # or cudart, or nvrtc
2# [...]
3if "HIP_PYTHON" in cuda:
4 # do something (with cuda.hip.<...> or cuda.hip_python_mod.<...>)
5if "hip" in cuda: # or "hiprtc" for nvrtc
6 # do something with cuda.hip.<...> (or cuda.hip_python_mod.<...>)
7if hasattr(cuda,"hip"): # or "hiprtc" for nvrtc
8 # do something with cuda.hip.<...> (or cuda.hip_python_mod.<...>)
9if "hip_python_mod" in cuda:
10 # do something with cuda.hip_python_mod.<...> (or cuda.hip.<...>) # or nvrtc.<...> for nvrtc
11if hasattr(cuda,"hip_python_mod"):
12 # do something with cuda.hip_python_mod.<...> (or cuda.hip.<...>) # or nvrtc.<...> for nvrtc
Moreover, the interoperability layer’s Python enum types also contain all the enum constants of their HIP analogue as shown in the snippet below.
1# [...]
2class CUmemorytype(hip._hipMemoryType__Base,metaclass=_CUmemorytype_EnumMeta):
3 hipMemoryTypeHost = hip.chip.hipMemoryTypeHost
4 CU_MEMORYTYPE_HOST = hip.chip.hipMemoryTypeHost
5 cudaMemoryTypeHost = hip.chip.hipMemoryTypeHost
6 hipMemoryTypeDevice = hip.chip.hipMemoryTypeDevice
7 CU_MEMORYTYPE_DEVICE = hip.chip.hipMemoryTypeDevice
8 cudaMemoryTypeDevice = hip.chip.hipMemoryTypeDevice
9 hipMemoryTypeArray = hip.chip.hipMemoryTypeArray
10 CU_MEMORYTYPE_ARRAY = hip.chip.hipMemoryTypeArray
11 hipMemoryTypeUnified = hip.chip.hipMemoryTypeUnified
12 CU_MEMORYTYPE_UNIFIED = hip.chip.hipMemoryTypeUnified
13 hipMemoryTypeManaged = hip.chip.hipMemoryTypeManaged
14 cudaMemoryTypeManaged = hip.chip.hipMemoryTypeManaged
15# [...]
In the c
-prefixed Cython declaration files (cuda.ccuda.pxd
, cuda.ccudart.pxd
, and cuda.cnvrtc.pxd
),
you will further find that the HIP functions and union/struct types are directly included too:
1# [...]
2from hip.chip cimport hipDeviceProp_t
3from hip.chip cimport hipDeviceProp_t as cudaDeviceProp
4# [...]
5from hip.chip cimport hipMemcpy
6from hip.chip cimport hipMemcpy as cudaMemcpy
7# [...]
In the Cython declaration files without c
-prefix (cuda.cuda.pxd
, cuda.cudart.pxd
, and cuda.nvrtc.pxd
),
you will discover that the original HIP types (only those derived from unions and structs) are c
-imported too
and that the CUDA interoperability layer types are made subclasses of the respective HIP type;
see the example below. This allows to pass them to the CUDA interoperability layer’s
Python functions, i.e., the aliased HIP Python functions.
1# [...]
2from hip.hip cimport hipKernelNodeParams # here
3cdef class CUDA_KERNEL_NODE_PARAMS(hip.hip.hipKernelNodeParams):
4 pass
5cdef class CUDA_KERNEL_NODE_PARAMS_st(hip.hip.hipKernelNodeParams):
6 pass
7cdef class CUDA_KERNEL_NODE_PARAMS_v1(hip.hip.hipKernelNodeParams):
8 pass
9cdef class cudaKernelNodeParams(hip.hip.hipKernelNodeParams):
10 pass
11# [...]