Kernel language C++ support

Kernel language C++ support#

The HIP host API can be compiled with any conforming C++ compiler, as long as no kernel launch is present in the code.

To compile device code and include kernel launches, a compiler with full HIP support is needed, such as amdclang++. For more information, see ROCm compilers.

In host code all modern C++ standards that are supported by the compiler can be used. Device code compilation has some restrictions on modern C++ standards, but in general also supports all C++ standards. The biggest restriction is the reduced support of the C++ standard library in device code, as functions are only compiled for the host by default. An exception to this are constexpr functions that are resolved at compile time and can be used in device code. There are ongoing efforts to implement C++ standard library functionality with libhipcxx.

Supported kernel language C++ features#

This section describes HIP’s kernel language C++ feature support for the different versions of the standard.

General C++ features#

Exception handling#

An important difference between the host and device code C++ support is exception handling. In device code, exceptions aren’t available due to the hardware architecture. The device code must use return codes to handle errors.

Assertions#

The assert function is supported in device code. Assertions are used for debugging purposes. When the input expression equals zero, the execution will be stopped. HIP provides its own implementation for assert for usage in device code in hip/hip_runtime.h.

void assert(int input)

HIP also provides the function abort() which can be used to terminate the application when terminal failures are detected. It is implemented using the __builtin_trap() function.

This function produces a similar effect as using CUDA’s asm("trap"). In HIP, abort() terminates the entire application, while in CUDA, asm("trap") only terminates the current kernel and the application continues to run.

printf#

printf is supported in device code, and can be used just like in host code.

#include <hip/hip_runtime.h>

__global__ void run_printf() { printf("Hello World\n"); }

int main() {
  run_printf<<<dim3(1), dim3(1), 0, 0>>>();
}

Device-Side Dynamic Global Memory Allocation#

Device code can use new or malloc to dynamically allocate global memory on the device, and delete or free to deallocate global memory.

Classes#

Classes work on both host and device side, with some constraints on the device side.

Member functions with the appropriate qualifiers can be called in host and device code, and the corresponding overload is executed.

virtual member functions are also supported, however calling these functions from the host if the object was created on the device, or the other way around, is undefined behaviour.

The __host__, __device__, __managed__, __shared__ and __constant__ memory space qualifiers can not be applied to member variables.

C++11 support#

constexpr: Full support in device code. constexpr implicitly defines __host__ __device__, so standard library functions that are marked constexpr can be used in device code. constexpr variables can be used in both host and device code.
Lambdas: Lambdas are implicitly marked with __host__ __device__. To mark them as only executable for the host or the device, they can be explicitly marked like any other function. There are restrictions on variable capture, however. Host and device specific variables can only be accessed on other devices or the host by explicitly copying them. Accessing captured the variables by reference, when the variable is not located on the executing device or host, causes undefined behaviour.
Polymorphic function wrappers: HIP does not support the polymorphic function wrapper std::function

C++14 support#

All C++14 language features are supported.

C++17 support#

All C++17 language features are supported.

C++20 support#

Most C++20 language features are supported, but some restrictions apply. Coroutines are not available in device code.

Compiler features#

Pragma Unroll#

The unroll pragma for unrolling loops with a compile-time constant is supported:

#pragma unroll 16 /* hint to compiler to unroll next loop by 16 */
for (int i=0; i<16; i++) ...

#pragma unroll 1 /* tell compiler to never unroll the loop */
for (int i=0; i<16; i++) ...

#pragma unroll /* hint to compiler to completely unroll next loop. */
for (int i=0; i<16; i++) ...

In-Line Assembly#

GCN ISA In-line assembly can be included in device code.

It has to be mentioned however, that in-line assembly should be used carefully. For more information, please refer to the Inline ASM statements section of amdclang.

A short example program including inline assembly can be found in HIP inline_assembly sample.

For information on what special AMD GPU hardware features are available through assembly, please refer to the ISA manuals of the corresponding architecture.

Kernel Compilation#

hipcc now supports compiling C++/HIP kernels to binary code objects. The file format for the binary files is usually .co which means Code Object. The following command builds the code object using hipcc.

hipcc --genco --offload-arch=[TARGET GPU] [INPUT FILE] -o [OUTPUT FILE]

[TARGET GPU] = GPU architecture
[INPUT FILE] = Name of the file containing source code
[OUTPUT FILE] = Name of the generated code object file

For an example on how to use these object files, refer to the HIP module_api sample.

Architecture specific code#

amdclang++ defines __gfx*__ macros based on the GPU architecture to be compiled for. These macros can be used to include GPU architecture specific code. Refer to the sample in HIP gpu_arch sample.