Table comparing syntax for different compute APIs

| Term | CUDA | HIP | OpenCL |
|---|---|---|---|
| Device | `int deviceId` | `int deviceId` | `cl_device_id` |
| Queue | `cudaStream_t` | `hipStream_t` | `cl_command_queue` |
| Event | `cudaEvent_t` | `hipEvent_t` | `cl_event` |
| Memory | `void *` | `void *` | `cl_mem` |
| Launch index space | grid | grid | NDRange |
| Thread group | block | block | work-group |
| Single thread | thread | thread | work-item |
| SIMD group | warp | warp | sub-group |
| Thread index | `threadIdx.x` | `threadIdx.x` | `get_local_id(0)` |
| Block index | `blockIdx.x` | `blockIdx.x` | `get_group_id(0)` |
| Block dimension (threads per block) | `blockDim.x` | `blockDim.x` | `get_local_size(0)` |
| Grid dimension (blocks per grid) | `gridDim.x` | `gridDim.x` | `get_num_groups(0)` |
| Device kernel | `__global__` | `__global__` | `__kernel` |
| Device function | `__device__` | `__device__` | Implied in device compilation |
| Host function | `__host__` (default) | `__host__` (default) | Implied in host compilation |
| Host + device function | `__host__ __device__` | `__host__ __device__` | No equivalent |
| Kernel launch | `<<< >>>` | `hipLaunchKernel` / `hipLaunchKernelGGL` / `<<< >>>` | `clEnqueueNDRangeKernel` |
| Global memory | `__device__` | `__device__` | `__global` |
| Group (shared) memory | `__shared__` | `__shared__` | `__local` |
| Constant memory | `__constant__` | `__constant__` | `__constant` |
| Block barrier | `__syncthreads()` | `__syncthreads()` | `barrier(CLK_LOCAL_MEM_FENCE)` |
| Atomic built-ins | `atomicAdd` | `atomicAdd` | `atomic_add` |
| Precise math | `cos(x)` / `cosf(x)` | `cos(x)` / `cosf(x)` | `cos(x)` |
| Fast math | `__cosf(x)` | `__cosf(x)` | `native_cos(x)` |
| Vector | `float4` | `float4` | `float4` |
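To make the indexing and launch rows concrete, here is a minimal host-side C++ sketch that emulates a 1D launch serially. The names `LaunchCoords`, `globalIndexCuda`, `globalIndexOpenCL`, `blocksFor`, `launch1D`, and `saxpyOnHost` are hypothetical helpers, not part of CUDA, HIP, or OpenCL; they only mirror the built-ins listed in the table. The sketch assumes a zero global work offset, in which case OpenCL's `get_global_id(0)` equals `get_group_id(0) * get_local_size(0) + get_local_id(0)`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the 1D built-ins in the table.
struct LaunchCoords {
    unsigned threadIdx_x;  // CUDA/HIP threadIdx.x == OpenCL get_local_id(0)
    unsigned blockIdx_x;   // CUDA/HIP blockIdx.x  == OpenCL get_group_id(0)
    unsigned blockDim_x;   // CUDA/HIP blockDim.x  == OpenCL get_local_size(0)
    unsigned gridDim_x;    // CUDA/HIP gridDim.x   == OpenCL get_num_groups(0)
};

// CUDA/HIP idiom for a thread's position in the whole grid.
unsigned globalIndexCuda(const LaunchCoords& c) {
    return c.blockIdx_x * c.blockDim_x + c.threadIdx_x;
}

// The same value spelled with the OpenCL names; with a zero global
// offset this is what get_global_id(0) returns.
unsigned globalIndexOpenCL(unsigned localId, unsigned groupId, unsigned localSize) {
    return groupId * localSize + localId;
}

// Blocks (work-groups) needed to cover n elements: ceil(n / blockSize).
unsigned blocksFor(unsigned n, unsigned blockSize) {
    assert(blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// Serially "launches" a 1D kernel: every (block, thread) pair runs once.
// A real device runs these concurrently; this loop only shows the index space.
void launch1D(unsigned numBlocks, unsigned threadsPerBlock,
              const std::function<void(const LaunchCoords&)>& kernel) {
    for (unsigned b = 0; b < numBlocks; ++b)
        for (unsigned t = 0; t < threadsPerBlock; ++t)
            kernel(LaunchCoords{t, b, threadsPerBlock, numBlocks});
}

// Example kernel body: y[i] = a * x[i] + y[i].
void saxpyOnHost(float a, const std::vector<float>& x, std::vector<float>& y,
                 unsigned blockSize) {
    assert(x.size() == y.size());
    const unsigned n = static_cast<unsigned>(x.size());
    launch1D(blocksFor(n, blockSize), blockSize, [&](const LaunchCoords& c) {
        const unsigned i = globalIndexCuda(c);
        if (i < n) y[i] = a * x[i] + y[i];  // guard: the last block may overhang n
    });
}
```

On a device, the same body would live in a `__global__` (CUDA/HIP) or `__kernel` (OpenCL) function launched with `blocksFor(n, blockSize)` blocks of `blockSize` threads. The `i < n` guard is needed because rounding the grid up can create more threads than elements.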

Notes

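The indexing rows (from Thread index through Grid dimension) show the terminology for a 1D grid, that is, only the x component or dimension 0. For 2D and 3D grids, CUDA and HIP add the `.y` and `.z` fields, and OpenCL takes dimension arguments 1 and 2. Some APIs number the dimensions in reverse order, so the xyz / 012 correspondence can be flipped for 3D grids; check which dimension varies fastest when porting.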
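The dimension-order caveat can be checked with a small C++ sketch of 3D linearization. The CUDA programming guide defines a thread's ID within a 3D block of size (Dx, Dy, Dz) as x + y·Dx + z·Dx·Dy, so x varies fastest. OpenCL 2.0's `get_local_linear_id()` uses the same formula with dimensions 0, 1, 2. The helpers `Dim3`, `linearIdXFastest`, `linearIdLastFastest`, and `reversed` are hypothetical and only illustrate how an API with reversed ordering (last component fastest) maps onto the same linear ID once the components are swapped.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper type: {x, y, z} or {dim 0, dim 1, dim 2}.
using Dim3 = std::array<unsigned, 3>;

// CUDA/HIP (and OpenCL get_local_linear_id) convention:
// x + y * Dx + z * Dx * Dy, so component 0 varies fastest.
unsigned linearIdXFastest(const Dim3& idx, const Dim3& size) {
    assert(idx[0] < size[0] && idx[1] < size[1] && idx[2] < size[2]);
    return idx[0] + idx[1] * size[0] + idx[2] * size[0] * size[1];
}

// Convention of an API that orders dimensions the other way round:
// the LAST component varies fastest.
unsigned linearIdLastFastest(const Dim3& idx, const Dim3& size) {
    assert(idx[0] < size[0] && idx[1] < size[1] && idx[2] < size[2]);
    return idx[2] + idx[1] * size[2] + idx[0] * size[2] * size[1];
}

// Swap component order when porting indices between the two conventions.
Dim3 reversed(const Dim3& d) { return {d[2], d[1], d[0]}; }
```

For a block of size {8, 4, 2}, the thread at {3, 1, 1} gets linear ID 43 under the x-fastest convention. The reversed-order formula yields the same ID only when both the index and the size are passed through `reversed`.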