This page contains proposed changes for a future release of ROCm. Read the latest Linux release of ROCm documentation for your production environments.

Table Comparing Syntax for Different Compute APIs

| Term | CUDA | HIP | OpenCL |
|---|---|---|---|
| Device | `int deviceId` | `int deviceId` | `cl_device_id` |
| Queue | `cudaStream_t` | `hipStream_t` | `cl_command_queue` |
| Event | `cudaEvent_t` | `hipEvent_t` | `cl_event` |
| Memory | `void *` | `void *` | `cl_mem` |
| Grid | grid | grid | NDRange |
| Block | block | block | work-group |
| Thread | thread | thread | work-item |
| Warp | warp | warp | sub-group |
| Thread-index | `threadIdx.x` | `threadIdx.x` | `get_local_id(0)` |
| Block-index | `blockIdx.x` | `blockIdx.x` | `get_group_id(0)` |
| Block-dim | `blockDim.x` | `blockDim.x` | `get_local_size(0)` |
| Grid-dim | `gridDim.x` | `gridDim.x` | `get_num_groups(0)` |
| Device Kernel | `__global__` | `__global__` | `__kernel` |
| Device Function | `__device__` | `__device__` | Implied in device compilation |
| Host Function | `__host__` (default) | `__host__` (default) | Implied in host compilation |
| Host + Device Function | `__host__ __device__` | `__host__ __device__` | No equivalent |
| Kernel Launch | `<<< >>>` | `hipLaunchKernel` / `hipLaunchKernelGGL` / `<<< >>>` | `clEnqueueNDRangeKernel` |
| Global Memory | `__global__` | `__global__` | `__global` |
| Group Memory | `__shared__` | `__shared__` | `__local` |
| Constant | `__constant__` | `__constant__` | `__constant` |
| Barrier | `__syncthreads` | `__syncthreads` | `barrier(CLK_LOCAL_MEM_FENCE)` |
| Atomic Builtins | `atomicAdd` | `atomicAdd` | `atomic_add` |
| Precise Math | `cos(f)` | `cos(f)` | `cos(f)` |
| Fast Math | `__cos(f)` | `__cos(f)` | `native_cos(f)` |
| Vector | `float4` | `float4` | `float4` |

Notes

The indexing entries (Thread-index through Grid-dim) show the terminology for a 1D grid. For 3D grids, some APIs use the reverse order of xyz / 012 indexing.
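To make the 1D indexing entries concrete, the following is a minimal host-side sketch in plain C++ that emulates a 1D launch serially. It is not HIP, CUDA, or OpenCL code and calls none of their APIs: `LaunchIndex`, `global_index`, `vector_add`, and `emulate_launch_1d` are hypothetical names introduced only for illustration. The comments indicate which table entry each value stands in for, assuming a 1D grid with uniform block sizes and no global offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-ins for the built-in 1D index values.
// In real HIP/CUDA code these are compiler-provided variables; in OpenCL
// the same values come from the functions named in the comments.
struct LaunchIndex {
    unsigned threadIdx_x;  // HIP/CUDA threadIdx.x, OpenCL get_local_id(0)
    unsigned blockIdx_x;   // HIP/CUDA blockIdx.x,  OpenCL get_group_id(0)
    unsigned blockDim_x;   // HIP/CUDA blockDim.x,  OpenCL get_local_size(0)
    unsigned gridDim_x;    // HIP/CUDA gridDim.x,   OpenCL get_num_groups(0)
};

// Flattened 1D index of one thread (work-item) within the whole grid.
// OpenCL's get_global_id(0) yields the same value when no global offset is set.
inline std::size_t global_index(const LaunchIndex& i) {
    return static_cast<std::size_t>(i.blockIdx_x) * i.blockDim_x + i.threadIdx_x;
}

// Body of what would be a __global__ (HIP/CUDA) or __kernel (OpenCL) function.
void vector_add(const LaunchIndex& i, const float* a, const float* b,
                float* c, std::size_t n) {
    const std::size_t gid = global_index(i);
    if (gid < n) {  // guard: the grid may contain more threads than elements
        c[gid] = a[gid] + b[gid];
    }
}

// Serial emulation of a 1D launch: every (block, thread) pair runs the
// kernel body once. A real device runs these in parallel, in no fixed order.
void emulate_launch_1d(unsigned grid_dim, unsigned block_dim,
                       const std::vector<float>& a,
                       const std::vector<float>& b,
                       std::vector<float>& c) {
    for (unsigned block = 0; block < grid_dim; ++block) {
        for (unsigned thread = 0; thread < block_dim; ++thread) {
            const LaunchIndex i{thread, block, block_dim, grid_dim};
            vector_add(i, a.data(), b.data(), c.data(), c.size());
        }
    }
}
```

For example, calling `emulate_launch_1d(2, 4, a, b, c)` on five-element vectors runs eight emulated threads, and the bounds check in `vector_add` leaves the three surplus threads idle. The sketch covers only the 1D case described in the note and makes no claim about 3D index ordering.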