Library setup, exit, and query routines
ROCSHMEM_INIT
-
__host__ void rocshmem_init(void)
- Parameters:
None.
- Returns:
None.
Description:
This routine initializes the rocSHMEM library and the underlying transport layer.
Before rocshmem_init is called, you must select the device that this PE is associated with by calling hipSetDevice.
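The ordering requirement above can be sketched as follows. This is a minimal illustration, not the library's prescribed setup: the helper device_for_pe and its round-robin mapping are hypothetical, and the header path <rocshmem/rocshmem.hpp> and the rocshmem namespace are assumed to match the installed release. It relies on rocshmem_my_pe being callable before rocshmem_init, as stated in the ROCSHMEM_MY_PE section. The GPU-dependent part is guarded by __HIPCC__ so that the helper can also be compiled and checked with a plain host compiler.

```cpp
#include <cassert>
#include <cstdio>
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>
#include <rocshmem/rocshmem.hpp>  // assumed header path
#endif

// Hypothetical helper: round-robin mapping from a PE ID to a local device index.
// Assumes PEs are numbered consecutively within a node.
inline int device_for_pe(int pe, int device_count) {
    assert(pe >= 0 && device_count > 0);
    return pe % device_count;
}

#if defined(__HIPCC__)
int main() {
    // Query the PE ID first; this page documents it as callable before rocshmem_init.
    const int pe = rocshmem::rocshmem_my_pe();

    int device_count = 0;
    if (hipGetDeviceCount(&device_count) != hipSuccess || device_count == 0) {
        std::fprintf(stderr, "no usable HIP device\n");
        return 1;
    }

    // Select the device for this PE before initializing the library.
    if (hipSetDevice(device_for_pe(pe, device_count)) != hipSuccess) {
        std::fprintf(stderr, "hipSetDevice failed\n");
        return 1;
    }

    rocshmem::rocshmem_init();
    std::printf("PE %d of %d initialized\n", pe, rocshmem::rocshmem_n_pes());
    rocshmem::rocshmem_finalize();
    return 0;
}
#endif
```

Build the full program with hipcc against a rocSHMEM installation and start it with the launcher that installation expects. A different PE-to-device mapping can be substituted for device_for_pe without changing the call order.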
-
__device__ void rocshmem_wg_init(void)
- Parameters:
None.
- Returns:
None.
Description: This routine initializes device-side rocSHMEM resources. It must be called before any threads in this work-group invoke other rocSHMEM functions. It must be called collectively by all threads in the work-group.
ROCSHMEM_FINALIZE
-
__host__ void rocshmem_finalize(void)
- Parameters:
None.
- Returns:
None.
Description: This routine finalizes the rocSHMEM library.
-
__device__ void rocshmem_wg_finalize(void)
- Parameters:
None.
- Returns:
None.
Description:
This routine finalizes device-side rocSHMEM resources.
It must be called before work-group completion if the work-group also called rocshmem_wg_init.
It must be called collectively by all threads in the work-group.
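A minimal sketch of how rocshmem_wg_init and rocshmem_wg_finalize might be paired inside a kernel is shown here. The kernel tag_with_pe, the launcher launch_tag_with_pe, and the helper blocks_for are hypothetical names used for illustration only; the header path and the rocshmem namespace are assumptions. The point of the sketch is that the bounds check guards only the per-thread work, so every thread of the work-group still reaches both collective calls.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>
#include <rocshmem/rocshmem.hpp>  // assumed header path
#endif

// Hypothetical helper: number of work-groups needed to cover n elements.
inline unsigned blocks_for(std::size_t n, unsigned block_size) {
    assert(block_size > 0);
    return static_cast<unsigned>((n + block_size - 1) / block_size);
}

#if defined(__HIPCC__)
__global__ void tag_with_pe(int* out, std::size_t n) {
    // Collective call: no thread may return or branch away before this point.
    rocshmem::rocshmem_wg_init();

    const std::size_t i =
        blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {  // guard the work, not the collective calls
        out[i] = rocshmem::rocshmem_my_pe();
    }

    // Collective call: reached by all threads, including those with i >= n.
    rocshmem::rocshmem_wg_finalize();
}

// Host-side launcher. Assumes hipSetDevice and rocshmem_init already ran on
// this PE and that d_out points to at least n ints of device memory.
hipError_t launch_tag_with_pe(int* d_out, std::size_t n) {
    if (n == 0) {
        return hipSuccess;  // nothing to launch
    }
    constexpr unsigned kBlockSize = 256;
    tag_with_pe<<<blocks_for(n, kBlockSize), kBlockSize>>>(d_out, n);
    return hipDeviceSynchronize();
}
#endif
```

An early `if (i >= n) return;` placed before rocshmem_wg_init would leave some threads out of the collective calls, which is what the descriptions above rule out.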
ROCSHMEM_N_PES
-
__host__ int rocshmem_n_pes(void)
- Parameters:
None.
- Returns:
Total number of PEs.
Description:
This routine queries the total number of PEs.
It can be called before rocshmem_init.
-
__device__ int rocshmem_n_pes(void)
-
__device__ int rocshmem_ctx_n_pes(rocshmem_ctx_t ctx)
- Parameters:
ctx – GPU-side context handle.
- Returns:
Total number of PEs.
Description: This routine queries the total number of PEs for a given context. It can be called per thread with no performance penalty.
ROCSHMEM_MY_PE
-
__host__ int rocshmem_my_pe(void)
- Parameters:
None.
- Returns:
PE ID of the caller.
Description:
This routine queries the PE ID of the caller.
It can be called before rocshmem_init.
-
__device__ int rocshmem_my_pe(void)
-
__device__ int rocshmem_ctx_my_pe(rocshmem_ctx_t ctx)
- Parameters:
ctx – GPU-side context handle.
- Returns:
PE ID of the caller.
Description: This routine queries the PE ID of the caller. It can be called per thread with no performance penalty.
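As an illustration of the device-side queries, the sketch below has each thread read its PE ID and the PE count, then derive the next PE on a ring. The names ring_neighbor, find_neighbor, and check_neighbor_on_host are hypothetical, and the header path and rocshmem namespace are assumptions. Only the context-free device overloads are used; the ctx variants would take a context handle obtained elsewhere.

```cpp
#include <cassert>
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>
#include <rocshmem/rocshmem.hpp>  // assumed header path
#endif

// Hypothetical helper: PE ID of the next PE on a ring of n_pes PEs.
// Precondition: n_pes > 0. Being constexpr, it is also callable from device
// code when compiled with hipcc (clang).
constexpr int ring_neighbor(int my_pe, int n_pes) {
    return (my_pe + 1) % n_pes;
}

#if defined(__HIPCC__)
__global__ void find_neighbor(int* out) {
    rocshmem::rocshmem_wg_init();

    // Per-thread queries; documented above as carrying no performance penalty.
    const int me = rocshmem::rocshmem_my_pe();
    const int npes = rocshmem::rocshmem_n_pes();

    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out = ring_neighbor(me, npes);  // a single thread publishes the result
    }

    rocshmem::rocshmem_wg_finalize();
}

// Compares the device-side result with the host-side queries.
// Assumes hipSetDevice and rocshmem_init were already called on this PE.
void check_neighbor_on_host() {
    int* d_out = nullptr;
    hipError_t err = hipMalloc(&d_out, sizeof(int));
    assert(err == hipSuccess);

    find_neighbor<<<1, 64>>>(d_out);

    int h_out = -1;
    // hipMemcpy is synchronous, so it also waits for the kernel to finish.
    err = hipMemcpy(&h_out, d_out, sizeof(int), hipMemcpyDeviceToHost);
    assert(err == hipSuccess);
    (void)hipFree(d_out);

    assert(h_out == ring_neighbor(rocshmem::rocshmem_my_pe(),
                                  rocshmem::rocshmem_n_pes()));
}
#endif
```

Because the host and device overloads report the same PE ID and PE count, the value computed in the kernel is expected to match the one computed on the host.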