Library setup, exit, and query routines

ROCSHMEM_INIT

__host__ void rocshmem_init(void)
Parameters:

None.

Returns:

None.

Description: This routine initializes the rocSHMEM library and the underlying transport layer. Before calling rocshmem_init, select the device associated with this PE by calling hipSetDevice.

__device__ void rocshmem_wg_init(void)
Parameters:

None.

Returns:

None.

Description: This routine initializes device-side rocSHMEM resources. It must be called before any threads in this work-group invoke other rocSHMEM functions. It must be called collectively by all threads in the work-group.

ROCSHMEM_FINALIZE

__host__ void rocshmem_finalize(void)
Parameters:

None.

Returns:

None.

Description: This routine finalizes the rocSHMEM library.
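
The host-side lifecycle described so far (select a device, initialize, do work, finalize) might look like the following sketch. It assumes the header path rocshmem/rocshmem.hpp, the rocshmem namespace, and a launcher that exports OMPI_COMM_WORLD_LOCAL_RANK (as Open MPI does). The names pick_device, local_rank_from_env, and run_pe are hypothetical helpers, not part of rocSHMEM. The __has_include guard is only there so the helpers can be built on a machine without ROCm.

```cpp
#include <cstdlib>

// Compile the rocSHMEM part only where the headers are installed.
// The header path and the rocshmem namespace are assumptions of this sketch.
#if __has_include(<hip/hip_runtime.h>) && __has_include(<rocshmem/rocshmem.hpp>)
#include <hip/hip_runtime.h>
#include <rocshmem/rocshmem.hpp>
#define SKETCH_HAVE_ROCSHMEM 1
#endif

// Hypothetical helper: map a node-local rank onto one of the visible GPUs
// in round-robin fashion. Returns -1 when no device is available.
int pick_device(int local_rank, int device_count) {
  if (device_count <= 0) return -1;
  if (local_rank < 0) local_rank = 0;
  return local_rank % device_count;
}

// Hypothetical helper: read the node-local rank from an environment
// variable set by the job launcher; falls back to 0 when it is unset.
int local_rank_from_env(const char* var_name) {
  const char* value = std::getenv(var_name);
  return value ? std::atoi(value) : 0;
}

#ifdef SKETCH_HAVE_ROCSHMEM
// Returns 0 on success and 1 on failure.
int run_pe() {
  int device_count = 0;
  if (hipGetDeviceCount(&device_count) != hipSuccess) return 1;

  // Assumption: the launcher exports OMPI_COMM_WORLD_LOCAL_RANK.
  int device = pick_device(local_rank_from_env("OMPI_COMM_WORLD_LOCAL_RANK"),
                           device_count);
  if (device < 0) return 1;

  // The device must be selected before rocshmem_init is called.
  if (hipSetDevice(device) != hipSuccess) return 1;

  rocshmem::rocshmem_init();

  // ... launch kernels that use rocSHMEM here ...

  // Wait for outstanding GPU work before tearing the library down.
  if (hipDeviceSynchronize() != hipSuccess) return 1;
  rocshmem::rocshmem_finalize();
  return 0;
}
#endif
```

With 8 visible GPUs, for example, pick_device(9, 8) selects device 1. How the local rank is obtained depends on the launcher in use, so treat the environment variable as a placeholder.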

__device__ void rocshmem_wg_finalize(void)
Parameters:

None.

Returns:

None.

Description: This routine finalizes device-side rocSHMEM resources. It must be called before work-group completion if the work-group also called rocshmem_wg_init. It must be called collectively by all threads in the work-group.
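
The collective requirement on rocshmem_wg_init and rocshmem_wg_finalize means no thread may return early before either call. A minimal sketch of a kernel that respects this is shown here. It assumes the header path rocshmem/rocshmem.hpp and the rocshmem namespace; report_pe, launch_and_check, and all_report are hypothetical names. Build flags and any launch requirements specific to your rocSHMEM version are outside the scope of this sketch.

```cpp
#include <vector>

// Compile the rocSHMEM part only where the headers are installed.
// The header path and the rocshmem namespace are assumptions of this sketch.
#if __has_include(<hip/hip_runtime.h>) && __has_include(<rocshmem/rocshmem.hpp>)
#include <hip/hip_runtime.h>
#include <rocshmem/rocshmem.hpp>
#define SKETCH_HAVE_ROCSHMEM 1
#endif

// Hypothetical host-side check: every work-group should report the same PE ID.
bool all_report(const std::vector<int>& seen, int expected) {
  for (int pe : seen) {
    if (pe != expected) return false;
  }
  return true;
}

#ifdef SKETCH_HAVE_ROCSHMEM
using namespace rocshmem;

__global__ void report_pe(int* seen) {
  // Collective: every thread of the work-group reaches this call,
  // so there is no early return above it.
  rocshmem_wg_init();

  // Only one thread per work-group records the result.
  if (threadIdx.x == 0) {
    seen[blockIdx.x] = rocshmem_my_pe();
  }

  // Collective again: all threads arrive here before the work-group ends.
  rocshmem_wg_finalize();
}

// Call after hipSetDevice and rocshmem_init have run on the host.
bool launch_and_check(int num_groups, int threads_per_group) {
  int* d_seen = nullptr;
  const size_t bytes = static_cast<size_t>(num_groups) * sizeof(int);
  if (hipMalloc(reinterpret_cast<void**>(&d_seen), bytes) != hipSuccess) {
    return false;
  }

  report_pe<<<num_groups, threads_per_group>>>(d_seen);

  std::vector<int> seen(num_groups, -1);
  bool ok = hipDeviceSynchronize() == hipSuccess &&
            hipMemcpy(seen.data(), d_seen, bytes,
                      hipMemcpyDeviceToHost) == hipSuccess;
  hipFree(d_seen);

  // Compare what the device saw with the host-side query.
  return ok && all_report(seen, rocshmem_my_pe());
}
#endif
```

Guarding the write with threadIdx.x == 0 keeps the divergent code between the two collective calls, so every thread still executes both of them.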

ROCSHMEM_N_PES

__host__ int rocshmem_n_pes(void)
Parameters:

None.

Returns:

Total number of PEs.

Description: This routine queries the total number of PEs. It can be called before rocshmem_init.

__device__ int rocshmem_n_pes(void)
__device__ int rocshmem_ctx_n_pes(rocshmem_ctx_t ctx)
Parameters:

ctx – GPU-side context handle (taken by rocshmem_ctx_n_pes only).

Returns:

Total number of PEs.

Description: This routine queries the total number of PEs for a given context. It can be called per thread with no performance penalty.

ROCSHMEM_MY_PE

__host__ int rocshmem_my_pe(void)
Parameters:

None.

Returns:

PE ID of the caller.

Description: This routine queries the PE ID of the caller. It can be called before rocshmem_init.

__device__ int rocshmem_my_pe(void)
__device__ int rocshmem_ctx_my_pe(rocshmem_ctx_t ctx)
Parameters:

ctx – GPU-side context handle (taken by rocshmem_ctx_my_pe only).

Returns:

PE ID of the caller.

Description: This routine queries the PE ID of the caller. It can be called per thread with no performance penalty.
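
The values returned by rocshmem_my_pe and rocshmem_n_pes are typically used to pick communication partners and to split work across PEs. The following sketch shows two common patterns, a ring neighbor and a block partition. All names in it (Range, ring_next, ring_prev, block_range) are hypothetical helpers and not part of rocSHMEM; they take the PE ID and PE count as plain integers, so the caller would pass in the results of the query routines.

```cpp
// Half-open index range [begin, end).
struct Range {
  long begin;
  long end;
};

// Right-hand neighbor in a ring of n_pes PEs (assumes n_pes > 0).
int ring_next(int my_pe, int n_pes) {
  return (my_pe + 1) % n_pes;
}

// Left-hand neighbor in a ring of n_pes PEs (assumes n_pes > 0).
int ring_prev(int my_pe, int n_pes) {
  return (my_pe + n_pes - 1) % n_pes;
}

// Block partition of n items across n_pes PEs (assumes n_pes > 0).
// The first n % n_pes PEs each receive one extra item.
Range block_range(long n, int my_pe, int n_pes) {
  const long base = n / n_pes;
  const long extra = n % n_pes;
  const long begin = my_pe * base + (my_pe < extra ? my_pe : extra);
  const long length = base + (my_pe < extra ? 1 : 0);
  return {begin, begin + length};
}
```

On the host, a call such as block_range(n, rocshmem_my_pe(), rocshmem_n_pes()) would give each PE its own slice. With 10 items and 4 PEs the slices are [0, 3), [3, 6), [6, 8), and [8, 10).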