ROCm Core SDK components

The ROCm Core SDK is the foundation of the ROCm software stack. It provides the libraries, runtimes, compilers, and tools needed to develop and run GPU-accelerated applications on AMD hardware.

Math and compute libraries

A comprehensive set of GPU-accelerated math libraries covering dense and sparse linear algebra, FFTs, random number generation, and more.

  • Libraries prefixed with roc* are native, high-performance implementations written in HIP specifically for AMD GPUs.

  • Libraries prefixed with hip* are portable wrappers that implement NVIDIA CUDA-equivalent APIs, allowing CUDA applications to be ported to AMD GPUs with minimal code changes.
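
The naming convention above can be expressed as a simple prefix check. The following is a minimal sketch, not part of ROCm: `classify_library` is a hypothetical helper, and the names in the demo are a few well-known examples rather than a complete list.

```python
def classify_library(name: str) -> str:
    """Classify a ROCm math library by its name prefix (hypothetical helper)."""
    lowered = name.lower()
    if lowered.startswith("roc"):
        return "native"      # HIP implementation written for AMD GPUs
    if lowered.startswith("hip"):
        return "portable"    # wrapper exposing a CUDA-equivalent API
    return "unknown"         # name does not follow either convention


if __name__ == "__main__":
    for lib in ("rocBLAS", "hipBLAS", "rocFFT", "hipFFT"):
        print(f"{lib}: {classify_library(lib)}")
```

The check only applies to math library names. Other components whose names start with "hip", such as HIPIFY, are tools rather than wrapper libraries.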

Libraries include:

Communication libraries

Libraries for high-performance multi-GPU and multi-node communication:

  • RCCL – Standalone library that provides multi-GPU and multi-node collective communication primitives.

  • rocSHMEM – An intra-kernel networking library that provides GPU-centric networking through an OpenSHMEM-like interface.
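
To make "collective communication primitive" concrete, the sketch below simulates the result of a sum all-reduce on the CPU: every rank contributes a buffer and every rank ends up holding the element-wise sum. It does not call RCCL or any GPU API. `all_reduce_sum` is a hypothetical function used only to illustrate the semantics, and it says nothing about how RCCL implements the operation.

```python
from typing import List


def all_reduce_sum(rank_buffers: List[List[float]]) -> List[List[float]]:
    """Return what each rank holds after a sum all-reduce (CPU-only simulation)."""
    if not rank_buffers:
        return []
    length = len(rank_buffers[0])
    if any(len(buf) != length for buf in rank_buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    # Element-wise sum across ranks.
    reduced = [sum(values) for values in zip(*rank_buffers)]
    # Every rank receives its own copy of the reduced buffer.
    return [list(reduced) for _ in rank_buffers]


if __name__ == "__main__":
    buffers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three simulated ranks
    for rank, buf in enumerate(all_reduce_sum(buffers)):
        print(f"rank {rank}: {buf}")
```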

Runtime and compilers

The core execution environment and programming tools for GPU development on AMD hardware:

  • HIP – A C++ runtime API and kernel programming language designed for AMD GPUs. By providing an interface closely aligned with NVIDIA CUDA, HIP allows developers to write portable applications and efficiently migrate existing CUDA code to AMD platforms.

  • HIPIFY – Source translation tools for converting CUDA code to HIP. Automates porting existing CUDA applications and libraries to ROCm.

  • LLVM – AMD’s LLVM-based compiler infrastructure, including the ROCm device compiler (amdclang), which compiles HIP and OpenCL code for AMD GPUs.
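
Because HIP mirrors the CUDA API closely, much of the translation HIPIFY performs amounts to renaming identifiers and headers. The toy sketch below illustrates that idea with a handful of real CUDA-to-HIP name pairs. It is not the HIPIFY implementation: `toy_hipify` and `CUDA_TO_HIP` are hypothetical names, and the actual tools cover far more of the API.

```python
import re

# A tiny, hand-picked subset of CUDA-to-HIP renames (illustrative only).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Word boundaries keep cudaMemcpy from matching inside cudaMemcpyHostToDevice
# and prevent rewrites inside longer, unrelated identifiers.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename the known CUDA identifiers in `source` to their HIP equivalents."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(1)], source)


if __name__ == "__main__":
    cuda_snippet = "cudaMemcpy(d_buf, h_buf, nbytes, cudaMemcpyHostToDevice);"
    print(toy_hipify(cuda_snippet))
```

For real code, use the HIPIFY tools themselves. This sketch ignores kernel launch syntax, library calls, and APIs that have no direct HIP counterpart.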

Profiling and debugging tools

Tools for measuring and analyzing GPU application performance and diagnosing issues:

  • ROCm Compute Profiler (rocprofiler-compute) – Application-level GPU performance analysis, including roofline analysis, for identifying compute bottlenecks.

  • ROCm Systems Profiler (rocprofiler-systems) – System-level profiling that captures GPU, CPU, and memory activity across an entire application run.

  • ROCprofiler-SDK – Low-level profiling API for building custom performance instrumentation on AMD GPUs.

  • ROCdbgapi – AMD GPU debugger API providing low-level access to GPU execution state.

  • ROCm Debugger (ROCgdb) – GDB-based debugger extended to support debugging GPU kernels running on AMD hardware.

  • ROCr Debug Agent – Runtime debug agent for capturing and reporting GPU execution faults and exceptions.
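
The roofline analysis mentioned for ROCm Compute Profiler rests on a simple bound: attainable throughput is the smaller of the compute peak and the memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The sketch below shows that general formula only. The function name and the peak numbers are hypothetical, do not describe any AMD device, and do not reproduce how rocprofiler-compute derives its rooflines.

```python
def attainable_gflops(
    peak_gflops: float,
    peak_bandwidth_gbs: float,
    arithmetic_intensity: float,
) -> float:
    """Roofline bound in GFLOP/s: min(compute roof, bandwidth * FLOPs-per-byte)."""
    if min(peak_gflops, peak_bandwidth_gbs, arithmetic_intensity) < 0:
        raise ValueError("inputs must be non-negative")
    return min(peak_gflops, peak_bandwidth_gbs * arithmetic_intensity)


if __name__ == "__main__":
    PEAK_GFLOPS = 1000.0   # hypothetical compute peak
    PEAK_BW_GBS = 100.0    # hypothetical memory bandwidth
    for intensity in (2.0, 20.0):
        bound = attainable_gflops(PEAK_GFLOPS, PEAK_BW_GBS, intensity)
        regime = "memory-bound" if bound < PEAK_GFLOPS else "compute-bound"
        print(f"intensity {intensity} FLOP/byte: {bound} GFLOP/s ({regime})")
```

With these made-up peaks, a kernel at 2 FLOP/byte is limited by memory bandwidth, while a kernel at 20 FLOP/byte reaches the compute roof.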

Control and monitoring tools

Tools for inspecting and managing AMD GPU hardware state:

  • AMD SMI – C, C++, Python, Go, and Rust library interfaces and CLI (amd-smi) for monitoring and managing AMD devices through the amdgpu kernel driver. Reports and monitors power, temperature, utilization, memory usage, clock frequencies, and more.

  • ROCm Data Center Tool (RDC) – A monitoring and management framework for AMD GPUs in data center environments. Provides telemetry collection, health monitoring, and a plugin architecture for integration with cluster management systems.

  • rocminfo – Reports HSA runtime and agent information, including GPU topology, capability flags, and memory regions visible to the ROCm runtime.
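
As a small usage illustration, a script can call rocminfo and scan its text output for GPU architecture names. The sketch below is hedged on two assumptions: that `rocminfo` is on the PATH, and that architecture names appear in its output as `gfx` followed by hexadecimal digits. Both function names are hypothetical, and the loose text scan is not a stable interface.

```python
import re
import shutil
import subprocess
from typing import List


def parse_gfx_targets(rocminfo_output: str) -> List[str]:
    """Return unique gfx* tokens in order of first appearance (loose text scan)."""
    seen: List[str] = []
    for token in re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_output):
        if token not in seen:
            seen.append(token)
    return seen


def detect_gfx_targets() -> List[str]:
    """Run rocminfo if it is installed; return an empty list otherwise."""
    executable = shutil.which("rocminfo")
    if executable is None:
        return []  # ROCm tools not installed or not on PATH
    result = subprocess.run(
        [executable], capture_output=True, text=True, check=False
    )
    return parse_gfx_targets(result.stdout)


if __name__ == "__main__":
    targets = detect_gfx_targets()
    print(targets if targets else "no gfx targets found (is rocminfo installed?)")
```

For structured monitoring data, the AMD SMI library interfaces are likely a better fit than parsing command output.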