Intro to AMD CDNA Architecture
The AMD CDNA architecture is a specialized GPU design for high-performance computing (HPC) and AI workloads. Unlike the RDNA architecture used in gaming GPUs, CDNA is optimized for data center tasks, prioritizing compute density, memory bandwidth, and scalability. This is achieved through several key architectural features.
For more information about the AMD GPU architecture, see the GPU architecture documentation.
Implications for CK Tile
Understanding the CDNA architecture is crucial for effective use of CK Tile:
Thread Organization: CK Tile’s hierarchical thread mapping (blocks → warps → threads) directly maps to CDNA’s hardware organization.
Memory Hierarchy: CK Tile’s buffer views and tile windows are designed to efficiently utilize the L2 cache, Infinity Cache, and LDS hierarchy.
Register Pressure: CK Tile’s compile-time optimizations help minimize vector register (VGPR) usage, preventing spills to slower memory.
Warp Execution: CK Tile’s tile distribution ensures that threads within a warp (a 64-thread wavefront on CDNA) access contiguous memory for optimal SIMD execution.
LDS Utilization: CK Tile’s static distributed tensors and tile windows make effective use of the 64 KB of LDS available per compute unit (CU).
By understanding these architectural features, developers can better appreciate how CK Tile’s abstractions map to hardware capabilities and why certain design decisions were made in the framework.
Related Topics
Thread Mapping - Connecting to Hardware - How threads are organized and mapped to hardware
Coordinate Systems - The Mathematical Foundation - Mathematical foundation for data distribution
Understanding AMD GPU LDS and Bank Conflicts - Optimizing shared memory access patterns
LoadStoreTraits - Memory Access Optimization Engine - Memory access optimization strategies
A Block GEMM on MI300 - Practical application of architecture knowledge