CK Tile Conceptual Documentation
Welcome to the conceptual documentation for CK Tile, the core abstraction layer of Composable Kernel that enables efficient GPU programming through compile-time coordinate transformations and tile-based data distribution.
See the CK Tile Index for the complete CK Tile documentation structure.
Overview
CK Tile provides a mathematical framework for expressing complex GPU computations through:
Automatic Memory Coalescing: Produces coalesced memory access patterns without hand-tuned indexing
Thread Cooperation: Coordinates work distribution across the GPU’s hierarchical execution model
Zero-Overhead Abstractions: Coordinate transformations are resolved at compile time, so the abstractions add no runtime cost
Portable Performance: The same code achieves high performance across different GPU architectures
Why CK Tile?
Traditional GPU programming requires manual management of:
Thread-to-data mapping calculations
Memory coalescing patterns
Bank conflict avoidance
Boundary condition handling
CK Tile handles all of these through a single abstraction that maps logical problem coordinates to physical GPU resources.
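To make the manual bookkeeping concrete, the following host-side C++ sketch shows the kind of index arithmetic a hand-written kernel typically repeats for every tensor: derive a row and column from block and thread indices, guard the boundary, and linearize. The names `LaunchIndex`, `MappedElement`, and `manual_offset` are hypothetical and are not part of CK Tile; the row-major layout is an assumption chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical types for illustration only; none of this is CK Tile API.
struct LaunchIndex {
    int block_x, block_y;   // block position within the grid
    int thread_x, thread_y; // thread position within the block
};

struct MappedElement {
    bool in_bounds;     // false when the thread falls outside the matrix
    std::size_t offset; // row-major offset, meaningful only if in_bounds
};

// Thread-to-data mapping plus boundary handling, written by hand.
constexpr MappedElement manual_offset(LaunchIndex idx,
                                      int block_dim_x, int block_dim_y,
                                      int rows, int cols)
{
    const int col = idx.block_x * block_dim_x + idx.thread_x;
    const int row = idx.block_y * block_dim_y + idx.thread_y;
    if (row >= rows || col >= cols) {
        return {false, 0}; // boundary condition: nothing to load or store
    }
    // Assumed row-major layout: consecutive thread_x values touch
    // consecutive addresses, which is what coalescing requires.
    return {true, static_cast<std::size_t>(row) * cols + col};
}
```

Every one of these steps (mapping, bounds check, layout-aware linearization) has to be kept consistent by hand when the tile shape or layout changes, which is the burden the list above describes.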
Learning Path
Start Here: Introduction and Motivation - Why Tile Distribution Matters
The fundamental problems CK Tile solves and why it’s essential for efficient GPU programming.
Foundation: CK Tile buffer view
How CK Tile provides structured access to raw GPU memory across different address spaces.
Multi-Dimensional Views: Tensor Views - Multi-Dimensional Structure
How to work with multi-dimensional data structures and memory layouts.
Core API: Tile Distribution - The Core API
The tile distribution system that maps work to GPU threads.
Mathematical Framework: Coordinate Systems - The Mathematical Foundation
The coordinate transformation system that powers CK Tile’s abstractions.
Reference: Terminology Reference - Key Concepts and Definitions
Glossary of all terms and concepts used in CK Tile.
Key Concepts at a Glance
Coordinate Spaces
P-space: Processing element coordinates (thread, warp, block)
Y-space: Local tile access patterns
X-space: Physical tensor coordinates
D-space: Linearized memory addresses
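One way to read the four spaces is as a pipeline: a thread's P coordinate together with a Y coordinate selects an X coordinate, which is then linearized into a D address. The C++ sketch below uses a deliberately simple, made-up layout in which each thread owns four consecutive elements of one row. The types and functions (`PCoord`, `YCoord`, `XCoord`, `to_x`, `to_d`) are illustrative assumptions and not CK Tile API; in CK Tile the corresponding mappings are derived from the distribution encoding at compile time.

```cpp
// Illustrative types only; CK Tile does not define these names.
struct PCoord { int thread_row, thread_col; }; // P-space: which thread
struct YCoord { int y; };                      // Y-space: index within the thread's strip
struct XCoord { int row, col; };               // X-space: position in the tensor

constexpr int kStripLength = 4; // assumed: each thread owns a 1 x 4 strip

// (P, Y) -> X: combine the thread position with its local index.
constexpr XCoord to_x(PCoord p, YCoord y)
{
    return {p.thread_row, p.thread_col * kStripLength + y.y};
}

// X -> D: linearize with an assumed row-major layout.
constexpr int to_d(XCoord x, int row_stride)
{
    return x.row * row_stride + x.col;
}

// Thread (2, 3), local element 1, row stride 64:
// X = (2, 3 * 4 + 1) = (2, 13), D = 2 * 64 + 13 = 141.
static_assert(to_d(to_x(PCoord{2, 3}, YCoord{1}), 64) == 141,
              "P + Y -> X -> D for the toy layout");
```

Because every step is `constexpr`, the whole chain can be evaluated by the compiler, which mirrors the compile-time approach described in this documentation.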
Core Components
BufferView: Type-safe access to GPU memory
TileDistribution: Automatic work distribution
TileWindow: Efficient data loading/storing
Encoding: Compile-time distribution specification
Quick Example
// Define how to distribute a 256x256 tile across threads
using Encoding = tile_distribution_encoding<
    sequence<>,                            // No replication
    tuple<sequence<4,2,8,4>,               // M dimension hierarchy
          sequence<4,2,8,4>>,              // N dimension hierarchy
    tuple<sequence<1,2>, sequence<1,2>>,   // Thread mapping
    tuple<sequence<1,1>, sequence<2,2>>,   // Minor indices
    sequence<1,1,2,2>,                     // Y-space mapping
    sequence<0,3,0,3>                      // Y-space minor
>;

// Create distribution and load data
// (tensor_view, tile_size, and origin are assumed to be defined by the surrounding kernel)
auto distribution = make_static_tile_distribution(Encoding{});
auto window = make_tile_window(tensor_view, tile_size, origin, distribution);
auto tile = window.load();

// Process tile efficiently
sweep_tile(tile, [](auto idx) { /* computation */ });
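The 256x256 figure in the first comment can be checked by hand: each dimension's hierarchy is 4, 2, 8, 4, and 4 · 2 · 8 · 4 = 256. The sketch below repeats that arithmetic in plain C++. It also counts threads and per-thread elements under an assumed reading of the encoding: major index 1 or 2 selects the M or N hierarchy, the minor index selects a level within it, the first P dimension covers level 1 of both hierarchies, the second covers level 2, and the four Y dimensions cover levels 0 and 3. All names here (`kH`, `extent`, `kP0Extent`, and so on) are hypothetical helpers, not CK Tile API.

```cpp
// Hierarchy lengths copied from the example (identical for M and N).
constexpr int kH[4] = {4, 2, 8, 4};

// Product of all levels: the tile extent along one dimension.
constexpr int extent(const int (&h)[4])
{
    int product = 1;
    for (int i = 0; i < 4; ++i) {
        product *= h[i];
    }
    return product;
}

// Assumed interpretation of the P and Y mappings (see the text above).
constexpr int kP0Extent        = kH[1] * kH[1];                 // 2 * 2 = 4
constexpr int kP1Extent        = kH[2] * kH[2];                 // 8 * 8 = 64
constexpr int kYElemsPerThread = kH[0] * kH[3] * kH[0] * kH[3]; // 4 * 4 * 4 * 4 = 256

static_assert(extent(kH) == 256, "each dimension spans 256 elements");

// Threads times elements per thread should cover the tile exactly once,
// since the example uses no replication.
static_assert(kP0Extent * kP1Extent * kYElemsPerThread == extent(kH) * extent(kH),
              "256 threads x 256 elements = 256 x 256 tile");
```

Under this reading, 4 x 64 = 256 threads each hold 256 elements, which accounts for all 65,536 elements of the tile.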
Next Steps
To dive deeper, start with Introduction and Motivation - Why Tile Distribution Matters, which covers the core concepts behind CK Tile.
For practical examples, see the example/ck_tile directory in the Composable Kernel repository.