CK Tile Conceptual Documentation

Welcome to the conceptual documentation for CK Tile, the core abstraction layer of Composable Kernel that enables efficient GPU programming through compile-time coordinate transformations and tile-based data distribution.

See the table of contents for the complete structure of the CK Tile conceptual documentation.

Overview

CK Tile provides a mathematical framework for expressing complex GPU computations through:

Automatic Memory Coalescing: Produces coalesced memory access patterns without manual optimization
Thread Cooperation: Coordinates work distribution across the GPU’s hierarchical execution model (threads, warps, blocks)
Zero-Overhead Abstractions: Coordinate transformations are resolved at compile time, so the abstractions add no runtime performance penalty
Portable Performance: The same code achieves high performance across different GPU architectures

Why CK Tile?

Traditional GPU programming requires manual management of:

Thread-to-data mapping calculations
Memory coalescing patterns
Bank conflict avoidance
Boundary condition handling

CK Tile automates all of these concerns through a unified abstraction that maps logical problem coordinates to physical GPU resources.
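
To make the list above concrete, here is a minimal host-side C++ sketch of the index arithmetic a hand-written kernel typically performs for a single thread. It is not CK Tile code: the names `ManualElement` and `manual_element_index` are hypothetical, and the row-major layout and 2D block shape are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration of manual index management; not part of CK Tile.
struct ManualElement {
    bool valid;         // false when the thread falls outside the matrix
    std::size_t row;
    std::size_t col;
    std::size_t offset; // linear offset into a row-major buffer
};

// Maps one (block, thread) pair to one element of a rows x cols matrix.
// thread_x advances along columns, so neighbouring threads touch
// neighbouring addresses (the coalescing concern). Threads past the
// matrix edge are flagged as invalid (the boundary-condition concern).
inline ManualElement manual_element_index(std::size_t block_x, std::size_t block_y,
                                          std::size_t thread_x, std::size_t thread_y,
                                          std::size_t block_dim_x, std::size_t block_dim_y,
                                          std::size_t rows, std::size_t cols)
{
    const std::size_t col = block_x * block_dim_x + thread_x;
    const std::size_t row = block_y * block_dim_y + thread_y;
    if (row >= rows || col >= cols) {
        return ManualElement{false, 0, 0, 0};
    }
    return ManualElement{true, row, col, row * cols + col};
}
```

With 16x16 blocks on a 100x100 matrix, thread (3, 2) of block (1, 0) maps to row 2, column 19, offset 219, and its neighbour along x maps to offset 220. Every kernel written this way repeats some variant of this arithmetic, which is the bookkeeping CK Tile is designed to take over.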

Learning Path

Start Here: Introduction and Motivation - Why Tile Distribution Matters
The fundamental problems CK Tile solves and why it’s essential for efficient GPU programming.

Foundation: Buffer Views - Raw Memory Access
How CK Tile provides structured access to raw GPU memory across different address spaces.

Multi-Dimensional Views: Tensor Views - Multi-Dimensional Structure
How to work with multi-dimensional data structures and memory layouts.

Core API: ck_tile_distribution
The tile distribution system that maps work to GPU threads.

Mathematical Framework: Coordinate Systems - The Mathematical Foundation
The coordinate transformation system that powers CK Tile’s abstractions.

Reference: Terminology Reference - Key Concepts and Definitions
Glossary of all terms and concepts used in CK Tile.

Key Concepts at a Glance

Coordinate Spaces

P-space: Processing element coordinates (thread, warp, block)
Y-space: Local tile access patterns
X-space: Physical tensor coordinates
D-space: Linearized memory addresses
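
The chain from P-space and Y-space through X-space to D-space can be sketched with a toy layout in plain C++. This is only an illustration of the idea, not CK Tile's implementation: the 4 x 8 tile, the 16-thread split, and the names `XCoord`, `to_x`, and `to_d` are all hypothetical, and row-major storage is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy layout: a 4 x 8 tile shared by 16 threads,
// each thread owning 2 adjacent columns of one row.
struct XCoord {
    std::size_t row;
    std::size_t col;
};

constexpr std::size_t kThreadsPerRow  = 4; // threads side by side along a row
constexpr std::size_t kElemsPerThread = 2; // Y-space extent per thread

// P-space + Y-space -> X-space:
// combine "which thread" (p) with "which local element" (y).
inline XCoord to_x(std::size_t p, std::size_t y)
{
    assert(y < kElemsPerThread);
    return XCoord{p / kThreadsPerRow, (p % kThreadsPerRow) * kElemsPerThread + y};
}

// X-space -> D-space: linearize the tensor coordinate with a row stride.
inline std::size_t to_d(XCoord x, std::size_t row_stride)
{
    return x.row * row_stride + x.col;
}
```

For example, thread 5 with local index 1 lands on X = (1, 3), which a row stride of 8 linearizes to D = 11. CK Tile expresses the same kind of mapping as compile-time coordinate transformations rather than hand-written functions.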

Core Components

BufferView: Type-safe access to GPU memory
TileDistribution: Automatic work distribution
TileWindow: Efficient data loading/storing
Encoding: Compile-time distribution specification

Quick Example

// Fragment: assumes "ck_tile/core.hpp" is included, names from namespace
// ck_tile are visible, and tensor_view, tile_size, and origin are already
// defined by the surrounding kernel.

// Define how to distribute a 256x256 tile across threads
using Encoding = tile_distribution_encoding<
    sequence<>,                              // R: no replication
    tuple<sequence<4,2,8,4>,                // H for M: 4*2*8*4 = 256
          sequence<4,2,8,4>>,               // H for N: 4*2*8*4 = 256
    tuple<sequence<1,2>, sequence<1,2>>,    // P -> H major (1 = M, 2 = N)
    tuple<sequence<1,1>, sequence<2,2>>,    // P -> H minor (level within the hierarchy)
    sequence<1,1,2,2>,                      // Y -> H major (1 = M, 2 = N)
    sequence<0,3,0,3>                       // Y -> H minor (level within the hierarchy)
>;

// Create the distribution, open a window on the tensor, and load this thread's part
auto distribution = make_static_tile_distribution(Encoding{});
auto window = make_tile_window(tensor_view, tile_size, origin, distribution);
auto tile = window.load();

// Visit every element this thread owns
sweep_tile(tile, [](auto idx) { /* computation */ });
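
The arithmetic behind the encoding above can be checked in plain C++ without CK Tile. The sketch below copies the hierarchy lengths `4,2,8,4` and the minor indices from the example; the helper names are hypothetical, and two points are assumptions based on one reading of the encoding rather than statements from this page: that the first P dimension selects warps and the second selects lanes, and that hierarchy levels combine into a tensor coordinate with the first level most significant.

```cpp
#include <cassert>
#include <cstddef>

// Hierarchy lengths copied from the encoding (identical for M and N).
constexpr std::size_t kH[4] = {4, 2, 8, 4};

// Levels 0 and 3 are the Y (per-thread) levels; level 1 is used by the
// first P dimension (assumed: warps), level 2 by the second (assumed: lanes).
constexpr std::size_t tile_extent()              { return kH[0] * kH[1] * kH[2] * kH[3]; }
constexpr std::size_t elems_per_thread_per_dim() { return kH[0] * kH[3]; }
constexpr std::size_t warps_per_dim()            { return kH[1]; }
constexpr std::size_t lanes_per_dim()            { return kH[2]; }

// Assumed combination rule: first level is the most significant digit.
constexpr std::size_t x_from_levels(std::size_t h0, std::size_t h1,
                                    std::size_t h2, std::size_t h3)
{
    return ((h0 * kH[1] + h1) * kH[2] + h2) * kH[3] + h3;
}

// Each dimension spans 256 elements, matching the 256x256 tile.
static_assert(tile_extent() == 256, "hierarchy must cover the tile extent");

// 4 warps x 64 lanes = 256 threads, each holding 16 x 16 = 256 elements,
// which together cover 256 * 256 = 65536 elements.
static_assert(warps_per_dim() * warps_per_dim() == 4, "warps per tile");
static_assert(lanes_per_dim() * lanes_per_dim() == 64, "lanes per warp");
static_assert(elems_per_thread_per_dim() * elems_per_thread_per_dim() == 256,
              "elements per thread");
```

Under these assumptions, the largest index in every level, `x_from_levels(3, 1, 7, 3)`, yields coordinate 255, the last element along a dimension.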

Next Steps

To dive deeper, start with Introduction and Motivation - Why Tile Distribution Matters, which explains the reasoning and core concepts behind CK Tile.

For practical examples, see the example/ck_tile directory in the Composable Kernel repository.