CK Tile Conceptual Documentation

Welcome to the conceptual documentation for CK Tile, the core abstraction layer of Composable Kernel that enables efficient GPU programming through compile-time coordinate transformations and tile-based data distribution.

See the CK Tile Index for the complete CK Tile documentation structure.

Overview

CK Tile provides a mathematical framework for expressing complex GPU computations. Its main benefits are:

  • Automatic Memory Coalescing: Ensures optimal memory access patterns without manual optimization

  • Thread Cooperation: Coordinates work distribution across the GPU’s hierarchical execution model

  • Zero-Overhead Abstractions: Compile-time optimizations ensure no runtime performance penalty

  • Portable Performance: The same code can achieve high performance across different GPU architectures

Why CK Tile?

Traditional GPU programming requires manual management of:

  • Thread-to-data mapping calculations

  • Memory coalescing patterns

  • Bank conflict avoidance

  • Boundary condition handling

CK Tile automates all of these concerns through a unified abstraction that maps logical problem coordinates to physical GPU resources.
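As a rough illustration of the manual bookkeeping listed above, the following is a minimal sketch in plain C++ of the index arithmetic a kernel author might write by hand for a row-major matrix. `ManualMapping`, `Offset`, and `manual_offset` are hypothetical names used only for this illustration; they are not part of CK Tile, and the code runs on the host rather than as a GPU kernel. Bank conflict avoidance is not shown.

```cpp
#include <cstddef>

// Hypothetical types for illustration only; not part of CK Tile.
struct ManualMapping {
    std::size_t rows, cols;            // problem size
    std::size_t tile_rows, tile_cols;  // tile handled by one thread block
};

struct Offset {
    bool in_bounds;      // false when the thread falls outside the matrix
    std::size_t offset;  // linear row-major offset, valid only if in_bounds
};

// Thread-to-data mapping written by hand: which element does thread
// `thread_id` of block `block_id` touch?
constexpr Offset manual_offset(const ManualMapping& m,
                               std::size_t block_id,
                               std::size_t thread_id)
{
    // Number of tiles needed to cover one row of the matrix (rounded up).
    const std::size_t tiles_per_row = (m.cols + m.tile_cols - 1) / m.tile_cols;
    const std::size_t tile_r = block_id / tiles_per_row;
    const std::size_t tile_c = block_id % tiles_per_row;

    // Consecutive threads walk along a row, so neighbouring threads
    // read neighbouring addresses (the coalescing concern).
    const std::size_t r = tile_r * m.tile_rows + thread_id / m.tile_cols;
    const std::size_t c = tile_c * m.tile_cols + thread_id % m.tile_cols;

    // Boundary handling for tiles that overhang the matrix edge.
    if (r >= m.rows || c >= m.cols) {
        return Offset{false, 0};
    }
    return Offset{true, r * m.cols + c};
}
```

Each of these steps has to be repeated, and kept consistent, in every kernel that uses a different tile shape or layout. That repetition is the burden the abstraction is meant to remove.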

Learning Path

  1. Start Here: Introduction and Motivation - Why Tile Distribution Matters

    The fundamental problems CK Tile solves and why it’s essential for efficient GPU programming.

  2. Foundation: CK Tile buffer view

    How CK Tile provides structured access to raw GPU memory across different address spaces.

  3. Multi-Dimensional Views: Tensor Views - Multi-Dimensional Structure

    How to work with multi-dimensional data structures and memory layouts.

  4. Core API: Tile Distribution - The Core API

    The tile distribution system that maps work to GPU threads.

  5. Mathematical Framework: Coordinate Systems - The Mathematical Foundation

    The coordinate transformation system that powers CK Tile’s abstractions.

  6. Reference: Terminology Reference - Key Concepts and Definitions

    Glossary of all terms and concepts used in CK Tile.

Key Concepts at a Glance

Coordinate Spaces

  • P-space: Processing element coordinates (thread, warp, block)

  • Y-space: Local tile access patterns

  • X-space: Physical tensor coordinates

  • D-space: Linearized memory addresses
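One way to picture how these spaces chain together: a thread's P coordinate and a local Y coordinate combine into an X coordinate, which a memory layout then linearizes into a D offset. The sketch below assumes a simple blocked layout in which each thread owns a small rectangular patch of a row-major 2D tensor. `Coord2`, `to_x`, and `to_d` are hypothetical helpers and not CK Tile API; in CK Tile the mapping from P and Y to X is derived from the distribution encoding rather than hard-coded like this.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 2D coordinate type for illustration only.
using Coord2 = std::array<std::size_t, 2>;

// (P, Y) -> X: combine a thread id with a per-thread local index.
// Assumes threads are arranged row-major, `threads_per_row` per row,
// and each thread owns a per_thread[0] x per_thread[1] patch.
constexpr Coord2 to_x(std::size_t p,
                      const Coord2& y,
                      const Coord2& per_thread,
                      std::size_t threads_per_row)
{
    const std::size_t p_row = p / threads_per_row;
    const std::size_t p_col = p % threads_per_row;
    return Coord2{{p_row * per_thread[0] + y[0],
                   p_col * per_thread[1] + y[1]}};
}

// X -> D: linearize a tensor coordinate for a row-major layout.
constexpr std::size_t to_d(const Coord2& x, std::size_t row_stride)
{
    return x[0] * row_stride + x[1];
}
```

For example, with a 2x4 patch per thread and 4 threads per row, thread 5 sits at thread row 1, thread column 1. Its local element (1, 2) is tensor element (3, 6), which lands at offset 3 * 16 + 6 = 54 when the row stride is 16.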

Core Components

  • BufferView: Type-safe access to GPU memory

  • TileDistribution: Automatic work distribution

  • TileWindow: Efficient data loading/storing

  • Encoding: Compile-time distribution specification

Quick Example

// Define how to distribute a 256x256 tile across threads
using Encoding = tile_distribution_encoding<
    sequence<>,                              // R: no replication
    tuple<sequence<4,2,8,4>,                // H for M: 4*2*8*4 = 256
          sequence<4,2,8,4>>,               // H for N: 4*2*8*4 = 256
    tuple<sequence<1,2>, sequence<1,2>>,    // P -> H major (1 = M, 2 = N) for P0, P1
    tuple<sequence<1,1>, sequence<2,2>>,    // P -> H minor (position within each hierarchy)
    sequence<1,1,2,2>,                      // Y -> H major: Y0, Y1 from M; Y2, Y3 from N
    sequence<0,3,0,3>                       // Y -> H minor: positions 0 and 3
>;

// Create the distribution and load data
// (tensor_view, tile_size, and origin are assumed to be defined by the caller)
auto distribution = make_static_tile_distribution(Encoding{});
auto window = make_tile_window(tensor_view, tile_size, origin, distribution);
auto tile = window.load();

// Visit every element of the tile held by this thread
sweep_tile(tile, [](auto idx) { /* computation */ });
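The numbers in the encoding above can be checked with a little arithmetic. The sketch below is plain C++ and does not use CK Tile headers. It assumes the usual reading of the encoding, where major index 1 selects the M hierarchy, major index 2 selects the N hierarchy, and the minor index selects a position within that hierarchy. The names `h_m`, `h_n`, `product`, `p0_len`, `p1_len`, and `elems_per_thread` are introduced here for illustration only.

```cpp
#include <array>
#include <cstddef>

// Hierarchy lengths copied from the encoding (M and N are identical).
constexpr std::array<std::size_t, 4> h_m{{4, 2, 8, 4}};
constexpr std::array<std::size_t, 4> h_n{{4, 2, 8, 4}};

constexpr std::size_t product(const std::array<std::size_t, 4>& a)
{
    return a[0] * a[1] * a[2] * a[3];
}

// Tile extent along each dimension.
constexpr std::size_t tile_m = product(h_m);
constexpr std::size_t tile_n = product(h_n);

// P0 takes position 1 of both hierarchies, P1 takes position 2
// (assumed reading of the two P sequences in the encoding).
constexpr std::size_t p0_len = h_m[1] * h_n[1];
constexpr std::size_t p1_len = h_m[2] * h_n[2];

// Y0..Y3 take positions 0 and 3 of M, then positions 0 and 3 of N.
constexpr std::size_t elems_per_thread = h_m[0] * h_m[3] * h_n[0] * h_n[3];

static_assert(tile_m == 256 && tile_n == 256, "tile is 256x256");
static_assert(p0_len * p1_len * elems_per_thread == tile_m * tile_n,
              "P and Y dimensions together cover the tile exactly once");
```

Under this reading, P0 has length 4 and P1 has length 64, which is consistent with 4 warps of 64 lanes, although that interpretation of P0 and P1 is an assumption here. Each thread then holds 4 * 4 * 4 * 4 = 256 elements, and 256 threads times 256 elements covers the 65,536 elements of the tile.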

Next Steps

To dive deeper, start with Introduction and Motivation - Why Tile Distribution Matters, which covers the problems CK Tile solves and the core concepts behind it.

For practical examples, see the example/ck_tile directory in the Composable Kernel repository.