Tensile documentation
Tensile is a tool for creating a benchmark-driven backend library for General Matrix-Matrix Multiplications (GEMMs), GEMM-like problems such as batched GEMM, N-dimensional tensor contractions, and anything else that multiplies two multidimensional objects together on an AMD GPU.
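For readers new to the term, the operation can be sketched in plain Python. This is only a reference for the arithmetic, assuming the conventional BLAS definition of GEMM, C = alpha * A * B + beta * C, which this page does not spell out. The function names `gemm` and `batched_gemm` are hypothetical and are not part of Tensile's API; Tensile generates GPU kernels rather than code like this.

```python
# Reference semantics only: plain-Python GEMM, not Tensile's API or kernels.
# Assumes the conventional BLAS definition C = alpha * A * B + beta * C.

def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major nested lists.

    a is M x K, b is K x N, and c is M x N.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def batched_gemm(alpha, a_batch, b_batch, beta, c_batch):
    """Apply gemm independently to each (a, b, c) triple in the batch."""
    return [
        gemm(alpha, a, b, beta, c)
        for a, b, c in zip(a_batch, b_batch, c_batch)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 0], [0, 1]]
result = gemm(2, a, b, 3, c)
```

A batched GEMM repeats the same computation over independent matrix triples, and a tensor contraction generalizes the inner sum over `p` to one or more shared indices of higher-dimensional operands.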
Tensile is written in Python for library and kernel generation and in C++ for client headers and library tests. It is a vital project in the ROCm ecosystem, providing optimized kernels for downstream libraries such as rocBLAS.
The Python components of Tensile are applications that together generate optimized kernels and the library objects that client code uses to access those kernels.
The code is open source and hosted on GitHub at ROCm/Tensile.
To contribute to the documentation, refer to Contributing to ROCm.
You can find licensing information on the Licensing page.