Benchmark config example
Tensile uses an incremental and programmable Benchmark protocol. Here is a sample benchmark config.yaml file used as an input to Tensile:
GlobalParameters:
  PrintLevel: 1
  ForceRedoBenchmarkProblems: False
  ForceRedoLibraryLogic: True
  ForceRedoLibraryClient: True
  CMakeBuildType: Release
  EnqueuesPerSync: 1
  SyncsPerBenchmark: 1
  LibraryPrintDebug: False
  NumElementsToValidate: 128
  ValidationMaxToPrint: 16
  ValidationPrintValids: False
  ShortNames: False
  MergeFiles: True
  PlatformIdx: 0
  DeviceIdx: 0
  DataInitTypeAB: 0

BenchmarkProblems:
  - # sgemm NN
    - # ProblemType
      OperationType: GEMM
      DataType: s
      TransposeA: False
      TransposeB: False
      UseBeta: True
      Batched: True

    - # BenchmarkProblemSizeGroup
      InitialSolutionParameters:
      BenchmarkCommonParameters:
        - ProblemSizes:
          - Range: [ [5760], 0, [1], 0 ]
        - LoopDoWhile: [False]
        - NumLoadsCoalescedA: [-1]
        - NumLoadsCoalescedB: [1]
        - WorkGroupMapping: [1]
      ForkParameters:
        - ThreadTile:
          - [ 8, 8 ]
          - [ 4, 8 ]
          - [ 4, 4 ]
        - WorkGroup:
          - [ 8, 16, 1 ]
          - [ 16, 16, 1 ]
        - LoopTail: [False, True]
        - EdgeType: ["None", "Branch", "ShiftPtr"]
        - DepthU: [ 8, 16]
        - VectorWidth: [1, 2, 4]
      BenchmarkForkParameters:
      BenchmarkJoinParameters:
      BenchmarkFinalParameters:
        - ProblemSizes:
          - Range: [ [5760], 0, [1], 0 ]

LibraryLogic:

LibraryClient:
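The values under ForkParameters are candidates rather than fixed settings. As a rough illustration, assume each forked parameter is combined with every value of the others (a full cross product; see Benchmark protocol for how Tensile actually schedules the benchmarks). Under that assumption, the sample file describes 3 × 2 × 2 × 3 × 2 × 3 = 216 candidate solutions. Tensile may reject combinations that are invalid for the target hardware, so fewer may actually be benchmarked. The Python sketch below only enumerates the combinations. The dict representation and the `fork_solutions` helper are hypothetical and are not part of Tensile's API.

```python
from itertools import product
from math import prod

# ForkParameters from the sample config.yaml, rewritten as a Python dict.
# Hypothetical representation: Tensile itself reads the YAML file.
fork_parameters = {
    "ThreadTile": [[8, 8], [4, 8], [4, 4]],
    "WorkGroup": [[8, 16, 1], [16, 16, 1]],
    "LoopTail": [False, True],
    "EdgeType": ["None", "Branch", "ShiftPtr"],
    "DepthU": [8, 16],
    "VectorWidth": [1, 2, 4],
}


def fork_solutions(params):
    """Yield one dict per combination of forked values (full cross product)."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


solutions = list(fork_solutions(fork_parameters))
expected = prod(len(v) for v in fork_parameters.values())
print(len(solutions), expected)  # 216 216
```

Because the count grows multiplicatively, adding one more value to any single fork parameter can increase the benchmarking time substantially.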
Config.yaml structure
The top-level data structures in the config.yaml file are:
GlobalParameters
: Contains a dictionary of global parameters used by all parts of the benchmarking.
BenchmarkProblems
: Contains a list of the benchmarks to be conducted. Each entry represents a single ProblemType for benchmarking. Its parameters are grouped under ProblemType, InitialSolutionParameters, BenchmarkCommonParameters, ForkParameters, BenchmarkForkParameters, JoinParameters, BenchmarkJoinParameters, and BenchmarkFinalParameters. See Benchmark protocol for more information on these steps.
LibraryLogic
: Contains a dictionary that stores parameters for analyzing the benchmark data and for designing how the backend library selects solutions for certain ProblemSizes.
LibraryClient
: Contains a dictionary that stores parameters for creating the library and a client that calls into the library.
Global parameters
Here is a list of GlobalParameters used in the config.yaml file:
Name
: Prefix to add to the API function names. It is typically the device name.
MinimumRequiredVersion
: The Tensile version required to interpret the given config.yaml file.
RuntimeLanguage
: HIP runtime.
KernelLanguage
: For the HIP runtime, sets the kernel language to HIP or assembly (gfx803, gfx900).
PrintLevel
: 0 = Tensile prints nothing, 1 = Prints sparingly, 2 = Prints extensively.
ForceRedoBenchmarkProblems
: False = Avoids repeating a benchmark phase if results for it already exist.
ForceRedoLibraryLogic
: False = Avoids regenerating the library logic if it already exists.
ForceRedoLibraryClient
: False = Avoids regenerating the library client if it already exists.
CMakeBuildType
: Release or Debug.
EnqueuesPerSync
: Number of enqueues before syncing the queue.
SyncsPerBenchmark
: Number of queue syncs for each problem size.
LibraryPrintDebug
: True = Tensile solutions print kernel enqueue info to stdout.
NumElementsToValidate
: Number of elements to validate. 0 = No validation.
ValidationMaxToPrint
: Number of invalid results to print.
ValidationPrintValids
: True = Prints valid comparisons in addition to invalid ones.
ShortNames
: True = Converts long kernel, solution, and file names to short serial IDs.
MergeFiles
: False = Writes each solution and kernel to its own file.
DeviceIdx
: HIP device ID.
DataInitTypeAB, DataInitTypeC
: Initializes the validation data for A and B, or for C, with 0 = 0s, 1 = 1s, 2 = serial, or 3 = random values.
KernelTime
: True = Compares kernel performance using the kernel time reported by the runtime instead of the time measured by APIs using CPU clocks.
For the exhaustive list of global parameters and their defaults, see Common.py.
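The Python sketch below makes a few of these parameters concrete. It assumes that DataInitType value 2 ("serial") means consecutive index values. It also models the timing of one problem size as SyncsPerBenchmark rounds, each issuing EnqueuesPerSync launches followed by a queue sync. The helpers `init_data` and `benchmark` are hypothetical. They only simulate the control flow on the CPU and do not reflect Tensile's client code.

```python
import random
import time


def init_data(n, data_init_type, seed=0):
    """Return n values for one DataInitType setting (hypothetical helper).

    0 = zeros, 1 = ones, 2 = serial (assumed: 0, 1, 2, ...), 3 = random.
    """
    if data_init_type == 0:
        return [0.0] * n
    if data_init_type == 1:
        return [1.0] * n
    if data_init_type == 2:
        return [float(i) for i in range(n)]
    if data_init_type == 3:
        rng = random.Random(seed)  # seeded so runs are reproducible
        return [rng.random() for _ in range(n)]
    raise ValueError(f"unknown DataInitType: {data_init_type}")


def benchmark(launch, enqueues_per_sync, syncs_per_benchmark):
    """Simulate timing one problem size (hypothetical, CPU-only).

    Runs syncs_per_benchmark rounds. Each round issues enqueues_per_sync
    launches and then would synchronize the queue. Returns the total number
    of launches and the average wall-clock seconds per launch.
    """
    if enqueues_per_sync < 1 or syncs_per_benchmark < 1:
        raise ValueError("both counts must be at least 1")
    launches = 0
    start = time.perf_counter()
    for _ in range(syncs_per_benchmark):
        for _ in range(enqueues_per_sync):
            launch()  # stands in for an asynchronous kernel enqueue
            launches += 1
        # A real GPU benchmark would wait for the queue to drain here.
    elapsed = time.perf_counter() - start
    return launches, elapsed / launches


# Values from the sample config.yaml: DataInitTypeAB: 0,
# EnqueuesPerSync: 1, SyncsPerBenchmark: 1.
a = init_data(8, data_init_type=0)
launches, avg_seconds = benchmark(lambda: sum(a), enqueues_per_sync=1, syncs_per_benchmark=1)
print(launches)  # 1
```

Under this model, the sample config times each problem size from a single launch. Raising either count averages the measurement over EnqueuesPerSync × SyncsPerBenchmark launches.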
Problem type parameters
Here is a list of ProblemType parameters used under BenchmarkProblems in the config.yaml file:
OperationType
: GEMM or TensorContraction.
DataType
: s (single), d (double), c (single complex), z (double complex), or h (half).
UseBeta
: False = The library, solutions, or kernels accept no beta parameter, implying beta = 0.
UseInitialStrides
: False = Data is contiguous in memory.
HighPrecisionAccumulate
: For tmpC += a*b, uses twice the precision of DataType for tmpC. Note that this parameter is not implemented yet.
ComplexConjugateA
: True = Matrix A is stored as a complex conjugate. Ignored for real precision.
ComplexConjugateB
: True = Matrix B is stored as a complex conjugate. Ignored for real precision.
For OperationType = GEMM only:
TransposeA
: True or False.
TransposeB
: True or False.
Batched
: True. Note that False has been deprecated.
For OperationType = TensorContraction, the following values describe a batched GEMM NT, C[ijk] = Sum[l] A[ilk] * B[jlk]:
IndexAssignmentsA
: [0, 3, 2].
IndexAssignmentsB
: [1, 3, 2].
NumDimensionsC
: 3.
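One reading of these index assignments assumes that indices 0 to NumDimensionsC - 1 are the indices of C (here i = 0, j = 1, k = 2) and that the remaining indices (here l = 3) are summed over. With that reading, IndexAssignmentsA = [0, 3, 2] lays out A as A[i][l][k], and IndexAssignmentsB = [1, 3, 2] lays out B as B[j][l][k]. The Python sketch below is a naive reference contraction driven by these lists. The `contract` function and the dict-of-tuples storage are hypothetical illustrations, not Tensile code.

```python
from itertools import product


def contract(A, B, sizes, index_assignments_a, index_assignments_b, num_dims_c):
    """Naive tensor contraction driven by IndexAssignments (hypothetical).

    Assumes indices 0..num_dims_c-1 index C and the remaining indices are
    summed. A and B are dicts keyed by index tuples in their own order.
    """
    summed = range(num_dims_c, len(sizes))
    C = {}
    for c_idx in product(*(range(sizes[d]) for d in range(num_dims_c))):
        acc = 0
        for s_idx in product(*(range(sizes[d]) for d in summed)):
            full = c_idx + s_idx  # value of every index 0..len(sizes)-1
            a = A[tuple(full[d] for d in index_assignments_a)]
            b = B[tuple(full[d] for d in index_assignments_b)]
            acc += a * b
        C[c_idx] = acc
    return C


# Batched GEMM NT: sizes of i, j, k (batch), l (summation).
sizes = [2, 3, 2, 4]
I, J, K, L = sizes
A = {(i, l, k): i + 2 * l + k for i in range(I) for l in range(L) for k in range(K)}
B = {(j, l, k): j - l + 3 * k for j in range(J) for l in range(L) for k in range(K)}
C = contract(A, B, sizes, [0, 3, 2], [1, 3, 2], 3)

# Direct evaluation of C[ijk] = Sum[l] A[ilk] * B[jlk] for comparison.
expected = {(i, j, k): sum(A[(i, l, k)] * B[(j, l, k)] for l in range(L))
            for i in range(I) for j in range(J) for k in range(K)}
print(C == expected)  # True
```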
For solution or kernel parameters, see Kernel parameters.
Library logic
Running the LibraryLogic phase of benchmarking analyzes the benchmark data and encodes a mapping for each ProblemType. For each ProblemType, it maps problem sizes to the best solution (kernel). Multiple problem sizes often share the same solution, but every kernel must map to at least one problem size.
LibraryLogic files can be used to create a Tensile library for the set of problems.
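The Python sketch below illustrates the selection step. It picks the highest-scoring solution for each problem size and keeps only the solutions that win at least one size. The input format, the problem-size labels, and the scores are made up for the example and are not measurements. Tensile's actual analysis and LibraryLogic file format are more involved.

```python
def build_library_logic(results):
    """Map each problem size to its best solution (hypothetical helper).

    results: {problem_size: {solution_name: score}}, higher score = faster.
    Returns (mapping, kept): kept lists only solutions that are best for at
    least one problem size, so every kept kernel maps to a size.
    """
    # max() returns the first solution with the highest score on ties.
    mapping = {size: max(scores, key=scores.get) for size, scores in results.items()}
    kept = sorted(set(mapping.values()))
    return mapping, kept


# Made-up scores for illustration only.
results = {
    "size_1": {"S0": 3.0, "S1": 5.0, "S2": 4.0},
    "size_2": {"S0": 2.0, "S1": 6.0, "S2": 1.0},
    "size_3": {"S0": 7.0, "S1": 2.0, "S2": 3.0},
}
mapping, kept = build_library_logic(results)
print(mapping)  # {'size_1': 'S1', 'size_2': 'S1', 'size_3': 'S0'}
print(kept)     # ['S0', 'S1']
```

Here S1 serves two problem sizes. S2 never wins, so it is left out of the library.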