Installation
This topic provides information required to install Tensile from source and run benchmarks.
Install ROCm
To begin, install ROCm for your platform. For installation instructions, refer to the Linux or Windows installation guide.
Tip
If using Bash, set PATH=/opt/rocm/bin/:$PATH in your ~/.bashrc and refresh your shell using source ~/.bashrc.
Alternatively, export the path for your current shell session using export PATH=/opt/rocm/bin/:$PATH.
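The tip above can be made safe to run more than once. The following is a minimal sketch, assuming ROCm is installed under /opt/rocm; prepend_path is a hypothetical helper name introduced here for illustration, not part of ROCm or Tensile.

```shell
# Assumption: ROCm lives under /opt/rocm, as in the tip above.
ROCM_BIN="/opt/rocm/bin"

# prepend_path DIR PATH_VALUE (hypothetical helper)
# Prints PATH_VALUE with DIR prepended, unless DIR is already one of its entries.
prepend_path() {
    dir="${1%/}"      # ignore a trailing slash when comparing entries
    current="$2"
    case ":${current}:" in
        *":${dir}:"* | *":${dir}/:"*) printf '%s\n' "${current}" ;;
        *) printf '%s\n' "${dir}:${current}" ;;
    esac
}

# Update PATH for the current shell session only.
PATH="$(prepend_path "${ROCM_BIN}" "${PATH}")"
export PATH
```

Placing these lines in ~/.bashrc should keep PATH from growing each time the file is sourced.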
Install OS dependencies
Note
The steps below are for Ubuntu. For other distributions, use the appropriate package manager.
Install dependencies:
apt-get install libyaml python3-yaml \
    libomp-dev libboost-program-options-dev libboost-filesystem-dev
Install one of the following, depending on your preferred Tensile data format. If both are installed, msgpack is preferred:
apt-get install libmsgpack-dev # If using the msgpack backend
# OR
apt-get install libtinfo-dev # If using the YAML backend
Install build tools. For additional installation methods for the latest versions of CMake, see the CMake installation page.
apt-get install build-essential cmake
Install Tensile from source
To install Tensile from source, it is recommended to create a virtual environment first:
python3 -m venv .venv
source .venv/bin/activate
Then, you can install Tensile using pip or git.
Option 1: Install with pip
pip3 install git+https://github.com/ROCmSoftwarePlatform/Tensile.git@develop
Option 2: Install with git
git clone git@github.com:ROCm/Tensile.git && cd Tensile
pip3 install .
You can now run Tensile’s Python applications.
Running benchmark
To run a benchmark, pass a tuning config to the Tensile program located in Tensile/bin.
For demonstration purposes, the sample tuning file available in Tensile/Configs/rocblas_sgemm_example.yaml is used.
The sample tuning file allows you to specify the target architecture for which the benchmark will generate a library.
To find your device architecture, run:
rocminfo | grep gfx
Specify the device architecture in the sample tuning file using the ArchitectureName key. For example, use ArchitectureName: "gfx90a" or ArchitectureName: "gfx1030", depending on your device architecture.
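The two steps above (finding the architecture, then writing it into the tuning file) could be scripted roughly as follows. This is a sketch under stated assumptions: gfx_targets and set_architecture_name are hypothetical helper names, and the ArchitectureName key is assumed to sit on its own line in the YAML file. If rocminfo reports more than one gfx name, choosing the right one remains a manual decision.

```shell
# gfx_targets (hypothetical helper)
# Reads rocminfo-style text on stdin and prints each gfx name once.
gfx_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# set_architecture_name FILE ARCH (hypothetical helper)
# Rewrites every line of the form `ArchitectureName: <value>` in FILE so that
# the value becomes "ARCH". Assumes the key sits on its own line.
set_architecture_name() {
    file="$1"
    arch="$2"
    tmp="$(mktemp)" || return 1
    # Write to a temporary file first, then copy back, so FILE keeps its permissions.
    sed "s/^\([[:space:]]*ArchitectureName:\).*/\1 \"${arch}\"/" "${file}" > "${tmp}" &&
        cat "${tmp}" > "${file}"
    status=$?
    rm -f "${tmp}"
    return "${status}"
}

# Example (not run here): pick the first detected target and write it to the config.
# arch="$(rocminfo | gfx_targets | head -n 1)"
# set_architecture_name Tensile/Configs/rocblas_sgemm_example.yaml "${arch}"
```

Editing the file by hand works just as well; the helper is only convenient when the same config is benchmarked on several machines.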
You can now run benchmarks using Tensile. From the top-level directory, run:
mkdir build && cd build
../Tensile/bin/Tensile ../Tensile/Configs/rocblas_sgemm_example.yaml ./
After the benchmark completes, Tensile creates the following directories:
0_Build: Contains a client executable. Use this to launch Tensile from a library viewpoint.
1_BenchmarkProblems: Contains all the problem descriptions and executables generated during benchmarking. Use the run.sh script to reproduce results.
2_BenchmarkData: Contains the raw performance results of all kernels in CSV and YAML formats.
3_LibraryLogic: Contains the winning (optimal) kernel configurations in YAML format. Typically, rocBLAS takes the YAML files from this folder.
4_LibraryClient: Contains the code objects, kernels, and library code. This is the output of running TensileCreateLibrary using the 3_LibraryLogic directory as an input.
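To confirm that a benchmark run produced all of the directories listed above, a small check along these lines may help. It is only a sketch: check_tensile_outputs is a hypothetical helper name, and the directory names are taken from the list above.

```shell
# check_tensile_outputs BUILD_DIR (hypothetical helper)
# Reports which of the expected output directories exist under BUILD_DIR.
# Returns 0 if all are present and 1 otherwise.
check_tensile_outputs() {
    build_dir="${1:-.}"
    missing=0
    for d in 0_Build 1_BenchmarkProblems 2_BenchmarkData 3_LibraryLogic 4_LibraryClient; do
        if [ -d "${build_dir}/${d}" ]; then
            echo "found:   ${d}"
        else
            echo "missing: ${d}"
            missing=1
        fi
    done
    return "${missing}"
}

# Example (not run here), from the build directory created earlier:
# check_tensile_outputs .
```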
The client is built at the start of the run and is cached for future builds as long as the output directory and client build files are unchanged. To use the client, run:
./0_Build/client/tensile_client -h
./0_Build/client/tensile_client --config-file=1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_Final/source/ClientParameters.ini
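The problem directory name in the command above (Cijk_Ailk_Bljk_SB_00) depends on the tuning file, so it can be convenient to locate the generated ClientParameters.ini files instead of typing the path. The sketch below assumes other problems are laid out the same way as in the example path; find_client_configs is a hypothetical helper name.

```shell
# find_client_configs BUILD_DIR (hypothetical helper)
# Prints every ClientParameters.ini found under BUILD_DIR/1_BenchmarkProblems.
find_client_configs() {
    find "${1:-.}/1_BenchmarkProblems" -type f -name 'ClientParameters.ini' | sort
}

# Example (not run here): run the client once per generated config.
# find_client_configs . | while IFS= read -r cfg; do
#     ./0_Build/client/tensile_client --config-file="${cfg}"
# done
```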
Note
The benchmarking module, Tensile.py, is written in Python 3. It generates kernels and builds all object files and C/C++ files used for benchmarking. Tensile is not compatible with Python 2.