Installing and building hipTensor#

The quickest way to install hipTensor is to use the prebuilt packages that are released with ROCm. Alternatively, this topic includes instructions to build from source.

The available ROCm packages are:

  • hiptensor (runtime library)

  • hiptensor-dev (development package with the headers and library files needed to build against hipTensor)

  • hiptensor-samples (sample executables)

  • hiptensor-tests (test executables)

  • hiptensor-clients (samples and test executables)

Prerequisites#

hipTensor requires a ROCm-enabled platform running ROCm version 6.4 or later. For more information, see ROCm installation and the ROCm GitHub.

Installing prebuilt packages#

To install hipTensor on Ubuntu or Debian, use these commands:

sudo apt-get update
sudo apt-get install hiptensor hiptensor-dev hiptensor-samples hiptensor-tests

To install hipTensor on RHEL-based systems, use:

sudo yum update
sudo yum install hiptensor hiptensor-dev hiptensor-samples hiptensor-tests

To install hipTensor on SLES, use:

sudo zypper refresh
sudo zypper install hiptensor hiptensor-dev hiptensor-samples hiptensor-tests

After hipTensor is installed, it can be used just like any other library with a C++ API.
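For example, with the packages installed under the default /opt/rocm prefix, a small application that uses the hipTensor C++ API can typically be compiled and linked as follows. This is only a sketch: app.cpp is a placeholder file name, and the explicit include and library paths are assumptions about a default ROCm layout.

hipcc app.cpp -I/opt/rocm/include -L/opt/rocm/lib -lhiptensor -o app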

Building and installing hipTensor from source#

Building hipTensor from source isn’t necessary, because the prebuilt packages described above are ready to use after installation. If you prefer to build from source, follow the instructions in this section.

System requirements#

A full hipTensor build requires 8 GB of system memory. Less memory may be sufficient if hipTensor is built without tests.

GPU support#

hipTensor is supported on AMD CDNA-class GPUs with matrix core support: gfx908, gfx90a, gfx940, gfx941, and gfx942 (collectively referred to as gfx9).

Note

Double precision FP64 datatype support requires the gfx90a, gfx940, gfx941, or gfx942.
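To check which gfx9 target your system reports, you can query the GPU architecture with rocminfo, which ships with ROCm:

rocminfo | grep gfx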

Dependencies#

hipTensor is designed to have minimal external dependencies so it’s lightweight and portable. The following dependencies are required:

  • ROCm (version 6.4 or later)

  • CMake (version 3.14 or later)

  • rocm-cmake (version 0.8.0 or later)

  • HIP runtime (version 6.4.0 or later; also available as the ROCm hip-runtime-amd package)

  • LLVM dev package (version 7.0 or later; also available as the ROCm rocm-llvm-dev package)

  • Composable Kernel (hipTensor uses the amd-master branch, a stable branch suitable for development)

Note

It’s best to use ROCm packages from the same release where applicable.
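For example, on Ubuntu the ROCm-packaged dependencies named above can usually be installed with the command below. This is a sketch: exact package availability depends on your distribution and ROCm release, and how Composable Kernel is obtained is determined by the hipTensor CMake configuration.

sudo apt-get install cmake rocm-cmake hip-runtime-amd rocm-llvm-dev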

Downloading hipTensor#

The hipTensor source code is available from the hipTensor GitHub. ROCm version 6.4 or later is required.

To verify the ROCm version installed on an Ubuntu system, use this command:

apt show rocm-libs -a

For RHEL-related systems, use:

yum info rocm-libs

The ROCm version has major, minor, and patch fields, possibly followed by a build-specific identifier. For example, a ROCm version of 4.0.0.40000-23 corresponds to major release 4, minor release 0, patch 0, and build identifier 40000-23. The hipTensor GitHub repository has release branches named release/rocm-rel-major.minor, where major and minor match the installed ROCm version. To download hipTensor for ROCm version x.y, use these commands:

git clone -b release/rocm-rel-x.y https://github.com/ROCm/hipTensor.git
cd hipTensor

Replace x.y in the above command with the version of ROCm installed on your machine. For example, if you have ROCm 6.4 installed, then replace release/rocm-rel-x.y with release/rocm-rel-6.4.

Build configuration#

You can choose to build any of the following combinations:

  • The hipTensor library only

  • The library and samples

  • The library and tests

  • The library, samples, and tests

You only need the hipTensor library to call and link to the hipTensor API from your code. The clients contain the tests and sample code.

Here are the available options to build the hipTensor library, with or without clients.

  • GPU_TARGETS: Build the code for specific GPU target(s). Default: gfx908;gfx90a;gfx942
  • HIPTENSOR_BUILD_TESTS: Build the tests. Default: ON
  • HIPTENSOR_BUILD_SAMPLES: Build the samples. Default: ON
  • HIPTENSOR_BUILD_COMPRESSED_DBG: Enable compressed debug symbols. Default: ON
  • HIPTENSOR_DEFAULT_STRIDES_COL_MAJOR: Set the hipTensor default data layout to column major. Default: ON

Here are some example project configurations:

Basic:

CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> .

Targeting gfx908:

CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DGPU_TARGETS=gfx908

Debug build:

CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DCMAKE_BUILD_TYPE=Debug

Building the library alone#

By default, the project is configured for Release mode.

To build the library alone, run this command:

CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DHIPTENSOR_BUILD_TESTS=OFF -DHIPTENSOR_BUILD_SAMPLES=OFF

After configuration, build the library using this command:

cmake --build <build_dir> -- -j<nproc>

Note

It’s recommended to use at least 16 threads when building hipTensor with tests enabled, for example, -j16.
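If you also want to install the built library onto the system (for example, under /opt/rocm or a prefix chosen with CMAKE_INSTALL_PREFIX), a typical CMake install step is sketched below; the exact install layout depends on the project’s CMake configuration:

cmake --build <build_dir> --target install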

Building the library and samples#

To build the library and samples, run the following command:

CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DHIPTENSOR_BUILD_TESTS=OFF -DHIPTENSOR_BUILD_SAMPLES=ON

After configuration, build the library using this command:

cmake --build <build_dir> -- -j<nproc>

The samples folder in <build_dir> contains the executables listed below.

  • simple_bilinear_contraction_bf16_bf16_bf16_bf16_compute_bf16: A simple bilinear contraction [D = alpha * (A x B) + beta * C] using half-precision brain float input, output, and compute types
  • simple_bilinear_contraction_f16_f16_f16_f16_compute_f16: A simple bilinear contraction [D = alpha * (A x B) + beta * C] using half-precision floating point input, output, and compute types
  • simple_bilinear_contraction_f32_f32_f32_f32_compute_bf16: A simple bilinear contraction [D = alpha * (A x B) + beta * C] using single-precision floating point input and output types and a half-precision brain float compute type
  • simple_bilinear_contraction_f32_f32_f32_f32_compute_f16: A simple bilinear contraction [D = alpha * (A x B) + beta * C] using single-precision floating point input and output types and a half-precision floating point compute type
  • simple_bilinear_contraction_f32_f32_f32_f32_compute_f32: A simple bilinear contraction [D = alpha * (A x B) + beta * C] using single-precision floating point input, output, and compute types
  • simple_bilinear_contraction_cf32_cf32_cf32_cf32_compute_cf32: A simple bilinear contraction [D = alpha * (A x B) + beta * C] using complex single-precision floating point input, output, and compute types
  • simple_bilinear_contraction_f64_f64_f64_f64_compute_f32: A simple bilinear contraction [D = alpha * (A x B) + beta * C] using double-precision floating point input and output types and a single-precision floating point compute type
  • simple_bilinear_contraction_f64_f64_f64_f64_compute_f64: A simple bilinear contraction [D = alpha * (A x B) + beta * C] using double-precision floating point input, output, and compute types
  • simple_scale_contraction_bf16_bf16_bf16_compute_bf16: A simple scale contraction [D = alpha * (A x B)] using half-precision brain float input, output, and compute types
  • simple_scale_contraction_f16_f16_f16_compute_f16: A simple scale contraction [D = alpha * (A x B)] using half-precision floating point input, output, and compute types
  • simple_scale_contraction_f32_f32_f32_compute_bf16: A simple scale contraction [D = alpha * (A x B)] using single-precision floating point input and output types and a half-precision brain float compute type
  • simple_scale_contraction_f32_f32_f32_compute_f16: A simple scale contraction [D = alpha * (A x B)] using single-precision floating point input and output types and a half-precision floating point compute type
  • simple_scale_contraction_f32_f32_f32_compute_f32: A simple scale contraction [D = alpha * (A x B)] using single-precision floating point input, output, and compute types
  • simple_scale_contraction_cf32_cf32_cf32_compute_cf32: A simple scale contraction [D = alpha * (A x B)] using complex single-precision floating point input, output, and compute types
  • simple_scale_contraction_f64_f64_f64_compute_f32: A simple scale contraction [D = alpha * (A x B)] using double-precision floating point input and output types and a single-precision floating point compute type
  • simple_scale_contraction_f64_f64_f64_compute_f64: A simple scale contraction [D = alpha * (A x B)] using double-precision floating point input, output, and compute types
  • simple_permutation: A simple permutation using single-precision floating point input and output types
  • simple_reduction: A simple reduction using single-precision floating point input and output types
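Once the samples are built, they can be run directly from the samples folder. For example, to run the permutation sample (the exact subdirectory layout can vary between releases):

<build_dir>/samples/simple_permutation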

Building the library and tests#

To build the library and tests, run the following command:

CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DHIPTENSOR_BUILD_TESTS=ON -DHIPTENSOR_BUILD_SAMPLES=OFF

After configuration, build using this command:

cmake --build <build_dir> -- -j<nproc>

After the build, <build_dir> contains the test executables listed below.

  • logger_test: Unit test to validate hipTensor Logger APIs
  • yaml_test: Unit test to validate the YAML functionality used to bundle and run test suites
  • bilinear_contraction_test_m1n1k1: Bilinear contraction test [D = alpha * (A x B) + beta * C] with half, single, and mixed precision datatypes of rank 2
  • bilinear_contraction_test_m2n2k2: Bilinear contraction test [D = alpha * (A x B) + beta * C] with half, single, and mixed precision datatypes of rank 4
  • bilinear_contraction_test_m3n3k3: Bilinear contraction test [D = alpha * (A x B) + beta * C] with half, single, and mixed precision datatypes of rank 6
  • bilinear_contraction_test_m4n4k4: Bilinear contraction test [D = alpha * (A x B) + beta * C] with half, single, and mixed precision datatypes of rank 8
  • bilinear_contraction_test_m5n5k5: Bilinear contraction test [D = alpha * (A x B) + beta * C] with half, single, and mixed precision datatypes of rank 10
  • bilinear_contraction_test_m6n6k6: Bilinear contraction test [D = alpha * (A x B) + beta * C] with half, single, and mixed precision datatypes of rank 12
  • complex_bilinear_contraction_test_m1n1k1: Bilinear contraction test [D = alpha * (A x B) + beta * C] with complex single and double precision datatypes of rank 2
  • complex_bilinear_contraction_test_m2n2k2: Bilinear contraction test [D = alpha * (A x B) + beta * C] with complex single and double precision datatypes of rank 4
  • complex_bilinear_contraction_test_m3n3k3: Bilinear contraction test [D = alpha * (A x B) + beta * C] with complex single and double precision datatypes of rank 6
  • complex_bilinear_contraction_test_m4n4k4: Bilinear contraction test [D = alpha * (A x B) + beta * C] with complex single and double precision datatypes of rank 8
  • complex_bilinear_contraction_test_m5n5k5: Bilinear contraction test [D = alpha * (A x B) + beta * C] with complex single and double precision datatypes of rank 10
  • complex_bilinear_contraction_test_m6n6k6: Bilinear contraction test [D = alpha * (A x B) + beta * C] with complex single and double precision datatypes of rank 12
  • scale_contraction_test_m1n1k1: Scale contraction test [D = alpha * (A x B)] with half, single, and mixed precision datatypes of rank 2
  • scale_contraction_test_m2n2k2: Scale contraction test [D = alpha * (A x B)] with half, single, and mixed precision datatypes of rank 4
  • scale_contraction_test_m3n3k3: Scale contraction test [D = alpha * (A x B)] with half, single, and mixed precision datatypes of rank 6
  • scale_contraction_test_m4n4k4: Scale contraction test [D = alpha * (A x B)] with half, single, and mixed precision datatypes of rank 8
  • scale_contraction_test_m5n5k5: Scale contraction test [D = alpha * (A x B)] with half, single, and mixed precision datatypes of rank 10
  • scale_contraction_test_m6n6k6: Scale contraction test [D = alpha * (A x B)] with half, single, and mixed precision datatypes of rank 12
  • complex_scale_contraction_test_m1n1k1: Scale contraction test [D = alpha * (A x B)] with complex single and double precision datatypes of rank 2
  • complex_scale_contraction_test_m2n2k2: Scale contraction test [D = alpha * (A x B)] with complex single and double precision datatypes of rank 4
  • complex_scale_contraction_test_m3n3k3: Scale contraction test [D = alpha * (A x B)] with complex single and double precision datatypes of rank 6
  • complex_scale_contraction_test_m4n4k4: Scale contraction test [D = alpha * (A x B)] with complex single and double precision datatypes of rank 8
  • complex_scale_contraction_test_m5n5k5: Scale contraction test [D = alpha * (A x B)] with complex single and double precision datatypes of rank 10
  • complex_scale_contraction_test_m6n6k6: Scale contraction test [D = alpha * (A x B)] with complex single and double precision datatypes of rank 12
  • rank2_permutation_test: Permutation test with half and single precision datatypes of rank 2
  • rank3_permutation_test: Permutation test with half and single precision datatypes of rank 3
  • rank4_permutation_test: Permutation test with half and single precision datatypes of rank 4
  • rank5_permutation_test: Permutation test with half and single precision datatypes of rank 5
  • rank6_permutation_test: Permutation test with half and single precision datatypes of rank 6
  • rank1_reduction_test: Reduction test with half, single, and double precision datatypes of rank 1
  • rank2_reduction_test: Reduction test with half, single, and double precision datatypes of rank 2
  • rank3_reduction_test: Reduction test with half, single, and double precision datatypes of rank 3
  • rank4_reduction_test: Reduction test with half, single, and double precision datatypes of rank 4
  • rank5_reduction_test: Reduction test with half, single, and double precision datatypes of rank 5
  • rank6_reduction_test: Reduction test with half, single, and double precision datatypes of rank 6
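Once the tests are built, the entire registered test suite can be run through CTest from the build directory:

cd <build_dir>
ctest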

Make targets list#

When building hipTensor during the make step, you can specify Make targets instead of defaulting to make all. The following lists show the relationship between the high-level grouped targets and their individual targets.

The hiptensor_samples group target builds the following individual targets:

  • simple_bilinear_contraction_bf16_bf16_bf16_bf16_compute_bf16
  • simple_bilinear_contraction_f16_f16_f16_f16_compute_f16
  • simple_bilinear_contraction_f32_f32_f32_f32_compute_bf16
  • simple_bilinear_contraction_f32_f32_f32_f32_compute_f16
  • simple_bilinear_contraction_f32_f32_f32_f32_compute_f32
  • simple_bilinear_contraction_cf32_cf32_cf32_cf32_compute_cf32
  • simple_bilinear_contraction_f64_f64_f64_f64_compute_f32
  • simple_bilinear_contraction_f64_f64_f64_f64_compute_f64
  • simple_scale_contraction_bf16_bf16_bf16_compute_bf16
  • simple_scale_contraction_f16_f16_f16_compute_f16
  • simple_scale_contraction_f32_f32_f32_compute_bf16
  • simple_scale_contraction_f32_f32_f32_compute_f16
  • simple_scale_contraction_f32_f32_f32_compute_f32
  • simple_scale_contraction_cf32_cf32_cf32_compute_cf32
  • simple_scale_contraction_f64_f64_f64_compute_f32
  • simple_scale_contraction_f64_f64_f64_compute_f64
  • simple_permutation
  • simple_reduction

The hiptensor_tests group target builds the following individual targets:

  • logger_test
  • yaml_test
  • bilinear_contraction_test_m1n1k1
  • bilinear_contraction_test_m2n2k2
  • bilinear_contraction_test_m3n3k3
  • bilinear_contraction_test_m4n4k4
  • bilinear_contraction_test_m5n5k5
  • bilinear_contraction_test_m6n6k6
  • complex_bilinear_contraction_test_m1n1k1
  • complex_bilinear_contraction_test_m2n2k2
  • complex_bilinear_contraction_test_m3n3k3
  • complex_bilinear_contraction_test_m4n4k4
  • complex_bilinear_contraction_test_m5n5k5
  • complex_bilinear_contraction_test_m6n6k6
  • scale_contraction_test_m1n1k1
  • scale_contraction_test_m2n2k2
  • scale_contraction_test_m3n3k3
  • scale_contraction_test_m4n4k4
  • scale_contraction_test_m5n5k5
  • scale_contraction_test_m6n6k6
  • complex_scale_contraction_test_m1n1k1
  • complex_scale_contraction_test_m2n2k2
  • complex_scale_contraction_test_m3n3k3
  • complex_scale_contraction_test_m4n4k4
  • complex_scale_contraction_test_m5n5k5
  • complex_scale_contraction_test_m6n6k6
  • rank2_permutation_test
  • rank3_permutation_test
  • rank4_permutation_test
  • rank5_permutation_test
  • rank6_permutation_test
  • rank1_reduction_test
  • rank2_reduction_test
  • rank3_reduction_test
  • rank4_reduction_test
  • rank5_reduction_test
  • rank6_reduction_test
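For example, to build only one grouped target or one individual target from the lists above, pass it to the CMake build step:

cmake --build <build_dir> --target hiptensor_samples -- -j16
cmake --build <build_dir> --target logger_test -- -j16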

Benchmarking scripts#

The benchmarking scripts are located at <project root>/scripts/performance/.

  • BenchmarkContraction.sh: Benchmarking script for contraction
  • BenchmarkPermutation.sh: Benchmarking script for permutation
  • BenchmarkReduction.sh: Benchmarking script for reduction
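These are ordinary shell scripts run from the project root; their arguments aren’t documented here, so review each script before running it. A minimal invocation might look like this:

./scripts/performance/BenchmarkContraction.sh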

Build performance#

Depending on the resources available to the build machine and the selected build configuration, hipTensor build times can take an hour or more. Here are some things you can do to reduce build times:

  • Target a specific GPU, for instance, with -D GPU_TARGETS=gfx908.

  • Use a large number of threads, for instance, -j32.

  • If the client builds aren’t needed, specify HIPTENSOR_BUILD_TESTS or HIPTENSOR_BUILD_SAMPLES as OFF to disable them.

  • During the make command, build a specific target, for instance, logger_test.
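Putting these suggestions together, a reduced-scope configure and build might look like the following sketch, using only the options and targets documented above (the thread count is just an example):

CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DGPU_TARGETS=gfx908 -DHIPTENSOR_BUILD_SAMPLES=OFF
cmake --build <build_dir> --target logger_test -- -j32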

Test runtimes#

Depending on the resources available to the machine running the selected tests, hipTensor test runtimes can last an hour or more. Here are some things you can do to reduce test runtimes:

  • CTest runs the entire test suite, but you can invoke tests individually by name.

  • Use GoogleTest filters to target specific test cases:

    <test_exe> --gtest_filter=*name_filter*
    
  • Manually adjust the test case coverage by editing the test YAML configs with a text editor to reduce the parameter coverage.

  • Alternatively, use your own YAML config for testing with a reduced parameter set.

  • For tests with large tensor ranks, avoid large lengths to reduce the computational load.
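For example, a single test can be selected through CTest by name, and the GoogleTest cases inside one executable can be listed before applying a filter (the test name below is only an example):

cd <build_dir>
ctest -R logger_test
<test_exe> --gtest_list_tests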

Test verbosity and file redirection#

The tests support logging arguments to control verbosity and output redirection.

<test_exe> -y "testing_params.yaml" -o "output.csv" --omit 1

  • -y <input_file>.yaml: Override the testing parameters with those read from the input YAML file
  • -o <output_file>.csv: Redirect the GoogleTest output to the given file
  • --omit <code>: Omit selected GoogleTest output. code = 1 omits SKIPPED tests, code = 2 omits FAILED tests, code = 4 omits PASSED tests, code = 8 omits all gtest output, and other values of code are treated as an OR combination of 1, 2, and 4
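Because the omit codes can be combined with OR, passing 5 (1 + 4) suppresses both SKIPPED and PASSED output while keeping FAILED results:

<test_exe> -y "testing_params.yaml" -o "output.csv" --omit 5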