HIP math API#

HIP-Clang provides device-callable math operations, supporting most functions available in NVIDIA CUDA.

This section documents:

Maximum error bounds for supported HIP math functions
Currently unsupported functions

Error bounds on this page are given in units in the last place (ULPs). Each bound is the ULP difference between the result of a HIP math function and the result of the corresponding C++ standard library function (for example, HIP’s sinf compared with C++’s sinf), taken at the input where the absolute error is largest.

The following C++ example shows a simplified method for computing ULP differences between HIP and standard C++ math functions by first finding where the maximum absolute error occurs.

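#include <hip/hip_runtime.h>
#include <cmath>
#include <cstdint>   // int32_t, int64_t
#include <cstdlib>   // exit, EXIT_FAILURE, std::abs for integers
#include <iostream>
#include <limits>
#include <vector>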

#define HIP_CHECK(expression)              \
    {                                      \
        const hipError_t err = expression; \
        if (err != hipSuccess) {           \
            std::cerr << "HIP error: "     \
                      << hipGetErrorString(err) \
                      << " at " << __LINE__ << "\n"; \
            exit(EXIT_FAILURE);            \
        }                                  \
    }

// Simple ULP difference calculator
int64_t ulp_diff(float a, float b) {
    if (a == b) return 0;
    union { float f; int32_t i; } ua{a}, ub{b};

    // Negative floats are stored as sign-magnitude, so remap them to make the
    // integer ordering match the float ordering (-0.0f maps to 0).
    // Subtracting from min() rather than max() avoids signed overflow.
    if (ua.i < 0) ua.i = std::numeric_limits<int32_t>::min() - ua.i;
    if (ub.i < 0) ub.i = std::numeric_limits<int32_t>::min() - ub.i;

    return std::abs((int64_t)ua.i - (int64_t)ub.i);
}

// Test kernel
__global__ void test_sin(float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = -M_PI + (2.0f * M_PI * i) / (n - 1);
        out[i] = sinf(x);  // single-precision device sine, matching the float reference on the host
    }
}

int main() {
    const int n = 1000000;
    const int blocksize = 256;
    std::vector<float> outputs(n);
    float* d_out;

    HIP_CHECK(hipMalloc(&d_out, n * sizeof(float)));
    dim3 threads(blocksize);
    dim3 blocks((n + blocksize - 1) / blocksize);  // Round up so every element gets a thread
    test_sin<<<blocks, threads>>>(d_out, n);
    HIP_CHECK(hipPeekAtLastError());
    HIP_CHECK(hipMemcpy(outputs.data(), d_out, n * sizeof(float), hipMemcpyDeviceToHost));

    // Step 1: Find the maximum absolute error
    double max_abs_error = 0.0;
    float max_error_output = 0.0;
    float max_error_expected = 0.0;

    for (int i = 0; i < n; i++) {
        float x = -M_PI + (2.0f * M_PI * i) / (n - 1);
        float expected = std::sin(x);
        double abs_error = std::abs(outputs[i] - expected);

        if (abs_error > max_abs_error) {
            max_abs_error = abs_error;
            max_error_output = outputs[i];
            max_error_expected = expected;
        }
    }

    // Step 2: Compute ULP difference based on the max absolute error pair
    int64_t max_ulp = ulp_diff(max_error_output, max_error_expected);

    // Output results
    std::cout << "Max Absolute Error: " << max_abs_error << std::endl;
    std::cout << "Max ULP Difference: " << max_ulp << std::endl;
    std::cout << "Max Error Values -> Got: " << max_error_output
              << ", Expected: " << max_error_expected << std::endl;

    HIP_CHECK(hipFree(d_out));
    return 0;
}
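
The union in `ulp_diff` above is a common shorthand for reading a float's bit pattern. As a host-only sketch that needs no GPU, the same ULP distance can be computed with `std::memcpy`, which is the well-defined way to reinterpret bits in C++. The helper names `ordered_bits` and `ulp_distance` are illustrative assumptions and are not part of HIP.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <limits>

// Map a float's bit pattern to an integer that increases monotonically with
// the float's value, so adjacent floats map to adjacent integers.
// (Hypothetical helper, not a HIP API.)
int64_t ordered_bits(float f) {
    static_assert(sizeof(int32_t) == sizeof(float), "float must be 32 bits");
    int32_t i;
    std::memcpy(&i, &f, sizeof(f));
    // Negative floats are sign-magnitude: remap them so that -0.0f maps to 0
    // and more negative values map to more negative integers.
    if (i < 0) {
        return static_cast<int64_t>(std::numeric_limits<int32_t>::min()) - i;
    }
    return i;
}

// Number of representable floats separating a and b (0 when they compare equal).
// NaN inputs are outside the scope of this sketch.
int64_t ulp_distance(float a, float b) {
    assert(!std::isnan(a) && !std::isnan(b));
    if (a == b) return 0;
    return std::llabs(ordered_bits(a) - ordered_bits(b));
}
```

For example, `ulp_distance(1.0f, std::nextafter(1.0f, 2.0f))` is 1, because the two arguments are neighboring floats.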

Intrinsic mathematical functions#

Intrinsic math functions are optimized for performance on HIP-supported hardware. These functions often trade some precision for faster execution, making them ideal for applications where computational efficiency is a priority over strict numerical accuracy. Note that intrinsics are supported on device only.

Floating-point Intrinsics#

Note

Only the nearest-even rounding mode is supported by default on AMD GPUs. The _rz, _ru, and _rd suffixed intrinsic functions exist in the HIP AMD backend if the OCML_BASIC_ROUNDED_OPERATIONS macro is defined.
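
To make the suffixes concrete, the following host-only sketch shows how the four rounding directions change the result of a single addition. It uses the standard `<cfenv>` facilities on the CPU as an analogy and does not call any HIP intrinsic; `add_rounded` is a hypothetical helper introduced only for this illustration.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Add two floats under a given host rounding mode: FE_TONEAREST (like _rn),
// FE_TOWARDZERO (like _rz), FE_UPWARD (like _ru), or FE_DOWNWARD (like _rd).
// The volatile variables keep the compiler from evaluating the sum at
// compile time, where the runtime rounding mode would be ignored.
float add_rounded(float a, float b, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile float va = a;
    volatile float vb = b;
    volatile float sum = va + vb;
    std::fesetround(saved);  // restore the caller's rounding mode
    return sum;
}
```

With `q = std::ldexp(1.0f, -25)` (a quarter of the spacing between `1.0f` and the next float), `add_rounded(1.0f, q, FE_TONEAREST)` stays at `1.0f`, while `FE_UPWARD` yields the next float above `1.0f`.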

Single precision intrinsics mathematical functions#
Function	Test Range	ULP Difference of Maximum Absolute Error
`float __cosf(float x)` Returns the fast approximate cosine of \(x\).	\(x \in [-\pi, \pi]\)	4
`float __exp10f(float x)` Returns the fast approximate of \(10^x\).	\(x \in [-4, 4]\)	18
`float __expf(float x)` Returns the fast approximate of \(e^x\).	\(x \in [-10, 10]\)	6
`float __fadd_rn(float x, float y)` Add two floating-point values in round-to-nearest-even mode.	\(x \in [-1000, 1000]\) \(y \in [-1000, 1000]\)	0
`float __fdiv_rn(float x, float y)` Divide two floating-point values in round-to-nearest-even mode.	\(x \in [-100, 100]\) \(y \in [-100, 100]\)	0
`float __fmaf_rn(float x, float y, float z)` Returns `x × y + z` as a single operation in round-to-nearest-even mode.	\(x \in [-100, 100]\) \(y \in [-10, 10]\) \(z \in [-10, 10]\)	0
`float __fmul_rn(float x, float y)` Multiply two floating-point values in round-to-nearest-even mode.	\(x \in [-100, 100]\) \(y \in [-100, 100]\)	0
`float __frcp_rn(float x)` Returns `1 / x` in round-to-nearest-even mode.	\(x \in [-100, 100]\)	0
`float __frsqrt_rn(float x)` Returns `1 / √x` in round-to-nearest-even mode.	\(x \in [0.01, 100]\)	1
`float __fsqrt_rn(float x)` Returns `√x` in round-to-nearest-even mode.	\(x \in [0, 100]\)	1
`float __fsub_rn(float x, float y)` Subtract two floating-point values in round-to-nearest-even mode.	\(x \in [-1000, 1000]\) \(y \in [-1000, 1000]\)	0
`float __log10f(float x)` Returns the fast approximate of the base 10 logarithm of \(x\).	\(x \in [10^{-6}, 10^6]\)	2
`float __log2f(float x)` Returns the fast approximate of the base 2 logarithm of \(x\).	\(x \in [10^{-6}, 10^6]\)	1
`float __logf(float x)` Returns the fast approximate of the natural logarithm of \(x\).	\(x \in [10^{-6}, 10^6]\)	2
`float __powf(float x, float y)` Returns the fast approximate of \(x^y\).	\(x \in [-4, 4]\) \(y \in [-2, 2]\)	1
`float __saturatef(float x)` Clamp \(x\) to [+0.0, 1.0].	\(x \in [-2, 3]\)	0
`void __sincosf(float x, float* sinptr, float* cosptr)` Computes the fast approximate sine and cosine of \(x\) and stores them in `*sinptr` and `*cosptr`.	\(x \in [-3, 3]\)	`sin`: 18 `cos`: 4
`float __sinf(float x)` Returns the fast approximate sine of \(x\).	\(x \in [-\pi, \pi]\)	18
`float __tanf(float x)` Returns the fast approximate tangent of \(x\).	\(x \in [-1.47\pi, 1.47\pi]\)	1

Double precision intrinsics mathematical functions#
Function	Test Range	ULP Difference of Maximum Absolute Error
`double __dadd_rn(double x, double y)` Add two floating-point values in round-to-nearest-even mode.	\(x \in [-1000, 1000]\) \(y \in [-1000, 1000]\)	0
`double __ddiv_rn(double x, double y)` Divide two floating-point values in round-to-nearest-even mode.	\(x \in [-100, 100]\) \(y \in [-100, 100]\)	0
`double __dmul_rn(double x, double y)` Multiply two floating-point values in round-to-nearest-even mode.	\(x \in [-100, 100]\) \(y \in [-100, 100]\)	0
`double __drcp_rn(double x)` Returns `1 / x` in round-to-nearest-even mode.	\(x \in [-100, 100]\)	0
`double __dsqrt_rn(double x)` Returns `√x` in round-to-nearest-even mode.	\(x \in [0, 100]\)	0
`double __dsub_rn(double x, double y)` Subtract two floating-point values in round-to-nearest-even mode.	\(x \in [-1000, 1000]\) \(y \in [-1000, 1000]\)	0
`double __fma_rn(double x, double y, double z)` Returns `x × y + z` as a single operation in round-to-nearest-even mode.	\(x \in [-100, 100]\) \(y \in [-10, 10]\) \(z \in [-10, 10]\)	0
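
The phrase "as a single operation" means that the product is not rounded before the addition. The host-only sketch below contrasts the two behaviors using the standard `std::fma`; it is an analogy for what `__fma_rn` and `__fmaf_rn` describe, not a call into HIP. The function names are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// x * y + z with two roundings: the product is rounded to double first.
// Storing it in a volatile prevents the compiler from contracting the
// multiply and add into one fused instruction.
double mul_then_add(double x, double y, double z) {
    volatile double product = x * y;
    return product + z;
}

// x * y + z with a single rounding of the exact result, as std::fma specifies.
double fused_mul_add(double x, double y, double z) {
    return std::fma(x, y, z);
}
```

With `e = std::ldexp(1.0, -27)`, the exact product `(1 + e) * (1 - e)` is `1 - 2^-54`, which rounds to `1.0` as a double. `mul_then_add(1.0 + e, 1.0 - e, -1.0)` therefore returns `0.0`, whereas `fused_mul_add` with the same arguments returns `-2^-54`.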

Integer intrinsics#

This section covers HIP integer intrinsic functions. ULP error values are omitted since they only apply to floating-point operations, not integer arithmetic.

Integer intrinsics mathematical functions#
Function
`unsigned int __brev(unsigned int x)` Reverse the bit order of a 32-bit unsigned integer.
`unsigned long long int __brevll(unsigned long long int x)` Reverse the bit order of a 64-bit unsigned integer.
`unsigned int __byte_perm(unsigned int x, unsigned int y, unsigned int z)` Return selected bytes from two 32-bit unsigned integers.
`unsigned int __clz(int x)` Return the number of consecutive high-order zero bits in a 32-bit integer.
`unsigned int __clzll(long long int x)` Return the number of consecutive high-order zero bits in a 64-bit integer.
`unsigned int __ffs(int x)` [1] Returns the position of the first set bit in a 32-bit integer. Note: returns `0` if `x` is `0`.
`unsigned int __ffsll(long long int x)` [1] Returns the position of the first set bit in a 64-bit signed integer. Note: returns `0` if `x` is `0`.
`unsigned int __fns32(unsigned int mask, unsigned int base, int offset)` Find the position of the n-th bit set to 1 in a 32-bit integer. Note: this intrinsic is emulated in software, so it can be slower.
`unsigned int __fns64(unsigned long long int mask, unsigned int base, int offset)` Find the position of the n-th bit set to 1 in a 64-bit integer. Note: this intrinsic is emulated in software, so it can be slower.
`unsigned int __funnelshift_l(unsigned int lo, unsigned int hi, unsigned int shift)` Concatenate \(hi\) and \(lo\), shift left by shift & 31 bits, return the most significant 32 bits.
`unsigned int __funnelshift_lc(unsigned int lo, unsigned int hi, unsigned int shift)` Concatenate \(hi\) and \(lo\), shift left by min(shift, 32) bits, return the most significant 32 bits.
`unsigned int __funnelshift_r(unsigned int lo, unsigned int hi, unsigned int shift)` Concatenate \(hi\) and \(lo\), shift right by shift & 31 bits, return the least significant 32 bits.
`unsigned int __funnelshift_rc(unsigned int lo, unsigned int hi, unsigned int shift)` Concatenate \(hi\) and \(lo\), shift right by min(shift, 32) bits, return the least significant 32 bits.
`unsigned int __hadd(int x, int y)` Compute average of signed input arguments, avoiding overflow in the intermediate sum.
`unsigned int __rhadd(int x, int y)` Compute rounded average of signed input arguments, avoiding overflow in the intermediate sum.
`unsigned int __uhadd(unsigned int x, unsigned int y)` Compute average of unsigned input arguments, avoiding overflow in the intermediate sum.
`unsigned int __urhadd(unsigned int x, unsigned int y)` Compute rounded average of unsigned input arguments, avoiding overflow in the intermediate sum.
`int __sad(int x, int y, int z)` Returns \(|x - y| + z\), the sum of absolute difference.
`unsigned int __usad(unsigned int x, unsigned int y, unsigned int z)` Returns \(|x - y| + z\), the sum of absolute difference.
`unsigned int __popc(unsigned int x)` Count the number of bits that are set to 1 in a 32-bit integer.
`unsigned int __popcll(unsigned long long int x)` Count the number of bits that are set to 1 in a 64-bit integer.
`int __mul24(int x, int y)` Multiply two 24-bit integers.
`unsigned int __umul24(unsigned int x, unsigned int y)` Multiply two 24-bit unsigned integers.
`int __mulhi(int x, int y)` Returns the most significant 32 bits of the product of the two 32-bit integers.
`unsigned int __umulhi(unsigned int x, unsigned int y)` Returns the most significant 32 bits of the product of the two 32-bit unsigned integers.
`long long int __mul64hi(long long int x, long long int y)` Returns the most significant 64 bits of the product of the two 64-bit integers.
`unsigned long long int __umul64hi(unsigned long long int x, unsigned long long int y)` Returns the most significant 64 bits of the product of the two 64-bit unsigned integers.
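
Several of the descriptions above translate directly into portable integer arithmetic. The following host-only reference sketches follow the table's wording for the funnel shifts, the unsigned averages, and `__umulhi`. The `ref_` names are hypothetical, the code is not the HIP implementation, and the rounded average is assumed to round halves up, as `(x + y + 1) / 2` would without overflow.

```cpp
#include <cassert>
#include <cstdint>

// Concatenate hi:lo into 64 bits, shift left by (shift & 31), keep the upper 32 bits.
uint32_t ref_funnelshift_l(uint32_t lo, uint32_t hi, uint32_t shift) {
    const uint64_t v = (static_cast<uint64_t>(hi) << 32) | lo;
    return static_cast<uint32_t>((v << (shift & 31)) >> 32);
}

// Same, but the shift amount is clamped to min(shift, 32).
uint32_t ref_funnelshift_lc(uint32_t lo, uint32_t hi, uint32_t shift) {
    const uint64_t v = (static_cast<uint64_t>(hi) << 32) | lo;
    const uint32_t s = shift < 32 ? shift : 32;
    return static_cast<uint32_t>((v << s) >> 32);
}

// Concatenate hi:lo, shift right by (shift & 31), keep the lower 32 bits.
uint32_t ref_funnelshift_r(uint32_t lo, uint32_t hi, uint32_t shift) {
    const uint64_t v = (static_cast<uint64_t>(hi) << 32) | lo;
    return static_cast<uint32_t>(v >> (shift & 31));
}

// Same, but the shift amount is clamped to min(shift, 32).
uint32_t ref_funnelshift_rc(uint32_t lo, uint32_t hi, uint32_t shift) {
    const uint64_t v = (static_cast<uint64_t>(hi) << 32) | lo;
    const uint32_t s = shift < 32 ? shift : 32;
    return static_cast<uint32_t>(v >> s);
}

// floor((x + y) / 2) without forming the possibly overflowing sum.
uint32_t ref_uhadd(uint32_t x, uint32_t y) {
    return (x & y) + ((x ^ y) >> 1);
}

// floor((x + y + 1) / 2) without overflow (assumed rounding rule, see lead-in).
uint32_t ref_urhadd(uint32_t x, uint32_t y) {
    return (x | y) - ((x ^ y) >> 1);
}

// Upper 32 bits of the full 64-bit product.
uint32_t ref_umulhi(uint32_t x, uint32_t y) {
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * y) >> 32);
}
```

The masked and clamped variants differ only for shift amounts of 32 or more: with `shift = 32`, `ref_funnelshift_l` shifts by 0 and returns `hi`, while `ref_funnelshift_lc` shifts by 32 and returns `lo`.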

Standard mathematical functions#

Arithmetic#

Function	Test Range	ULP Difference of Maximum Absolute Error
`float abs(float x)` Returns the absolute value of \(x\)	\(x \in [-20, 20]\)	0
`float fabsf(float x)` Returns the absolute value of \(x\)	\(x \in [-20, 20]\)	0
`float fdimf(float x, float y)` Returns the positive difference between \(x\) and \(y\).	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`float fmaf(float x, float y, float z)` Returns \(x \cdot y + z\) as a single operation.	\(x \in [-100, 100]\) \(y \in [-10, 10]\) \(z \in [-10, 10]\)	0
`float fmaxf(float x, float y)` Determine the maximum numeric value of \(x\) and \(y\).	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`float fminf(float x, float y)` Determine the minimum numeric value of \(x\) and \(y\).	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`float fmodf(float x, float y)` Returns the floating-point remainder of \(x / y\).	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`float modff(float x, float* iptr)` Break down \(x\) into fractional and integral parts.	\(x \in [-10, 10]\)	0
`float remainderf(float x, float y)` Returns single-precision floating-point remainder.	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`float remquof(float x, float y, int* quo)` Returns single-precision floating-point remainder and part of quotient.	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`float fdividef(float x, float y)` Divide two floating point values.	\(x \in [-100, 100]\) \(y \in [-100, 100]\)	0

Function	Test Range	ULP Difference of Maximum Absolute Error
`double abs(double x)` Returns the absolute value of \(x\)	\(x \in [-20, 20]\)	0
`double fabs(double x)` Returns the absolute value of \(x\)	\(x \in [-20, 20]\)	0
`double fdim(double x, double y)` Returns the positive difference between \(x\) and \(y\).	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`double fma(double x, double y, double z)` Returns \(x \cdot y + z\) as a single operation.	\(x \in [-100, 100]\) \(y \in [-10, 10]\) \(z \in [-10, 10]\)	0
`double fmax(double x, double y)` Determine the maximum numeric value of \(x\) and \(y\).	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`double fmin(double x, double y)` Determine the minimum numeric value of \(x\) and \(y\).	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`double fmod(double x, double y)` Returns the floating-point remainder of \(x / y\).	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`double modf(double x, double* iptr)` Break down \(x\) into fractional and integral parts.	\(x \in [-10, 10]\)	0
`double remainder(double x, double y)` Returns double-precision floating-point remainder.	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
`double remquo(double x, double y, int* quo)` Returns double-precision floating-point remainder and part of quotient.	\(x \in [-10, 10]\) \(y \in [-3, 3]\)	0
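
The three remainder functions in these tables differ in how the quotient is rounded, which is easy to mix up. The host-only sketch below uses the C++ standard library counterparts to show the difference; `RemainderResult` and `compare_remainders` are illustrative names, not HIP APIs.

```cpp
#include <cassert>
#include <cmath>

struct RemainderResult {
    double truncated;  // std::fmod: quotient truncated toward zero
    double nearest;    // std::remainder: quotient rounded to the nearest integer
    int quotient;      // std::remquo: sign and low-order bits of that rounded quotient
};

// Evaluate all three remainder flavors for the same pair of inputs.
RemainderResult compare_remainders(double x, double y) {
    RemainderResult r{};
    r.truncated = std::fmod(x, y);
    r.nearest = std::remainder(x, y);
    std::remquo(x, y, &r.quotient);
    return r;
}
```

For `x = 5.5` and `y = 2.0` the exact quotient is 2.75. `fmod` truncates it to 2 and returns 1.5, while `remainder` and `remquo` round it to 3 and return -0.5.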

Classification#

Function	Test Range	ULP Difference of Maximum Absolute Error
`bool isfinite(float x)` Determine whether \(x\) is finite.	\(x \in [-\text{FLT_MAX}, \text{FLT_MAX}]\) Special values: \(\pm\infty\), NaN	0
`bool isinf(float x)` Determine whether \(x\) is infinite.	\(x \in [-\text{FLT_MAX}, \text{FLT_MAX}]\) Special values: \(\pm\infty\), NaN	0
`bool isnan(float x)` Determine whether \(x\) is a `NAN`.	\(x \in [-\text{FLT_MAX}, \text{FLT_MAX}]\) Special values: \(\pm\infty\), NaN	0
`bool signbit(float x)` Return the sign bit of \(x\).	\(x \in [-\text{FLT_MAX}, \text{FLT_MAX}]\) Special values: \(\pm\infty\), \(\pm0\), NaN	0
`float nanf(const char* tagp)` Returns “Not a Number” value.	Input strings: `""`, `"1"`, `"2"`, `"quiet"`, `"signaling"`, `"ind"`	0

Function	Test Range	ULP Difference of Maximum Absolute Error
`bool isfinite(double x)` Determine whether \(x\) is finite.	\(x \in [-\text{DBL_MAX}, \text{DBL_MAX}]\) Special values: \(\pm\infty\), NaN	0
`bool isinf(double x)` Determine whether \(x\) is infinite.	\(x \in [-\text{DBL_MAX}, \text{DBL_MAX}]\) Special values: \(\pm\infty\), NaN	0
`bool isnan(double x)` Determine whether \(x\) is a `NAN`.	\(x \in [-\text{DBL_MAX}, \text{DBL_MAX}]\) Special values: \(\pm\infty\), NaN	0
`bool signbit(double x)` Return the sign bit of \(x\).	\(x \in [-\text{DBL_MAX}, \text{DBL_MAX}]\) Special values: \(\pm\infty\), \(\pm0\), NaN	0
`double nan(const char* tagp)` Returns “Not a Number” value.	Input strings: `""`, `"1"`, `"2"`, `"quiet"`, `"signaling"`, `"ind"`	0
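
The special values listed in the test ranges are the cases where the classification functions disagree with ordinary comparisons. As a host-only sketch using the standard library counterparts, the hypothetical helper `classify` below shows how `isnan`, `isinf`, and `signbit` combine. Note that `signbit` distinguishes `-0.0` from `+0.0` even though the two compare equal.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Label a double by combining the standard classification functions.
// (Illustrative helper, not part of HIP.)
std::string classify(double x) {
    if (std::isnan(x)) return "nan";
    if (std::isinf(x)) return std::signbit(x) ? "-inf" : "+inf";
    if (x == 0.0) return std::signbit(x) ? "-0" : "+0";
    return std::signbit(x) ? "negative finite" : "positive finite";
}
```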

Error and Gamma#

Function	Test Range	ULP Difference of Maximum Absolute Error
`float erff(float x)` Returns the error function of \(x\).	\(x \in [-4, 4]\)	4
`float erfcf(float x)` Returns the complementary error function of \(x\).	\(x \in [-4, 4]\)	2
`float erfcxf(float x)` Returns the scaled complementary error function of \(x\).	\(x \in [-2, 2]\)	5
`float lgammaf(float x)` Returns the natural logarithm of the absolute value of the gamma function of \(x\).	\(x \in [0.5, 20]\)	4
`float tgammaf(float x)` Returns the gamma function of \(x\).	\(x \in [0.5, 15]\)	6
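
The functions in this table are related by simple identities, which can serve as a quick plausibility check on the host. The sketch below uses the C++ standard library, which has no `erfcx`, so `erfcx_naive` evaluates the defining formula directly. Both helper names are assumptions made for illustration, and the naive formulas are only reasonable for moderate arguments such as the test ranges above.

```cpp
#include <cassert>
#include <cmath>

// Scaled complementary error function from its definition: exp(x^2) * erfc(x).
// Direct evaluation loses accuracy or overflows for large |x|.
double erfcx_naive(double x) {
    return std::exp(x * x) * std::erfc(x);
}

// log|Gamma(x)| computed through tgamma; valid only where tgamma does not overflow.
double lgamma_via_tgamma(double x) {
    return std::log(std::fabs(std::tgamma(x)));
}
```

For instance, `std::tgamma(5.0)` is \(4! = 24\), so `lgamma_via_tgamma(5.0)` should agree with `std::lgamma(5.0)` up to rounding, and `std::erf(x) + std::erfc(x)` should be close to 1.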
