hipBLAS API

Contents

hipBLAS API#

hipBLAS Interface#

The hipBLAS interface is compatible with rocBLAS and cuBLAS-v2 APIs. Porting a CUDA application which originally calls the cuBLAS API to an application calling hipBLAS API should be relatively straightforward. For example, the hipBLAS SGEMV interface is:

GEMV API#

hipblasStatus_t
hipblasSgemv(hipblasHandle_t handle,
             hipblasOperation_t trans,
             int m, int n, const float *alpha,
             const float *A, int lda,
             const float *x, int incx, const float *beta,
             float *y, int incy );

Naming conventions#

hipBLAS follows the following naming conventions:

  • Upper case for matrix, e.g. matrix A, B, C GEMM (C = A*B)

  • Lower case for vector, e.g. vector x, y GEMV (y = A*x)

Notations#

hipBLAS function uses the following notations to denote precisions:

  • h = half

  • bf = 16 bit brain floating point

  • s = single

  • d = double

  • c = single complex

  • z = double complex

ILP64 Interface#

The hipBLAS library Level-1 functions are also provided with ILP64 interfaces. With these interfaces all “int” arguments are replaced by the typename int64_t. These ILP64 function names all end with a suffix _64. The only output arguments that change are for the xMAX and xMIN for which the index is now int64_t. Function level documentation is not repeated for these API as they are identical in behavior to the LP64 versions, however functions which support this alternate API include the line: This function supports the 64-bit integer interface.

HIPBLAS_V2 and Deprecations#

As of hipBLAS version 2.0.0, hipblasDatatype_t is deprecated, along with all functions which use this type. In a future release, all uses of hipblasDatatype_t will be replaced by hipDataType. See the hipblasGemmEx() documentation for a small exception where hipblasComputeType_t replaces hipblasDatatype_t for the computeType parameter.

hipblasComplex and hipblasDoubleComplex are also deprecated. In a future release, all uses of these types will be replaced with their HIP counterparts: hipComplex and hipDoubleComplex.

While hipblasDatatype_t, hipblasComplex, and hipblasDoubleComplex are deprecated, users may use the compiler define or inline #define HIPBLAS_V2 before including the header file <hipblas.h> to access the updated API. In a future release, this define will no longer be needed and deprecated functions will be removed, leaving the updated interface. To see the new interfaces using hipDataType refer to the documentation for the following functions: hipblasTrsmEx, hipblasGemmEx, hipblasAxpyEx, hipblasDot(c)Ex, hipblasNrm2Ex, hipblasRotEx, hipblasScalEx, and all batched and strided-batched variants.

bfloat 16 Datatype#

hipBLAS defines a hipblasBfloat16 datatype. This type is exposed as a struct simply containing 16 bits of data. There is also a C++ hipblasBfloat16 class defined which gives slightly more functionality, including conversion to and from a 32-bit float datatype. This class can be used in C++11 or greater by defining HIPBLAS_BFLOAT16_CLASS before including the header file hipblas.h.

There is also an option to interpret the API as using the hip_bfloat16 datatype. This is provided to avoid casting when using the hip_bfloat16 datatype. To expose the API using hip_bfloat16, define HIPBLAS_USE_HIP_BFLOAT16 before including the header file hipblas.h.

Note

The hip_bfloat16 datatype is only supported on AMD platforms.

Complex Datatypes#

hipBLAS defines hipblasComplex and hipblasDoubleComplex structs. These types contain x and y components and identical memory layout to std::complex for float and double precision.

For simplified usage with Hipified code, there is an option to interpret the API as using hipComplex and hipDoubleComplex types (i.e. typedef hipComplex hipblasComplex). This is provided for users to avoid casting when using the hip complex types in their code. As the memory layout is consistent across all three types, it is safe to cast arguments to API calls between the 3 types: hipComplex, std::complex<float>, and hipblasComplex, as well as for the double precision variants. To expose the API as using the hip defined complex types, users can use either a compiler define or inline #define ROCM_MATHLIBS_API_USE_HIP_COMPLEX before including the header file <hipblas.h>. Thus, the API is compatible with both forms, but recompilation is required to avoid casting if switching to pass in the hip complex types.

Note

hipblasComplex, hipblasDoubleComplex, and the use of ROCM_MATHLIBS_API_USE_HIP_COMPLEX are now deprecated. The API will provide interfaces using only hipComplex and hipDoubleComplex in the future. See HIPBLAS_V2 and Deprecations for more information.

Atomic Operations#

Some functions in hipBLAS may use atomic operations to increase performance which may cause functions to not give bit-wise reproducible results. By default, the rocBLAS backend allows the use of atomics while the cuBLAS backend disallows the use of atomics. To set the desired behavior, users should call hipblasSetAtomicsMode(). Please see the rocBLAS or cuBLAS documentation for more information regarding specifics of atomic operations in the backend library.

hipBLAS Types#

Definitions#

hipblasHandle_t#

typedef void *hipblasHandle_t#

hipblasHanlde_t is a void pointer, to store the library context (either rocBLAS or cuBLAS)

hipblasHalf#

typedef uint16_t hipblasHalf#

To specify the datatype to be unsigned short.

hipblasInt8#

typedef int8_t hipblasInt8#

To specify the datatype to be signed char.

hipblasStride#

typedef int64_t hipblasStride#

Stride between matrices or vectors in strided_batched functions.

hipblasBfloat16#

struct hipblasBfloat16#

Struct to represent a 16 bit Brain floating-point number.

hipblasComplex#

struct hipblasComplex#

Struct to represent a complex number with single precision real and imaginary parts.

hipblasDoubleComplex#

struct hipblasDoubleComplex#

Struct to represent a complex number with double precision real and imaginary parts.

Enums#

Enumeration constants have numbering that is consistent with CBLAS, ACML and most standard C BLAS libraries.

hipblasStatus_t#

enum hipblasStatus_t#

hipblas status codes definition

Values:

enumerator HIPBLAS_STATUS_SUCCESS#

Function succeeds

enumerator HIPBLAS_STATUS_NOT_INITIALIZED#

HIPBLAS library not initialized

enumerator HIPBLAS_STATUS_ALLOC_FAILED#

resource allocation failed

enumerator HIPBLAS_STATUS_INVALID_VALUE#

unsupported numerical value was passed to function

enumerator HIPBLAS_STATUS_MAPPING_ERROR#

access to GPU memory space failed

enumerator HIPBLAS_STATUS_EXECUTION_FAILED#

GPU program failed to execute

enumerator HIPBLAS_STATUS_INTERNAL_ERROR#

an internal HIPBLAS operation failed

enumerator HIPBLAS_STATUS_NOT_SUPPORTED#

function not implemented

enumerator HIPBLAS_STATUS_ARCH_MISMATCH#

architecture mismatch

enumerator HIPBLAS_STATUS_HANDLE_IS_NULLPTR#

hipBLAS handle is null pointer

enumerator HIPBLAS_STATUS_INVALID_ENUM#

unsupported enum value was passed to function

enumerator HIPBLAS_STATUS_UNKNOWN#

back-end returned an unsupported status code

hipblasOperation_t#

enum hipblasOperation_t#

Used to specify whether the matrix is to be transposed or not.

Values:

enumerator HIPBLAS_OP_N#

Operate with the matrix.

enumerator HIPBLAS_OP_T#

Operate with the transpose of the matrix.

enumerator HIPBLAS_OP_C#

Operate with the conjugate transpose of the matrix.

hipblasPointerMode_t#

enum hipblasPointerMode_t#

Indicates if scalar pointers are on host or device. This is used for scalars alpha and beta and for scalar function return values.

Values:

enumerator HIPBLAS_POINTER_MODE_HOST#

Scalar values affected by this variable will be located on the host.

enumerator HIPBLAS_POINTER_MODE_DEVICE#

Scalar values affected by this variable will be located on the device.

hipblasFillMode_t#

enum hipblasFillMode_t#

Used by the Hermitian, symmetric and triangular matrix routines to specify whether the upper or lower triangle is being referenced.

Values:

enumerator HIPBLAS_FILL_MODE_UPPER#

Upper triangle

enumerator HIPBLAS_FILL_MODE_LOWER#

Lower triangle

enumerator HIPBLAS_FILL_MODE_FULL#

hipblasDiagType_t#

enum hipblasDiagType_t#

It is used by the triangular matrix routines to specify whether the matrix is unit triangular.

Values:

enumerator HIPBLAS_DIAG_NON_UNIT#

Non-unit triangular.

enumerator HIPBLAS_DIAG_UNIT#

Unit triangular.

hipblasSideMode_t#

enum hipblasSideMode_t#

Indicates the side matrix A is located relative to matrix B during multiplication.

Values:

enumerator HIPBLAS_SIDE_LEFT#

Multiply general matrix by symmetric, Hermitian or triangular matrix on the left.

enumerator HIPBLAS_SIDE_RIGHT#

Multiply general matrix by symmetric, Hermitian or triangular matrix on the right.

enumerator HIPBLAS_SIDE_BOTH#

hipblasDatatype_t#

enum hipblasDatatype_t#

Indicates the precision of data used. hipblasDatatype_t is deprecated as of hipBLAS 2.0.0 and will be removed in a future release as generally replaced by hipDataType.

Values:

enumerator HIPBLAS_R_16F#

16 bit floating point, real

enumerator HIPBLAS_R_32F#

32 bit floating point, real

enumerator HIPBLAS_R_64F#

64 bit floating point, real

enumerator HIPBLAS_C_16F#

16 bit floating point, complex

enumerator HIPBLAS_C_32F#

32 bit floating point, complex

enumerator HIPBLAS_C_64F#

64 bit floating point, complex

enumerator HIPBLAS_R_8I#

8 bit signed integer, real

enumerator HIPBLAS_R_8U#

8 bit unsigned integer, real

enumerator HIPBLAS_R_32I#

32 bit signed integer, real

enumerator HIPBLAS_R_32U#

32 bit unsigned integer, real

enumerator HIPBLAS_C_8I#

8 bit signed integer, complex

enumerator HIPBLAS_C_8U#

8 bit unsigned integer, complex

enumerator HIPBLAS_C_32I#

32 bit signed integer, complex

enumerator HIPBLAS_C_32U#

32 bit unsigned integer, complex

enumerator HIPBLAS_R_16B#

16 bit bfloat, real

enumerator HIPBLAS_C_16B#

16 bit bfloat, complex

enumerator HIPBLAS_DATATYPE_INVALID#

Invalid datatype value, do not use

hipblasComputeType_t#

enum hipblasComputeType_t#

The compute type to be used. Currently only used with GemmEx with the HIPBLAS_V2 interface. Note that support for compute types is largely dependent on backend.

Values:

enumerator HIPBLAS_COMPUTE_16F#

compute will be at least 16-bit precision

enumerator HIPBLAS_COMPUTE_16F_PEDANTIC#

compute will be exactly 16-bit precision

enumerator HIPBLAS_COMPUTE_32F#

compute will be at least 32-bit precision

enumerator HIPBLAS_COMPUTE_32F_PEDANTIC#

compute will be exactly 32-bit precision

enumerator HIPBLAS_COMPUTE_32F_FAST_16F#

32-bit input can use 16-bit compute

enumerator HIPBLAS_COMPUTE_32F_FAST_16BF#

32-bit input can is bf16 compute

enumerator HIPBLAS_COMPUTE_32F_FAST_TF32#

32-bit input can use tensor cores w/ TF32 compute. Only supported with cuBLAS backend currently

enumerator HIPBLAS_COMPUTE_64F#

compute will be at least 64-bit precision

enumerator HIPBLAS_COMPUTE_64F_PEDANTIC#

compute will be exactly 64-bit precision

enumerator HIPBLAS_COMPUTE_32I#

compute will be at least 32-bit integer precision

enumerator HIPBLAS_COMPUTE_32I_PEDANTIC#

compute will be exactly 32-bit integer precision

hipblasGemmAlgo_t#

enum hipblasGemmAlgo_t#

Indicates if layer is active with bitmask.

Values:

enumerator HIPBLAS_GEMM_DEFAULT#

enumerator rocblas_gemm_algo_standard

hipblasAtomicsMode_t#

enum hipblasAtomicsMode_t#

Indicates if atomics operations are allowed. Not allowing atomic operations may generally improve determinism and repeatability of results at a cost of performance. By default, the rocBLAS backend will allow atomic operations while the cuBLAS backend will disallow atomic operations. See backend documentation for more detail.

Values:

enumerator HIPBLAS_ATOMICS_NOT_ALLOWED#

Algorithms will refrain from atomics where applicable.

enumerator HIPBLAS_ATOMICS_ALLOWED#

Algorithms will take advantage of atomics where applicable.

hipBLAS Functions#

Level 1 BLAS#

hipblasIXamax + Batched, StridedBatched#

hipblasStatus_t hipblasIsamax(hipblasHandle_t handle, int n, const float *x, int incx, int *result)#
hipblasStatus_t hipblasIdamax(hipblasHandle_t handle, int n, const double *x, int incx, int *result)#
hipblasStatus_t hipblasIcamax(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, int *result)#
hipblasStatus_t hipblasIzamax(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, int *result)#

BLAS Level 1 API.

amax finds the first index of the element of maximum magnitude of a vector x.

  • Supported precisions in rocBLAS : s,d,c,z.

  • Supported precisions in cuBLAS : s,d,c,z.

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x.

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of y.

  • result[inout] device pointer or host pointer to store the amax index. return is 0.0 if n, incx<=0.

The amax function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasIsamaxBatched(hipblasHandle_t handle, int n, const float *const x[], int incx, int batchCount, int *result)#
hipblasStatus_t hipblasIdamaxBatched(hipblasHandle_t handle, int n, const double *const x[], int incx, int batchCount, int *result)#
hipblasStatus_t hipblasIcamaxBatched(hipblasHandle_t handle, int n, const hipblasComplex *const x[], int incx, int batchCount, int *result)#
hipblasStatus_t hipblasIzamaxBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *const x[], int incx, int batchCount, int *result)#

BLAS Level 1 API.

amaxBatched finds the first index of the element of maximum magnitude of each vector x_i in a batch, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z.

  • Supported precisions in cuBLAS : No support.

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each vector x_i

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each x_i. incx must be > 0.

  • batchCount[in] [int] number of instances in the batch, must be > 0.

  • result[out] device or host array of pointers of batchCount size for results. return is 0 if n, incx<=0.

The amaxBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasIsamaxStridedBatched(hipblasHandle_t handle, int n, const float *x, int incx, hipblasStride stridex, int batchCount, int *result)#
hipblasStatus_t hipblasIdamaxStridedBatched(hipblasHandle_t handle, int n, const double *x, int incx, hipblasStride stridex, int batchCount, int *result)#
hipblasStatus_t hipblasIcamaxStridedBatched(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, hipblasStride stridex, int batchCount, int *result)#
hipblasStatus_t hipblasIzamaxStridedBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, int batchCount, int *result)#

BLAS Level 1 API.

amaxStridedBatched finds the first index of the element of maximum magnitude of each vector x_i in a batch, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each vector x_i

  • x[in] device pointer to the first vector x_1.

  • incx[in] [int] specifies the increment for the elements of each x_i. incx must be > 0.

  • stridex[in] [hipblasStride] specifies the pointer increment between one x_i and the next x_(i + 1).

  • batchCount[in] [int] number of instances in the batch

  • result[out] device or host pointer for storing contiguous batchCount results. return is 0 if n <= 0, incx<=0.

The amaxStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasIXamin + Batched, StridedBatched#

hipblasStatus_t hipblasIsamin(hipblasHandle_t handle, int n, const float *x, int incx, int *result)#
hipblasStatus_t hipblasIdamin(hipblasHandle_t handle, int n, const double *x, int incx, int *result)#
hipblasStatus_t hipblasIcamin(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, int *result)#
hipblasStatus_t hipblasIzamin(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, int *result)#

BLAS Level 1 API.

amin finds the first index of the element of minimum magnitude of a vector x.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x.

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of y.

  • result[inout] device pointer or host pointer to store the amin index. return is 0.0 if n, incx<=0.

The amin function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasIsaminBatched(hipblasHandle_t handle, int n, const float *const x[], int incx, int batchCount, int *result)#
hipblasStatus_t hipblasIdaminBatched(hipblasHandle_t handle, int n, const double *const x[], int incx, int batchCount, int *result)#
hipblasStatus_t hipblasIcaminBatched(hipblasHandle_t handle, int n, const hipblasComplex *const x[], int incx, int batchCount, int *result)#
hipblasStatus_t hipblasIzaminBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *const x[], int incx, int batchCount, int *result)#

BLAS Level 1 API.

aminBatched finds the first index of the element of minimum magnitude of each vector x_i in a batch, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each vector x_i

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each x_i. incx must be > 0.

  • batchCount[in] [int] number of instances in the batch, must be > 0.

  • result[out] device or host pointers to array of batchCount size for results. return is 0 if n, incx<=0.

The aminBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasIsaminStridedBatched(hipblasHandle_t handle, int n, const float *x, int incx, hipblasStride stridex, int batchCount, int *result)#
hipblasStatus_t hipblasIdaminStridedBatched(hipblasHandle_t handle, int n, const double *x, int incx, hipblasStride stridex, int batchCount, int *result)#
hipblasStatus_t hipblasIcaminStridedBatched(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, hipblasStride stridex, int batchCount, int *result)#
hipblasStatus_t hipblasIzaminStridedBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, int batchCount, int *result)#

BLAS Level 1 API.

aminStridedBatched finds the first index of the element of minimum magnitude of each vector x_i in a batch, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each vector x_i

  • x[in] device pointer to the first vector x_1.

  • incx[in] [int] specifies the increment for the elements of each x_i. incx must be > 0.

  • stridex[in] [hipblasStride] specifies the pointer increment between one x_i and the next x_(i + 1)

  • batchCount[in] [int] number of instances in the batch

  • result[out] device or host pointer to array for storing contiguous batchCount results. return is 0 if n <= 0, incx<=0.

The aminStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXasum + Batched, StridedBatched#

hipblasStatus_t hipblasSasum(hipblasHandle_t handle, int n, const float *x, int incx, float *result)#
hipblasStatus_t hipblasDasum(hipblasHandle_t handle, int n, const double *x, int incx, double *result)#
hipblasStatus_t hipblasScasum(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, float *result)#
hipblasStatus_t hipblasDzasum(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, double *result)#

BLAS Level 1 API.

asum computes the sum of the magnitudes of elements of a real vector x, or the sum of magnitudes of the real and imaginary parts of elements if x is a complex vector.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x and y.

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of x. incx must be > 0.

  • result[inout] device pointer or host pointer to store the asum product. return is 0.0 if n <= 0.

The asum function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSasumBatched(hipblasHandle_t handle, int n, const float *const x[], int incx, int batchCount, float *result)#
hipblasStatus_t hipblasDasumBatched(hipblasHandle_t handle, int n, const double *const x[], int incx, int batchCount, double *result)#
hipblasStatus_t hipblasScasumBatched(hipblasHandle_t handle, int n, const hipblasComplex *const x[], int incx, int batchCount, float *result)#
hipblasStatus_t hipblasDzasumBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *const x[], int incx, int batchCount, double *result)#

BLAS Level 1 API.

asumBatched computes the sum of the magnitudes of the elements in a batch of real vectors x_i, or the sum of magnitudes of the real and imaginary parts of elements if x_i is a complex vector, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each vector x_i

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each x_i. incx must be > 0.

  • batchCount[in] [int] number of instances in the batch.

  • result[out] device array or host array of batchCount size for results. return is 0.0 if n, incx<=0.

The asumBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSasumStridedBatched(hipblasHandle_t handle, int n, const float *x, int incx, hipblasStride stridex, int batchCount, float *result)#
hipblasStatus_t hipblasDasumStridedBatched(hipblasHandle_t handle, int n, const double *x, int incx, hipblasStride stridex, int batchCount, double *result)#
hipblasStatus_t hipblasScasumStridedBatched(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, hipblasStride stridex, int batchCount, float *result)#
hipblasStatus_t hipblasDzasumStridedBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, int batchCount, double *result)#

BLAS Level 1 API.

asumStridedBatched computes the sum of the magnitudes of elements of a real vectors x_i, or the sum of magnitudes of the real and imaginary parts of elements if x_i is a complex vector, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each vector x_i

  • x[in] device pointer to the first vector x_1.

  • incx[in] [int] specifies the increment for the elements of each x_i. incx must be > 0.

  • stridex[in] [hipblasStride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stride_x, however the user should take care to ensure that stride_x is of appropriate size, for a typical case this means stride_x >= n * incx.

  • batchCount[in] [int] number of instances in the batch

  • result[out] device pointer or host pointer to array for storing contiguous batchCount results. return is 0.0 if n, incx<=0.

The asumStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXaxpy + Batched, StridedBatched#

hipblasStatus_t hipblasHaxpy(hipblasHandle_t handle, int n, const hipblasHalf *alpha, const hipblasHalf *x, int incx, hipblasHalf *y, int incy)#
hipblasStatus_t hipblasSaxpy(hipblasHandle_t handle, int n, const float *alpha, const float *x, int incx, float *y, int incy)#
hipblasStatus_t hipblasDaxpy(hipblasHandle_t handle, int n, const double *alpha, const double *x, int incx, double *y, int incy)#
hipblasStatus_t hipblasCaxpy(hipblasHandle_t handle, int n, const hipblasComplex *alpha, const hipblasComplex *x, int incx, hipblasComplex *y, int incy)#
hipblasStatus_t hipblasZaxpy(hipblasHandle_t handle, int n, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *x, int incx, hipblasDoubleComplex *y, int incy)#

BLAS Level 1 API.

axpy computes constant alpha multiplied by vector x, plus vector y

y := alpha * x + y
  • Supported precisions in rocBLAS : h,s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x and y.

  • alpha[in] device pointer or host pointer to specify the scalar alpha.

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of x.

  • y[out] device pointer storing vector y.

  • incy[inout] [int] specifies the increment for the elements of y.

The axpy function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasHaxpyBatched(hipblasHandle_t handle, int n, const hipblasHalf *alpha, const hipblasHalf *const x[], int incx, hipblasHalf *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasSaxpyBatched(hipblasHandle_t handle, int n, const float *alpha, const float *const x[], int incx, float *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasDaxpyBatched(hipblasHandle_t handle, int n, const double *alpha, const double *const x[], int incx, double *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasCaxpyBatched(hipblasHandle_t handle, int n, const hipblasComplex *alpha, const hipblasComplex *const x[], int incx, hipblasComplex *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasZaxpyBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *const x[], int incx, hipblasDoubleComplex *const y[], int incy, int batchCount)#

BLAS Level 1 API.

axpyBatched compute y := alpha * x + y over a set of batched vectors.

  • Supported precisions in rocBLAS : h,s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x and y.

  • alpha[in] specifies the scalar alpha.

  • x[in] pointer storing vector x on the GPU.

  • incx[in] [int] specifies the increment for the elements of x.

  • y[out] pointer storing vector y on the GPU.

  • incy[inout] [int] specifies the increment for the elements of y.

  • batchCount[in] [int] number of instances in the batch

The axpyBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasHaxpyStridedBatched(hipblasHandle_t handle, int n, const hipblasHalf *alpha, const hipblasHalf *x, int incx, hipblasStride stridex, hipblasHalf *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasSaxpyStridedBatched(hipblasHandle_t handle, int n, const float *alpha, const float *x, int incx, hipblasStride stridex, float *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasDaxpyStridedBatched(hipblasHandle_t handle, int n, const double *alpha, const double *x, int incx, hipblasStride stridex, double *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasCaxpyStridedBatched(hipblasHandle_t handle, int n, const hipblasComplex *alpha, const hipblasComplex *x, int incx, hipblasStride stridex, hipblasComplex *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasZaxpyStridedBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, hipblasDoubleComplex *y, int incy, hipblasStride stridey, int batchCount)#

BLAS Level 1 API.

axpyStridedBatched compute y := alpha * x + y over a set of strided batched vectors.

  • Supported precisions in rocBLAS : h,s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int]

  • alpha[in] specifies the scalar alpha.

  • x[in] pointer storing vector x on the GPU.

  • incx[in] [int] specifies the increment for the elements of x.

  • stridex[in] [hipblasStride] specifies the increment between vectors of x.

  • y[out] pointer storing vector y on the GPU.

  • incy[inout] [int] specifies the increment for the elements of y.

  • stridey[in] [hipblasStride] specifies the increment between vectors of y.

  • batchCount[in] [int] number of instances in the batch

The axpyStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXcopy + Batched, StridedBatched#

hipblasStatus_t hipblasScopy(hipblasHandle_t handle, int n, const float *x, int incx, float *y, int incy)#
hipblasStatus_t hipblasDcopy(hipblasHandle_t handle, int n, const double *x, int incx, double *y, int incy)#
hipblasStatus_t hipblasCcopy(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, hipblasComplex *y, int incy)#
hipblasStatus_t hipblasZcopy(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, hipblasDoubleComplex *y, int incy)#

BLAS Level 1 API.

copy copies each element x[i] into y[i], for i = 1 , … , n

y := x,
  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x to be copied to y.

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of x.

  • y[out] device pointer storing vector y.

  • incy[in] [int] specifies the increment for the elements of y.

The copy function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasScopyBatched(hipblasHandle_t handle, int n, const float *const x[], int incx, float *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasDcopyBatched(hipblasHandle_t handle, int n, const double *const x[], int incx, double *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasCcopyBatched(hipblasHandle_t handle, int n, const hipblasComplex *const x[], int incx, hipblasComplex *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasZcopyBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *const x[], int incx, hipblasDoubleComplex *const y[], int incy, int batchCount)#

BLAS Level 1 API.

copyBatched copies each element x_i[j] into y_i[j], for j = 1 , … , n; i = 1 , … , batchCount

y_i := x_i,
where (x_i, y_i) is the i-th instance of the batch. x_i and y_i are vectors.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in each x_i to be copied to y_i.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each vector x_i.

  • y[out] device array of device pointers storing each vector y_i.

  • incy[in] [int] specifies the increment for the elements of each vector y_i.

  • batchCount[in] [int] number of instances in the batch

The copyBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasScopyStridedBatched(hipblasHandle_t handle, int n, const float *x, int incx, hipblasStride stridex, float *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasDcopyStridedBatched(hipblasHandle_t handle, int n, const double *x, int incx, hipblasStride stridex, double *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasCcopyStridedBatched(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, hipblasStride stridex, hipblasComplex *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasZcopyStridedBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, hipblasDoubleComplex *y, int incy, hipblasStride stridey, int batchCount)#

BLAS Level 1 API.

copyStridedBatched copies each element x_i[j] into y_i[j], for j = 1 , … , n; i = 1 , … , batchCount

y_i := x_i,
where (x_i, y_i) is the i-th instance of the batch. x_i and y_i are vectors.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in each x_i to be copied to y_i.

  • x[in] device pointer to the first vector (x_1) in the batch.

  • incx[in] [int] specifies the increments for the elements of vectors x_i.

  • stridex[in] [hipblasStride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stride_x, however the user should take care to ensure that stride_x is of appropriate size, for a typical case this means stride_x >= n * incx.

  • y[out] device pointer to the first vector (y_1) in the batch.

  • incy[in] [int] specifies the increment for the elements of vectors y_i.

  • stridey[in] [hipblasStride] stride from the start of one vector (y_i) and the next one (y_i+1). There are no restrictions placed on stride_y, however the user should take care to ensure that stride_y is of appropriate size, for a typical case this means stride_y >= n * incy. stridey should be non zero.

  • batchCount[in] [int] number of instances in the batch

The copyStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXdot + Batched, StridedBatched#

hipblasStatus_t hipblasHdot(hipblasHandle_t handle, int n, const hipblasHalf *x, int incx, const hipblasHalf *y, int incy, hipblasHalf *result)#
hipblasStatus_t hipblasBfdot(hipblasHandle_t handle, int n, const hipblasBfloat16 *x, int incx, const hipblasBfloat16 *y, int incy, hipblasBfloat16 *result)#
hipblasStatus_t hipblasSdot(hipblasHandle_t handle, int n, const float *x, int incx, const float *y, int incy, float *result)#
hipblasStatus_t hipblasDdot(hipblasHandle_t handle, int n, const double *x, int incx, const double *y, int incy, double *result)#
hipblasStatus_t hipblasCdotc(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, const hipblasComplex *y, int incy, hipblasComplex *result)#
hipblasStatus_t hipblasCdotu(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, const hipblasComplex *y, int incy, hipblasComplex *result)#
hipblasStatus_t hipblasZdotc(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, const hipblasDoubleComplex *y, int incy, hipblasDoubleComplex *result)#
hipblasStatus_t hipblasZdotu(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, const hipblasDoubleComplex *y, int incy, hipblasDoubleComplex *result)#

BLAS Level 1 API.

dot(u) performs the dot product of vectors x and y

result = x * y;
dotc performs the dot product of the conjugate of complex vector x and complex vector y
result = conjugate (x) * y;
  • Supported precisions in rocBLAS : h,bf,s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x and y.

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of y.

  • y[in] device pointer storing vector y.

  • incy[in] [int] specifies the increment for the elements of y.

  • result[inout] device pointer or host pointer to store the dot product. return is 0.0 if n <= 0.

The dot function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasHdotBatched(hipblasHandle_t handle, int n, const hipblasHalf *const x[], int incx, const hipblasHalf *const y[], int incy, int batchCount, hipblasHalf *result)#
hipblasStatus_t hipblasBfdotBatched(hipblasHandle_t handle, int n, const hipblasBfloat16 *const x[], int incx, const hipblasBfloat16 *const y[], int incy, int batchCount, hipblasBfloat16 *result)#
hipblasStatus_t hipblasSdotBatched(hipblasHandle_t handle, int n, const float *const x[], int incx, const float *const y[], int incy, int batchCount, float *result)#
hipblasStatus_t hipblasDdotBatched(hipblasHandle_t handle, int n, const double *const x[], int incx, const double *const y[], int incy, int batchCount, double *result)#
hipblasStatus_t hipblasCdotcBatched(hipblasHandle_t handle, int n, const hipblasComplex *const x[], int incx, const hipblasComplex *const y[], int incy, int batchCount, hipblasComplex *result)#
hipblasStatus_t hipblasCdotuBatched(hipblasHandle_t handle, int n, const hipblasComplex *const x[], int incx, const hipblasComplex *const y[], int incy, int batchCount, hipblasComplex *result)#
hipblasStatus_t hipblasZdotcBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *const x[], int incx, const hipblasDoubleComplex *const y[], int incy, int batchCount, hipblasDoubleComplex *result)#
hipblasStatus_t hipblasZdotuBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *const x[], int incx, const hipblasDoubleComplex *const y[], int incy, int batchCount, hipblasDoubleComplex *result)#

BLAS Level 1 API.

dotBatched(u) performs a batch of dot products of vectors x and y

result_i = x_i * y_i;
dotcBatched performs a batch of dot products of the conjugate of complex vector x and complex vector y
result_i = conjugate (x_i) * y_i;
where (x_i, y_i) is the i-th instance of the batch. x_i and y_i are vectors, for i = 1, …, batchCount

  • Supported precisions in rocBLAS : h,bf,s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in each x_i and y_i.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each x_i.

  • y[in] device array of device pointers storing each vector y_i.

  • incy[in] [int] specifies the increment for the elements of each y_i.

  • batchCount[in] [int] number of instances in the batch

  • result[inout] device array or host array of batchCount size to store the dot products of each batch. return 0.0 for each element if n <= 0.

The dotBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasHdotStridedBatched(hipblasHandle_t handle, int n, const hipblasHalf *x, int incx, hipblasStride stridex, const hipblasHalf *y, int incy, hipblasStride stridey, int batchCount, hipblasHalf *result)#
hipblasStatus_t hipblasBfdotStridedBatched(hipblasHandle_t handle, int n, const hipblasBfloat16 *x, int incx, hipblasStride stridex, const hipblasBfloat16 *y, int incy, hipblasStride stridey, int batchCount, hipblasBfloat16 *result)#
hipblasStatus_t hipblasSdotStridedBatched(hipblasHandle_t handle, int n, const float *x, int incx, hipblasStride stridex, const float *y, int incy, hipblasStride stridey, int batchCount, float *result)#
hipblasStatus_t hipblasDdotStridedBatched(hipblasHandle_t handle, int n, const double *x, int incx, hipblasStride stridex, const double *y, int incy, hipblasStride stridey, int batchCount, double *result)#
hipblasStatus_t hipblasCdotcStridedBatched(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, hipblasStride stridex, const hipblasComplex *y, int incy, hipblasStride stridey, int batchCount, hipblasComplex *result)#
hipblasStatus_t hipblasCdotuStridedBatched(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, hipblasStride stridex, const hipblasComplex *y, int incy, hipblasStride stridey, int batchCount, hipblasComplex *result)#
hipblasStatus_t hipblasZdotcStridedBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, const hipblasDoubleComplex *y, int incy, hipblasStride stridey, int batchCount, hipblasDoubleComplex *result)#
hipblasStatus_t hipblasZdotuStridedBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, const hipblasDoubleComplex *y, int incy, hipblasStride stridey, int batchCount, hipblasDoubleComplex *result)#

BLAS Level 1 API.

dotStridedBatched(u) performs a batch of dot products of vectors x and y

result_i = x_i * y_i;
dotcStridedBatched performs a batch of dot products of the conjugate of complex vector x and complex vector y
result_i = conjugate (x_i) * y_i;
where (x_i, y_i) is the i-th instance of the batch. x_i and y_i are vectors, for i = 1, …, batchCount

  • Supported precisions in rocBLAS : h,bf,s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in each x_i and y_i.

  • x[in] device pointer to the first vector (x_1) in the batch.

  • incx[in] [int] specifies the increment for the elements of each x_i.

  • stridex[in] [hipblasStride] stride from the start of one vector (x_i) and the next one (x_i+1)

  • y[in] device pointer to the first vector (y_1) in the batch.

  • incy[in] [int] specifies the increment for the elements of each y_i.

  • stridey[in] [hipblasStride] stride from the start of one vector (y_i) and the next one (y_i+1)

  • batchCount[in] [int] number of instances in the batch

  • result[inout] device array or host array of batchCount size to store the dot products of each batch. return 0.0 for each element if n <= 0.

The dotStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXnrm2 + Batched, StridedBatched#

hipblasStatus_t hipblasSnrm2(hipblasHandle_t handle, int n, const float *x, int incx, float *result)#
hipblasStatus_t hipblasDnrm2(hipblasHandle_t handle, int n, const double *x, int incx, double *result)#
hipblasStatus_t hipblasScnrm2(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, float *result)#
hipblasStatus_t hipblasDznrm2(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, double *result)#

BLAS Level 1 API.

nrm2 computes the euclidean norm of a real or complex vector

      result := sqrt( x'*x ) for real vectors
      result := sqrt( x**H*x ) for complex vectors
  • Supported precisions in rocBLAS : s,d,c,z,sc,dz

  • Supported precisions in cuBLAS : s,d,sc,dz

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x.

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of y.

  • result[inout] device pointer or host pointer to store the nrm2 product. return is 0.0 if n, incx<=0.

The nrm2 function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSnrm2Batched(hipblasHandle_t handle, int n, const float *const x[], int incx, int batchCount, float *result)#
hipblasStatus_t hipblasDnrm2Batched(hipblasHandle_t handle, int n, const double *const x[], int incx, int batchCount, double *result)#
hipblasStatus_t hipblasScnrm2Batched(hipblasHandle_t handle, int n, const hipblasComplex *const x[], int incx, int batchCount, float *result)#
hipblasStatus_t hipblasDznrm2Batched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *const x[], int incx, int batchCount, double *result)#

BLAS Level 1 API.

nrm2Batched computes the euclidean norm over a batch of real or complex vectors

      result := sqrt( x_i'*x_i ) for real vectors x, for i = 1, ..., batchCount
      result := sqrt( x_i**H*x_i ) for complex vectors x, for i = 1, ..., batchCount
  • Supported precisions in rocBLAS : s,d,c,z,sc,dz

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each x_i.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each x_i. incx must be > 0.

  • batchCount[in] [int] number of instances in the batch

  • result[out] device pointer or host pointer to array of batchCount size for nrm2 results. return is 0.0 for each element if n <= 0, incx<=0.

The nrm2Batched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSnrm2StridedBatched(hipblasHandle_t handle, int n, const float *x, int incx, hipblasStride stridex, int batchCount, float *result)#
hipblasStatus_t hipblasDnrm2StridedBatched(hipblasHandle_t handle, int n, const double *x, int incx, hipblasStride stridex, int batchCount, double *result)#
hipblasStatus_t hipblasScnrm2StridedBatched(hipblasHandle_t handle, int n, const hipblasComplex *x, int incx, hipblasStride stridex, int batchCount, float *result)#
hipblasStatus_t hipblasDznrm2StridedBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, int batchCount, double *result)#

BLAS Level 1 API.

nrm2StridedBatched computes the euclidean norm over a batch of real or complex vectors

      := sqrt( x_i'*x_i ) for real vectors x, for i = 1, ..., batchCount
      := sqrt( x_i**H*x_i ) for complex vectors, for i = 1, ..., batchCount
  • Supported precisions in rocBLAS : s,d,c,z,sc,dz

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each x_i.

  • x[in] device pointer to the first vector x_1.

  • incx[in] [int] specifies the increment for the elements of each x_i. incx must be > 0.

  • stridex[in] [hipblasStride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stride_x, however the user should take care to ensure that stride_x is of appropriate size, for a typical case this means stride_x >= n * incx.

  • batchCount[in] [int] number of instances in the batch

  • result[out] device pointer or host pointer to array for storing contiguous batchCount results. return is 0.0 for each element if n <= 0, incx<=0.

The nrm2StridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXrot + Batched, StridedBatched#

hipblasStatus_t hipblasSrot(hipblasHandle_t handle, int n, float *x, int incx, float *y, int incy, const float *c, const float *s)#
hipblasStatus_t hipblasDrot(hipblasHandle_t handle, int n, double *x, int incx, double *y, int incy, const double *c, const double *s)#
hipblasStatus_t hipblasCrot(hipblasHandle_t handle, int n, hipblasComplex *x, int incx, hipblasComplex *y, int incy, const float *c, const hipblasComplex *s)#
hipblasStatus_t hipblasCsrot(hipblasHandle_t handle, int n, hipblasComplex *x, int incx, hipblasComplex *y, int incy, const float *c, const float *s)#
hipblasStatus_t hipblasZrot(hipblasHandle_t handle, int n, hipblasDoubleComplex *x, int incx, hipblasDoubleComplex *y, int incy, const double *c, const hipblasDoubleComplex *s)#
hipblasStatus_t hipblasZdrot(hipblasHandle_t handle, int n, hipblasDoubleComplex *x, int incx, hipblasDoubleComplex *y, int incy, const double *c, const double *s)#

BLAS Level 1 API.

rot applies the Givens rotation matrix defined by c=cos(alpha) and s=sin(alpha) to vectors x and y. Scalars c and s may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode.

  • Supported precisions in rocBLAS : s,d,c,z,sc,dz

  • Supported precisions in cuBLAS : s,d,c,z,cs,zd

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in the x and y vectors.

  • x[inout] device pointer storing vector x.

  • incx[in] [int] specifies the increment between elements of x.

  • y[inout] device pointer storing vector y.

  • incy[in] [int] specifies the increment between elements of y.

  • c[in] device pointer or host pointer storing scalar cosine component of the rotation matrix.

  • s[in] device pointer or host pointer storing scalar sine component of the rotation matrix.

The rot function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSrotBatched(hipblasHandle_t handle, int n, float *const x[], int incx, float *const y[], int incy, const float *c, const float *s, int batchCount)#
hipblasStatus_t hipblasDrotBatched(hipblasHandle_t handle, int n, double *const x[], int incx, double *const y[], int incy, const double *c, const double *s, int batchCount)#
hipblasStatus_t hipblasCrotBatched(hipblasHandle_t handle, int n, hipblasComplex *const x[], int incx, hipblasComplex *const y[], int incy, const float *c, const hipblasComplex *s, int batchCount)#
hipblasStatus_t hipblasCsrotBatched(hipblasHandle_t handle, int n, hipblasComplex *const x[], int incx, hipblasComplex *const y[], int incy, const float *c, const float *s, int batchCount)#
hipblasStatus_t hipblasZrotBatched(hipblasHandle_t handle, int n, hipblasDoubleComplex *const x[], int incx, hipblasDoubleComplex *const y[], int incy, const double *c, const hipblasDoubleComplex *s, int batchCount)#
hipblasStatus_t hipblasZdrotBatched(hipblasHandle_t handle, int n, hipblasDoubleComplex *const x[], int incx, hipblasDoubleComplex *const y[], int incy, const double *c, const double *s, int batchCount)#

BLAS Level 1 API.

rotBatched applies the Givens rotation matrix defined by c=cos(alpha) and s=sin(alpha) to batched vectors x_i and y_i, for i = 1, …, batchCount. Scalars c and s may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode.

  • Supported precisions in rocBLAS : s,d,sc,dz

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each x_i and y_i vectors.

  • x[inout] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment between elements of each x_i.

  • y[inout] device array of device pointers storing each vector y_i.

  • incy[in] [int] specifies the increment between elements of each y_i.

  • c[in] device pointer or host pointer to scalar cosine component of the rotation matrix.

  • s[in] device pointer or host pointer to scalar sine component of the rotation matrix.

  • batchCount[in] [int] the number of x and y arrays, i.e. the number of batches.

The rotBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSrotStridedBatched(hipblasHandle_t handle, int n, float *x, int incx, hipblasStride stridex, float *y, int incy, hipblasStride stridey, const float *c, const float *s, int batchCount)#
hipblasStatus_t hipblasDrotStridedBatched(hipblasHandle_t handle, int n, double *x, int incx, hipblasStride stridex, double *y, int incy, hipblasStride stridey, const double *c, const double *s, int batchCount)#
hipblasStatus_t hipblasCrotStridedBatched(hipblasHandle_t handle, int n, hipblasComplex *x, int incx, hipblasStride stridex, hipblasComplex *y, int incy, hipblasStride stridey, const float *c, const hipblasComplex *s, int batchCount)#
hipblasStatus_t hipblasCsrotStridedBatched(hipblasHandle_t handle, int n, hipblasComplex *x, int incx, hipblasStride stridex, hipblasComplex *y, int incy, hipblasStride stridey, const float *c, const float *s, int batchCount)#
hipblasStatus_t hipblasZrotStridedBatched(hipblasHandle_t handle, int n, hipblasDoubleComplex *x, int incx, hipblasStride stridex, hipblasDoubleComplex *y, int incy, hipblasStride stridey, const double *c, const hipblasDoubleComplex *s, int batchCount)#
hipblasStatus_t hipblasZdrotStridedBatched(hipblasHandle_t handle, int n, hipblasDoubleComplex *x, int incx, hipblasStride stridex, hipblasDoubleComplex *y, int incy, hipblasStride stridey, const double *c, const double *s, int batchCount)#

BLAS Level 1 API.

rotStridedBatched applies the Givens rotation matrix defined by c=cos(alpha) and s=sin(alpha) to strided batched vectors x_i and y_i, for i = 1, …, batchCount. Scalars c and s may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode.

  • Supported precisions in rocBLAS : s,d,sc,dz

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in each x_i and y_i vectors.

  • x[inout] device pointer to the first vector x_1.

  • incx[in] [int] specifies the increment between elements of each x_i.

  • stridex[in] [hipblasStride] specifies the increment from the beginning of x_i to the beginning of x_(i+1)

  • y[inout] device pointer to the first vector y_1.

  • incy[in] [int] specifies the increment between elements of each y_i.

  • stridey[in] [hipblasStride] specifies the increment from the beginning of y_i to the beginning of y_(i+1)

  • c[in] device pointer or host pointer to scalar cosine component of the rotation matrix.

  • s[in] device pointer or host pointer to scalar sine component of the rotation matrix.

  • batchCount[in] [int] the number of x and y arrays, i.e. the number of batches.

The rotStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXrotg + Batched, StridedBatched#

hipblasStatus_t hipblasSrotg(hipblasHandle_t handle, float *a, float *b, float *c, float *s)#
hipblasStatus_t hipblasDrotg(hipblasHandle_t handle, double *a, double *b, double *c, double *s)#
hipblasStatus_t hipblasCrotg(hipblasHandle_t handle, hipblasComplex *a, hipblasComplex *b, float *c, hipblasComplex *s)#
hipblasStatus_t hipblasZrotg(hipblasHandle_t handle, hipblasDoubleComplex *a, hipblasDoubleComplex *b, double *c, hipblasDoubleComplex *s)#

BLAS Level 1 API.

rotg creates the Givens rotation matrix for the vector (a b). Scalars c and s and arrays a and b may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode. If the pointer mode is set to HIPBLAS_POINTER_MODE_HOST, this function blocks the CPU until the GPU has finished and the results are available in host memory. If the pointer mode is set to HIPBLAS_POINTER_MODE_DEVICE, this function returns immediately and synchronization is required to read the results.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • a[inout] device pointer or host pointer to input vector element, overwritten with r.

  • b[inout] device pointer or host pointer to input vector element, overwritten with z.

  • c[inout] device pointer or host pointer to cosine element of Givens rotation.

  • s[inout] device pointer or host pointer sine element of Givens rotation.

The rotg function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSrotgBatched(hipblasHandle_t handle, float *const a[], float *const b[], float *const c[], float *const s[], int batchCount)#
hipblasStatus_t hipblasDrotgBatched(hipblasHandle_t handle, double *const a[], double *const b[], double *const c[], double *const s[], int batchCount)#
hipblasStatus_t hipblasCrotgBatched(hipblasHandle_t handle, hipblasComplex *const a[], hipblasComplex *const b[], float *const c[], hipblasComplex *const s[], int batchCount)#
hipblasStatus_t hipblasZrotgBatched(hipblasHandle_t handle, hipblasDoubleComplex *const a[], hipblasDoubleComplex *const b[], double *const c[], hipblasDoubleComplex *const s[], int batchCount)#

BLAS Level 1 API.

rotgBatched creates the Givens rotation matrix for the batched vectors (a_i b_i), for i = 1, …, batchCount. a, b, c, and s may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode. If the pointer mode is set to HIPBLAS_POINTER_MODE_HOST, this function blocks the CPU until the GPU has finished and the results are available in host memory. If the pointer mode is set to HIPBLAS_POINTER_MODE_DEVICE, this function returns immediately and synchronization is required to read the results.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • a[inout] device array of device pointers storing each single input vector element a_i, overwritten with r_i.

  • b[inout] device array of device pointers storing each single input vector element b_i, overwritten with z_i.

  • c[inout] device array of device pointers storing each cosine element of Givens rotation for the batch.

  • s[inout] device array of device pointers storing each sine element of Givens rotation for the batch.

  • batchCount[in] [int] number of batches (length of arrays a, b, c, and s).

The rotgBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSrotgStridedBatched(hipblasHandle_t handle, float *a, hipblasStride stridea, float *b, hipblasStride strideb, float *c, hipblasStride stridec, float *s, hipblasStride strides, int batchCount)#
hipblasStatus_t hipblasDrotgStridedBatched(hipblasHandle_t handle, double *a, hipblasStride stridea, double *b, hipblasStride strideb, double *c, hipblasStride stridec, double *s, hipblasStride strides, int batchCount)#
hipblasStatus_t hipblasCrotgStridedBatched(hipblasHandle_t handle, hipblasComplex *a, hipblasStride stridea, hipblasComplex *b, hipblasStride strideb, float *c, hipblasStride stridec, hipblasComplex *s, hipblasStride strides, int batchCount)#
hipblasStatus_t hipblasZrotgStridedBatched(hipblasHandle_t handle, hipblasDoubleComplex *a, hipblasStride stridea, hipblasDoubleComplex *b, hipblasStride strideb, double *c, hipblasStride stridec, hipblasDoubleComplex *s, hipblasStride strides, int batchCount)#

BLAS Level 1 API.

rotgStridedBatched creates the Givens rotation matrix for the strided batched vectors (a_i b_i), for i = 1, …, batchCount. a, b, c, and s may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode. If the pointer mode is set to HIPBLAS_POINTER_MODE_HOST, this function blocks the CPU until the GPU has finished and the results are available in host memory. If the pointer mode is set to HIPBLAS_POINTER_MODE_HOST, this function returns immediately and synchronization is required to read the results.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • a[inout] device strided_batched pointer or host strided_batched pointer to first single input vector element a_1, overwritten with r.

  • stridea[in] [hipblasStride] distance between elements of a in batch (distance between a_i and a_(i + 1))

  • b[inout] device strided_batched pointer or host strided_batched pointer to first single input vector element b_1, overwritten with z.

  • strideb[in] [hipblasStride] distance between elements of b in batch (distance between b_i and b_(i + 1))

  • c[inout] device strided_batched pointer or host strided_batched pointer to first cosine element of Givens rotations c_1.

  • stridec[in] [hipblasStride] distance between elements of c in batch (distance between c_i and c_(i + 1))

  • s[inout] device strided_batched pointer or host strided_batched pointer to sine element of Givens rotations s_1.

  • strides[in] [hipblasStride] distance between elements of s in batch (distance between s_i and s_(i + 1))

  • batchCount[in] [int] number of batches (length of arrays a, b, c, and s).

The rotgStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXrotm + Batched, StridedBatched#

hipblasStatus_t hipblasSrotm(hipblasHandle_t handle, int n, float *x, int incx, float *y, int incy, const float *param)#
hipblasStatus_t hipblasDrotm(hipblasHandle_t handle, int n, double *x, int incx, double *y, int incy, const double *param)#

BLAS Level 1 API.

rotm applies the modified Givens rotation matrix defined by param to vectors x and y.

  • Supported precisions in rocBLAS : s,d

  • Supported precisions in cuBLAS : s,d

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in the x and y vectors.

  • x[inout] device pointer storing vector x.

  • incx[in] [int] specifies the increment between elements of x.

  • y[inout] device pointer storing vector y.

  • incy[in] [int] specifies the increment between elements of y.

  • param[in] device vector or host vector of 5 elements defining the rotation. param[0] = flag param[1] = H11 param[2] = H21 param[3] = H12 param[4] = H22 The flag parameter defines the form of H: flag = -1 => H = ( H11 H12 H21 H22 ) flag = 0 => H = ( 1.0 H12 H21 1.0 ) flag = 1 => H = ( H11 1.0 -1.0 H22 ) flag = -2 => H = ( 1.0 0.0 0.0 1.0 ) param may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode.

The rotm function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSrotmBatched(hipblasHandle_t handle, int n, float *const x[], int incx, float *const y[], int incy, const float *const param[], int batchCount)#
hipblasStatus_t hipblasDrotmBatched(hipblasHandle_t handle, int n, double *const x[], int incx, double *const y[], int incy, const double *const param[], int batchCount)#

BLAS Level 1 API.

rotmBatched applies the modified Givens rotation matrix defined by param_i to batched vectors x_i and y_i, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in the x and y vectors.

  • x[inout] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment between elements of each x_i.

  • y[inout] device array of device pointers storing each vector y_1.

  • incy[in] [int] specifies the increment between elements of each y_i.

  • param[in] device array of device vectors of 5 elements defining the rotation. param[0] = flag param[1] = H11 param[2] = H21 param[3] = H12 param[4] = H22 The flag parameter defines the form of H: flag = -1 => H = ( H11 H12 H21 H22 ) flag = 0 => H = ( 1.0 H12 H21 1.0 ) flag = 1 => H = ( H11 1.0 -1.0 H22 ) flag = -2 => H = ( 1.0 0.0 0.0 1.0 ) param may ONLY be stored on the device for the batched version of this function.

  • batchCount[in] [int] the number of x and y arrays, i.e. the number of batches.

The rotmBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSrotmStridedBatched(hipblasHandle_t handle, int n, float *x, int incx, hipblasStride stridex, float *y, int incy, hipblasStride stridey, const float *param, hipblasStride strideParam, int batchCount)#
hipblasStatus_t hipblasDrotmStridedBatched(hipblasHandle_t handle, int n, double *x, int incx, hipblasStride stridex, double *y, int incy, hipblasStride stridey, const double *param, hipblasStride strideParam, int batchCount)#

BLAS Level 1 API.

rotmStridedBatched applies the modified Givens rotation matrix defined by param_i to strided batched vectors x_i and y_i, for i = 1, …, batchCount

  • Supported precisions in rocBLAS : s,d

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] number of elements in the x and y vectors.

  • x[inout] device pointer pointing to first strided batched vector x_1.

  • incx[in] [int] specifies the increment between elements of each x_i.

  • stridex[in] [hipblasStride] specifies the increment between the beginning of x_i and x_(i + 1)

  • y[inout] device pointer pointing to first strided batched vector y_1.

  • incy[in] [int] specifies the increment between elements of each y_i.

  • stridey[in] [hipblasStride] specifies the increment between the beginning of y_i and y_(i + 1)

  • param[in] device pointer pointing to first array of 5 elements defining the rotation (param_1). param[0] = flag param[1] = H11 param[2] = H21 param[3] = H12 param[4] = H22 The flag parameter defines the form of H: flag = -1 => H = ( H11 H12 H21 H22 ) flag = 0 => H = ( 1.0 H12 H21 1.0 ) flag = 1 => H = ( H11 1.0 -1.0 H22 ) flag = -2 => H = ( 1.0 0.0 0.0 1.0 ) param may ONLY be stored on the device for the strided_batched version of this function.

  • strideParam[in] [hipblasStride] specifies the increment between the beginning of param_i and param_(i + 1)

  • batchCount[in] [int] the number of x and y arrays, i.e. the number of batches.

The rotmStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXrotmg + Batched, StridedBatched#

hipblasStatus_t hipblasSrotmg(hipblasHandle_t handle, float *d1, float *d2, float *x1, const float *y1, float *param)#
hipblasStatus_t hipblasDrotmg(hipblasHandle_t handle, double *d1, double *d2, double *x1, const double *y1, double *param)#

BLAS Level 1 API.

rotmg creates the modified Givens rotation matrix for the vector (d1 * x1, d2 * y1). Parameters may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode. If the pointer mode is set to HIPBLAS_POINTER_MODE_HOST, this function blocks the CPU until the GPU has finished and the results are available in host memory. If the pointer mode is set to HIPBLAS_POINTER_MODE_DEVICE, this function returns immediately and synchronization is required to read the results.

  • Supported precisions in rocBLAS : s,d

  • Supported precisions in cuBLAS : s,d

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • d1[inout] device pointer or host pointer to input scalar that is overwritten.

  • d2[inout] device pointer or host pointer to input scalar that is overwritten.

  • x1[inout] device pointer or host pointer to input scalar that is overwritten.

  • y1[in] device pointer or host pointer to input scalar.

  • param[out] device vector or host vector of 5 elements defining the rotation. param[0] = flag param[1] = H11 param[2] = H21 param[3] = H12 param[4] = H22 The flag parameter defines the form of H: flag = -1 => H = ( H11 H12 H21 H22 ) flag = 0 => H = ( 1.0 H12 H21 1.0 ) flag = 1 => H = ( H11 1.0 -1.0 H22 ) flag = -2 => H = ( 1.0 0.0 0.0 1.0 ) param may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode.

The rotmg function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSrotmgBatched(hipblasHandle_t handle, float *const d1[], float *const d2[], float *const x1[], const float *const y1[], float *const param[], int batchCount)#
hipblasStatus_t hipblasDrotmgBatched(hipblasHandle_t handle, double *const d1[], double *const d2[], double *const x1[], const double *const y1[], double *const param[], int batchCount)#

BLAS Level 1 API.

rotmgBatched creates the modified Givens rotation matrix for the batched vectors (d1_i * x1_i, d2_i * y1_i), for i = 1, …, batchCount. Parameters may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode. If the pointer mode is set to HIPBLAS_POINTER_MODE_HOST, this function blocks the CPU until the GPU has finished and the results are available in host memory. If the pointer mode is set to HIPBLAS_POINTER_MODE_DEVICE, this function returns immediately and synchronization is required to read the results.

  • Supported precisions in rocBLAS : s,d

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • d1[inout] device batched array or host batched array of input scalars that is overwritten.

  • d2[inout] device batched array or host batched array of input scalars that is overwritten.

  • x1[inout] device batched array or host batched array of input scalars that is overwritten.

  • y1[in] device batched array or host batched array of input scalars.

  • param[out] device batched array or host batched array of vectors of 5 elements defining the rotation. param[0] = flag param[1] = H11 param[2] = H21 param[3] = H12 param[4] = H22 The flag parameter defines the form of H: flag = -1 => H = ( H11 H12 H21 H22 ) flag = 0 => H = ( 1.0 H12 H21 1.0 ) flag = 1 => H = ( H11 1.0 -1.0 H22 ) flag = -2 => H = ( 1.0 0.0 0.0 1.0 ) param may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode.

  • batchCount[in] [int] the number of instances in the batch.

The rotmgBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSrotmgStridedBatched(hipblasHandle_t handle, float *d1, hipblasStride strided1, float *d2, hipblasStride strided2, float *x1, hipblasStride stridex1, const float *y1, hipblasStride stridey1, float *param, hipblasStride strideParam, int batchCount)#
hipblasStatus_t hipblasDrotmgStridedBatched(hipblasHandle_t handle, double *d1, hipblasStride strided1, double *d2, hipblasStride strided2, double *x1, hipblasStride stridex1, const double *y1, hipblasStride stridey1, double *param, hipblasStride strideParam, int batchCount)#

BLAS Level 1 API.

rotmgStridedBatched creates the modified Givens rotation matrix for the strided batched vectors (d1_i * x1_i, d2_i * y1_i), for i = 1, …, batchCount. Parameters may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode. If the pointer mode is set to HIPBLAS_POINTER_MODE_HOST, this function blocks the CPU until the GPU has finished and the results are available in host memory. If the pointer mode is set to HIPBLAS_POINTER_MODE_DEVICE, this function returns immediately and synchronization is required to read the results.

  • Supported precisions in rocBLAS : s,d

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • d1[inout] device strided_batched array or host strided_batched array of input scalars that is overwritten.

  • strided1[in] [hipblasStride] specifies the increment between the beginning of d1_i and d1_(i+1)

  • d2[inout] device strided_batched array or host strided_batched array of input scalars that is overwritten.

  • strided2[in] [hipblasStride] specifies the increment between the beginning of d2_i and d2_(i+1)

  • x1[inout] device strided_batched array or host strided_batched array of input scalars that is overwritten.

  • stridex1[in] [hipblasStride] specifies the increment between the beginning of x1_i and x1_(i+1)

  • y1[in] device strided_batched array or host strided_batched array of input scalars.

  • stridey1[in] [hipblasStride] specifies the increment between the beginning of y1_i and y1_(i+1)

  • param[out] device stridedBatched array or host stridedBatched array of vectors of 5 elements defining the rotation. param[0] = flag param[1] = H11 param[2] = H21 param[3] = H12 param[4] = H22 The flag parameter defines the form of H: flag = -1 => H = ( H11 H12 H21 H22 ) flag = 0 => H = ( 1.0 H12 H21 1.0 ) flag = 1 => H = ( H11 1.0 -1.0 H22 ) flag = -2 => H = ( 1.0 0.0 0.0 1.0 ) param may be stored in either host or device memory, location is specified by calling hipblasSetPointerMode.

  • strideParam[in] [hipblasStride] specifies the increment between the beginning of param_i and param_(i + 1)

  • batchCount[in] [int] the number of instances in the batch.

The rotmgStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXscal + Batched, StridedBatched#

hipblasStatus_t hipblasSscal(hipblasHandle_t handle, int n, const float *alpha, float *x, int incx)#
hipblasStatus_t hipblasDscal(hipblasHandle_t handle, int n, const double *alpha, double *x, int incx)#
hipblasStatus_t hipblasCscal(hipblasHandle_t handle, int n, const hipblasComplex *alpha, hipblasComplex *x, int incx)#
hipblasStatus_t hipblasCsscal(hipblasHandle_t handle, int n, const float *alpha, hipblasComplex *x, int incx)#
hipblasStatus_t hipblasZscal(hipblasHandle_t handle, int n, const hipblasDoubleComplex *alpha, hipblasDoubleComplex *x, int incx)#
hipblasStatus_t hipblasZdscal(hipblasHandle_t handle, int n, const double *alpha, hipblasDoubleComplex *x, int incx)#

BLAS Level 1 API.

scal scales each element of vector x with scalar alpha.

x := alpha * x
  • Supported precisions in rocBLAS : s,d,c,z,cs,zd

  • Supported precisions in cuBLAS : s,d,c,z,cs,zd

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x.

  • alpha[in] device pointer or host pointer for the scalar alpha.

  • x[inout] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of x.

The scal function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSscalBatched(hipblasHandle_t handle, int n, const float *alpha, float *const x[], int incx, int batchCount)#
hipblasStatus_t hipblasDscalBatched(hipblasHandle_t handle, int n, const double *alpha, double *const x[], int incx, int batchCount)#
hipblasStatus_t hipblasCscalBatched(hipblasHandle_t handle, int n, const hipblasComplex *alpha, hipblasComplex *const x[], int incx, int batchCount)#
hipblasStatus_t hipblasZscalBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *alpha, hipblasDoubleComplex *const x[], int incx, int batchCount)#
hipblasStatus_t hipblasCsscalBatched(hipblasHandle_t handle, int n, const float *alpha, hipblasComplex *const x[], int incx, int batchCount)#
hipblasStatus_t hipblasZdscalBatched(hipblasHandle_t handle, int n, const double *alpha, hipblasDoubleComplex *const x[], int incx, int batchCount)#

BLAS Level 1 API.

scalBatched scales each element of vector x_i with scalar alpha, for i = 1, … , batchCount.

 x_i := alpha * x_i
where (x_i) is the i-th instance of the batch.

  • Supported precisions in rocBLAS : s,d,c,z,cs,zd

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in each x_i.

  • alpha[in] host pointer or device pointer for the scalar alpha.

  • x[inout] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each x_i.

  • batchCount[in] [int] specifies the number of batches in x.

The scalBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSscalStridedBatched(hipblasHandle_t handle, int n, const float *alpha, float *x, int incx, hipblasStride stridex, int batchCount)#
hipblasStatus_t hipblasDscalStridedBatched(hipblasHandle_t handle, int n, const double *alpha, double *x, int incx, hipblasStride stridex, int batchCount)#
hipblasStatus_t hipblasCscalStridedBatched(hipblasHandle_t handle, int n, const hipblasComplex *alpha, hipblasComplex *x, int incx, hipblasStride stridex, int batchCount)#
hipblasStatus_t hipblasZscalStridedBatched(hipblasHandle_t handle, int n, const hipblasDoubleComplex *alpha, hipblasDoubleComplex *x, int incx, hipblasStride stridex, int batchCount)#
hipblasStatus_t hipblasCsscalStridedBatched(hipblasHandle_t handle, int n, const float *alpha, hipblasComplex *x, int incx, hipblasStride stridex, int batchCount)#
hipblasStatus_t hipblasZdscalStridedBatched(hipblasHandle_t handle, int n, const double *alpha, hipblasDoubleComplex *x, int incx, hipblasStride stridex, int batchCount)#

BLAS Level 1 API.

scalStridedBatched scales each element of vector x_i with scalar alpha, for i = 1, … , batchCount.

 x_i := alpha * x_i ,
where (x_i) is the i-th instance of the batch.

  • Supported precisions in rocBLAS : s,d,c,z,cs,zd

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in each x_i.

  • alpha[in] host pointer or device pointer for the scalar alpha.

  • x[inout] device pointer to the first vector (x_1) in the batch.

  • incx[in] [int] specifies the increment for the elements of x.

  • stridex[in] [hipblasStride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stride_x, however the user should take care to ensure that stride_x is of appropriate size, for a typical case this means stride_x >= n * incx.

  • batchCount[in] [int] specifies the number of batches in x.

The scalStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasXswap + Batched, StridedBatched#

hipblasStatus_t hipblasSswap(hipblasHandle_t handle, int n, float *x, int incx, float *y, int incy)#
hipblasStatus_t hipblasDswap(hipblasHandle_t handle, int n, double *x, int incx, double *y, int incy)#
hipblasStatus_t hipblasCswap(hipblasHandle_t handle, int n, hipblasComplex *x, int incx, hipblasComplex *y, int incy)#
hipblasStatus_t hipblasZswap(hipblasHandle_t handle, int n, hipblasDoubleComplex *x, int incx, hipblasDoubleComplex *y, int incy)#

BLAS Level 1 API.

swap interchanges vectors x and y.

y := x; x := y
  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in x and y.

  • x[inout] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of x.

  • y[inout] device pointer storing vector y.

  • incy[in] [int] specifies the increment for the elements of y.

The swap function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSswapBatched(hipblasHandle_t handle, int n, float *const x[], int incx, float *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasDswapBatched(hipblasHandle_t handle, int n, double *const x[], int incx, double *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasCswapBatched(hipblasHandle_t handle, int n, hipblasComplex *const x[], int incx, hipblasComplex *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasZswapBatched(hipblasHandle_t handle, int n, hipblasDoubleComplex *const x[], int incx, hipblasDoubleComplex *const y[], int incy, int batchCount)#

BLAS Level 1 API.

swapBatched interchanges vectors x_i and y_i, for i = 1 , … , batchCount

y_i := x_i; x_i := y_i
  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in each x_i and y_i.

  • x[inout] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each x_i.

  • y[inout] device array of device pointers storing each vector y_i.

  • incy[in] [int] specifies the increment for the elements of each y_i.

  • batchCount[in] [int] number of instances in the batch.

The swapBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

hipblasStatus_t hipblasSswapStridedBatched(hipblasHandle_t handle, int n, float *x, int incx, hipblasStride stridex, float *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasDswapStridedBatched(hipblasHandle_t handle, int n, double *x, int incx, hipblasStride stridex, double *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasCswapStridedBatched(hipblasHandle_t handle, int n, hipblasComplex *x, int incx, hipblasStride stridex, hipblasComplex *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasZswapStridedBatched(hipblasHandle_t handle, int n, hipblasDoubleComplex *x, int incx, hipblasStride stridex, hipblasDoubleComplex *y, int incy, hipblasStride stridey, int batchCount)#

BLAS Level 1 API.

swapStridedBatched interchanges vectors x_i and y_i, for i = 1 , … , batchCount

y_i := x_i; x_i := y_i
  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • n[in] [int] the number of elements in each x_i and y_i.

  • x[inout] device pointer to the first vector x_1.

  • incx[in] [int] specifies the increment for the elements of x.

  • stridex[in] [hipblasStride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stride_x, however the user should take care to ensure that stride_x is of appropriate size, for a typical case this means stride_x >= n * incx.

  • y[inout] device pointer to the first vector y_1.

  • incy[in] [int] specifies the increment for the elements of y.

  • stridey[in] [hipblasStride] stride from the start of one vector (y_i) and the next one (y_i+1). There are no restrictions placed on stride_x, however the user should take care to ensure that stride_y is of appropriate size, for a typical case this means stride_y >= n * incy. stridey should be non zero.

  • batchCount[in] [int] number of instances in the batch.

The swapStridedBatched function supports the 64-bit integer interface. Refer to section ILP64 Interface.

Level 2 BLAS#

hipblasXgbmv + Batched, StridedBatched#

hipblasStatus_t hipblasSgbmv(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const float *alpha, const float *AP, int lda, const float *x, int incx, const float *beta, float *y, int incy)#
hipblasStatus_t hipblasDgbmv(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const double *alpha, const double *AP, int lda, const double *x, int incx, const double *beta, double *y, int incy)#
hipblasStatus_t hipblasCgbmv(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const hipblasComplex *alpha, const hipblasComplex *AP, int lda, const hipblasComplex *x, int incx, const hipblasComplex *beta, hipblasComplex *y, int incy)#
hipblasStatus_t hipblasZgbmv(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *AP, int lda, const hipblasDoubleComplex *x, int incx, const hipblasDoubleComplex *beta, hipblasDoubleComplex *y, int incy)#

BLAS Level 2 API.

gbmv performs one of the matrix-vector operations

y := alpha*A*x    + beta*y,   or
y := alpha*A**T*x + beta*y,   or
y := alpha*A**H*x + beta*y,
where alpha and beta are scalars, x and y are vectors and A is an m by n banded matrix with kl sub-diagonals and ku super-diagonals.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • trans[in] [hipblasOperation_t] indicates whether matrix A is tranposed (conjugated) or not

  • m[in] [int] number of rows of matrix A

  • n[in] [int] number of columns of matrix A

  • kl[in] [int] number of sub-diagonals of A

  • ku[in] [int] number of super-diagonals of A

  • alpha[in] device pointer or host pointer to scalar alpha.

  • AP[in] device pointer storing banded matrix A. Leading (kl + ku + 1) by n part of the matrix contains the coefficients of the banded matrix. The leading diagonal resides in row (ku + 1) with the first super-diagonal above on the RHS of row ku. The first sub-diagonal resides below on the LHS of row ku + 2. This propagates up and down across sub/super-diagonals. Ex: (m = n = 7; ku = 2, kl = 2) 1 2 3 0 0 0 0 0 0 3 3 3 3 3 4 1 2 3 0 0 0 0 2 2 2 2 2 2 5 4 1 2 3 0 0 -&#8212;> 1 1 1 1 1 1 1 0 5 4 1 2 3 0 4 4 4 4 4 4 0 0 0 5 4 1 2 0 5 5 5 5 5 0 0 0 0 0 5 4 1 2 0 0 0 0 0 0 0 0 0 0 0 5 4 1 0 0 0 0 0 0 0 Note that the empty elements which don’t correspond to data will not be referenced.

  • lda[in] [int] specifies the leading dimension of A. Must be >= (kl + ku + 1)

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of x.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device pointer storing vector y.

  • incy[in] [int] specifies the increment for the elements of y.

hipblasStatus_t hipblasSgbmvBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const float *alpha, const float *const AP[], int lda, const float *const x[], int incx, const float *beta, float *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasDgbmvBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const double *alpha, const double *const AP[], int lda, const double *const x[], int incx, const double *beta, double *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasCgbmvBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const hipblasComplex *alpha, const hipblasComplex *const AP[], int lda, const hipblasComplex *const x[], int incx, const hipblasComplex *beta, hipblasComplex *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasZgbmvBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *const AP[], int lda, const hipblasDoubleComplex *const x[], int incx, const hipblasDoubleComplex *beta, hipblasDoubleComplex *const y[], int incy, int batchCount)#

BLAS Level 2 API.

gbmvBatched performs one of the matrix-vector operations

y_i := alpha*A_i*x_i    + beta*y_i,   or
y_i := alpha*A_i**T*x_i + beta*y_i,   or
y_i := alpha*A_i**H*x_i + beta*y_i,
where (A_i, x_i, y_i) is the i-th instance of the batch. alpha and beta are scalars, x_i and y_i are vectors and A_i is an m by n banded matrix with kl sub-diagonals and ku super-diagonals, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • trans[in] [hipblasOperation_t] indicates whether matrix A is tranposed (conjugated) or not

  • m[in] [int] number of rows of each matrix A_i

  • n[in] [int] number of columns of each matrix A_i

  • kl[in] [int] number of sub-diagonals of each A_i

  • ku[in] [int] number of super-diagonals of each A_i

  • alpha[in] device pointer or host pointer to scalar alpha.

  • AP[in] device array of device pointers storing each banded matrix A_i. Leading (kl + ku + 1) by n part of the matrix contains the coefficients of the banded matrix. The leading diagonal resides in row (ku + 1) with the first super-diagonal above on the RHS of row ku. The first sub-diagonal resides below on the LHS of row ku + 2. This propagates up and down across sub/super-diagonals. Ex: (m = n = 7; ku = 2, kl = 2) 1 2 3 0 0 0 0 0 0 3 3 3 3 3 4 1 2 3 0 0 0 0 2 2 2 2 2 2 5 4 1 2 3 0 0 -&#8212;> 1 1 1 1 1 1 1 0 5 4 1 2 3 0 4 4 4 4 4 4 0 0 0 5 4 1 2 0 5 5 5 5 5 0 0 0 0 0 5 4 1 2 0 0 0 0 0 0 0 0 0 0 0 5 4 1 0 0 0 0 0 0 0 Note that the empty elements which don’t correspond to data will not be referenced.

  • lda[in] [int] specifies the leading dimension of each A_i. Must be >= (kl + ku + 1)

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each x_i.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device array of device pointers storing each vector y_i.

  • incy[in] [int] specifies the increment for the elements of each y_i.

  • batchCount[in] [int] specifies the number of instances in the batch.

hipblasStatus_t hipblasSgbmvStridedBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const float *alpha, const float *AP, int lda, hipblasStride strideA, const float *x, int incx, hipblasStride stridex, const float *beta, float *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasDgbmvStridedBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const double *alpha, const double *AP, int lda, hipblasStride strideA, const double *x, int incx, hipblasStride stridex, const double *beta, double *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasCgbmvStridedBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const hipblasComplex *alpha, const hipblasComplex *AP, int lda, hipblasStride strideA, const hipblasComplex *x, int incx, hipblasStride stridex, const hipblasComplex *beta, hipblasComplex *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasZgbmvStridedBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, int kl, int ku, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *AP, int lda, hipblasStride strideA, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, const hipblasDoubleComplex *beta, hipblasDoubleComplex *y, int incy, hipblasStride stridey, int batchCount)#

BLAS Level 2 API.

gbmvStridedBatched performs one of the matrix-vector operations

y_i := alpha*A_i*x_i    + beta*y_i,   or
y_i := alpha*A_i**T*x_i + beta*y_i,   or
y_i := alpha*A_i**H*x_i + beta*y_i,
where (A_i, x_i, y_i) is the i-th instance of the batch. alpha and beta are scalars, x_i and y_i are vectors and A_i is an m by n banded matrix with kl sub-diagonals and ku super-diagonals, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • trans[in] [hipblasOperation_t] indicates whether matrix A is tranposed (conjugated) or not

  • m[in] [int] number of rows of matrix A

  • n[in] [int] number of columns of matrix A

  • kl[in] [int] number of sub-diagonals of A

  • ku[in] [int] number of super-diagonals of A

  • alpha[in] device pointer or host pointer to scalar alpha.

  • AP[in] device pointer to first banded matrix (A_1). Leading (kl + ku + 1) by n part of the matrix contains the coefficients of the banded matrix. The leading diagonal resides in row (ku + 1) with the first super-diagonal above on the RHS of row ku. The first sub-diagonal resides below on the LHS of row ku + 2. This propagates up and down across sub/super-diagonals. Ex: (m = n = 7; ku = 2, kl = 2) 1 2 3 0 0 0 0 0 0 3 3 3 3 3 4 1 2 3 0 0 0 0 2 2 2 2 2 2 5 4 1 2 3 0 0 -&#8212;> 1 1 1 1 1 1 1 0 5 4 1 2 3 0 4 4 4 4 4 4 0 0 0 5 4 1 2 0 5 5 5 5 5 0 0 0 0 0 5 4 1 2 0 0 0 0 0 0 0 0 0 0 0 5 4 1 0 0 0 0 0 0 0 Note that the empty elements which don’t correspond to data will not be referenced.

  • lda[in] [int] specifies the leading dimension of A. Must be >= (kl + ku + 1)

  • strideA[in] [hipblasStride] stride from the start of one matrix (A_i) and the next one (A_i+1)

  • x[in] device pointer to first vector (x_1).

  • incx[in] [int] specifies the increment for the elements of x.

  • stridex[in] [hipblasStride] stride from the start of one vector (x_i) and the next one (x_i+1)

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device pointer to first vector (y_1).

  • incy[in] [int] specifies the increment for the elements of y.

  • stridey[in] [hipblasStride] stride from the start of one vector (y_i) and the next one (x_i+1)

  • batchCount[in] [int] specifies the number of instances in the batch.

hipblasXgemv + Batched, StridedBatched#

hipblasStatus_t hipblasSgemv(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, const float *alpha, const float *AP, int lda, const float *x, int incx, const float *beta, float *y, int incy)#
hipblasStatus_t hipblasDgemv(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, const double *alpha, const double *AP, int lda, const double *x, int incx, const double *beta, double *y, int incy)#
hipblasStatus_t hipblasCgemv(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, const hipblasComplex *alpha, const hipblasComplex *AP, int lda, const hipblasComplex *x, int incx, const hipblasComplex *beta, hipblasComplex *y, int incy)#
hipblasStatus_t hipblasZgemv(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *AP, int lda, const hipblasDoubleComplex *x, int incx, const hipblasDoubleComplex *beta, hipblasDoubleComplex *y, int incy)#

BLAS Level 2 API.

gemv performs one of the matrix-vector operations

y := alpha*A*x    + beta*y,   or
y := alpha*A**T*x + beta*y,   or
y := alpha*A**H*x + beta*y,
where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • trans[in] [hipblasOperation_t] indicates whether matrix A is tranposed (conjugated) or not

  • m[in] [int] number of rows of matrix A

  • n[in] [int] number of columns of matrix A

  • alpha[in] device pointer or host pointer to scalar alpha.

  • AP[in] device pointer storing matrix A.

  • lda[in] [int] specifies the leading dimension of A.

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of x.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device pointer storing vector y.

  • incy[in] [int] specifies the increment for the elements of y.

hipblasStatus_t hipblasSgemvBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, const float *alpha, const float *const AP[], int lda, const float *const x[], int incx, const float *beta, float *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasDgemvBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, const double *alpha, const double *const AP[], int lda, const double *const x[], int incx, const double *beta, double *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasCgemvBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, const hipblasComplex *alpha, const hipblasComplex *const AP[], int lda, const hipblasComplex *const x[], int incx, const hipblasComplex *beta, hipblasComplex *const y[], int incy, int batchCount)#
hipblasStatus_t hipblasZgemvBatched(hipblasHandle_t handle, hipblasOperation_t trans, int m, int n, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *const AP[], int lda, const hipblasDoubleComplex *const x[], int incx, const hipblasDoubleComplex *beta, hipblasDoubleComplex *const y[], int incy, int batchCount)#

BLAS Level 2 API.

gemvBatched performs a batch of matrix-vector operations

y_i := alpha*A_i*x_i    + beta*y_i,   or
y_i := alpha*A_i**T*x_i + beta*y_i,   or
y_i := alpha*A_i**H*x_i + beta*y_i,
where (A_i, x_i, y_i) is the i-th instance of the batch. alpha and beta are scalars, x_i and y_i are vectors and A_i is an m by n matrix, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • trans[in] [hipblasOperation_t] indicates whether matrices A_i are tranposed (conjugated) or not

  • m[in] [int] number of rows of each matrix A_i

  • n[in] [int] number of columns of each matrix A_i

  • alpha[in] device pointer or host pointer to scalar alpha.

  • AP[in] device array of device pointers storing each matrix A_i.

  • lda[in] [int] specifies the leading dimension of each matrix A_i.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [int] specifies the increment for the elements of each vector x_i.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device array of device pointers storing each vector y_i.

  • incy[in] [int] specifies the increment for the elements of each vector y_i.

  • batchCount[in] [int] number of instances in the batch

hipblasStatus_t hipblasSgemvStridedBatched(hipblasHandle_t handle, hipblasOperation_t transA, int m, int n, const float *alpha, const float *AP, int lda, hipblasStride strideA, const float *x, int incx, hipblasStride stridex, const float *beta, float *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasDgemvStridedBatched(hipblasHandle_t handle, hipblasOperation_t transA, int m, int n, const double *alpha, const double *AP, int lda, hipblasStride strideA, const double *x, int incx, hipblasStride stridex, const double *beta, double *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasCgemvStridedBatched(hipblasHandle_t handle, hipblasOperation_t transA, int m, int n, const hipblasComplex *alpha, const hipblasComplex *AP, int lda, hipblasStride strideA, const hipblasComplex *x, int incx, hipblasStride stridex, const hipblasComplex *beta, hipblasComplex *y, int incy, hipblasStride stridey, int batchCount)#
hipblasStatus_t hipblasZgemvStridedBatched(hipblasHandle_t handle, hipblasOperation_t transA, int m, int n, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *AP, int lda, hipblasStride strideA, const hipblasDoubleComplex *x, int incx, hipblasStride stridex, const hipblasDoubleComplex *beta, hipblasDoubleComplex *y, int incy, hipblasStride stridey, int batchCount)#

BLAS Level 2 API.

gemvStridedBatched performs a batch of matrix-vector operations

y_i := alpha*A_i*x_i    + beta*y_i,   or
y_i := alpha*A_i**T*x_i + beta*y_i,   or
y_i := alpha*A_i**H*x_i + beta*y_i,
where (A_i, x_i, y_i) is the i-th instance of the batch. alpha and beta are scalars, x_i and y_i are vectors and A_i is an m by n matrix, for i = 1, …, batchCount.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : No support

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • transA[in] [hipblasOperation_t] indicates whether matrices A_i are tranposed (conjugated) or not

  • m[in] [int] number of rows of matrices A_i

  • n[in] [int] number of columns of matrices A_i

  • alpha[in] device pointer or host pointer to scalar alpha.

  • AP[in] device pointer to the first matrix (A_1) in the batch.

  • lda[in] [int] specifies the leading dimension of matrices A_i.

  • strideA[in] [hipblasStride] stride from the start of one matrix (A_i) and the next one (A_i+1)

  • x[in] device pointer to the first vector (x_1) in the batch.

  • incx[in] [int] specifies the increment for the elements of vectors x_i.

  • stridex[in] [hipblasStride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stridex, however the user should take care to ensure that stridex is of appropriate size. When trans equals HIPBLAS_OP_N this typically means stridex >= n * incx, otherwise stridex >= m * incx.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device pointer to the first vector (y_1) in the batch.

  • incy[in] [int] specifies the increment for the elements of vectors y_i.

  • stridey[in] [hipblasStride] stride from the start of one vector (y_i) and the next one (y_i+1). There are no restrictions placed on stridey, however the user should take care to ensure that stridey is of appropriate size. When trans equals HIPBLAS_OP_N this typically means stridey >= m * incy, otherwise stridey >= n * incy. stridey should be non zero.

  • batchCount[in] [int] number of instances in the batch

hipblasXger + Batched, StridedBatched#

hipblasStatus_t hipblasSger(hipblasHandle_t handle, int m, int n, const float *alpha, const float *x, int incx, const float *y, int incy, float *AP, int lda)#
hipblasStatus_t hipblasDger(hipblasHandle_t handle, int m, int n, const double *alpha, const double *x, int incx, const double *y, int incy, double *AP, int lda)#
hipblasStatus_t hipblasCgeru(hipblasHandle_t handle, int m, int n, const hipblasComplex *alpha, const hipblasComplex *x, int incx, const hipblasComplex *y, int incy, hipblasComplex *AP, int lda)#
hipblasStatus_t hipblasCgerc(hipblasHandle_t handle, int m, int n, const hipblasComplex *alpha, const hipblasComplex *x, int incx, const hipblasComplex *y, int incy, hipblasComplex *AP, int lda)#
hipblasStatus_t hipblasZgeru(hipblasHandle_t handle, int m, int n, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *x, int incx, const hipblasDoubleComplex *y, int incy, hipblasDoubleComplex *AP, int lda)#
hipblasStatus_t hipblasZgerc(hipblasHandle_t handle, int m, int n, const hipblasDoubleComplex *alpha, const hipblasDoubleComplex *x, int incx, const hipblasDoubleComplex *y, int incy, hipblasDoubleComplex *AP, int lda)#

BLAS Level 2 API.

ger,geru,gerc performs the matrix-vector operations

A := A + alpha*x*y**T , OR
A := A + alpha*x*y**H for gerc
where alpha is a scalar, x and y are vectors, and A is an m by n matrix.

  • Supported precisions in rocBLAS : s,d,c,z

  • Supported precisions in cuBLAS : s,d,c,z

Parameters:
  • handle[in] [hipblasHandle_t] handle to the hipblas library context queue.

  • m[in] [int] the number of rows of the matrix A.

  • n[in] [int] the number of columns of the matrix A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer storing vector x.

  • incx[in] [int] specifies the increment for the elements of x.

  • y[in] device pointer storing vector y.

  • incy[in] [int] specifies the increment for the elements of y.

  • AP[inout] device pointer storing matrix A.

  • lda[in] [int] specifies the leading dimension of A.

hipblasStatus_t hipblasSgerBatched(hipblasHandle_t handle, int m, int n, const float *alpha, const float *const x[], int incx, const float *const y[], int incy, float *const AP[], int lda, int batchCount)#
hipblasStatus_t hipblasDgerBatched(hipblasHandle_t handle, int m, int n, const double *alpha, const double *const x[], int incx, const double *const y[], int incy, double *const AP[], int lda, int batchCount)#
hipblasStatus_t hipblasCgeruBatched(hipblasHandle_t handle, int m, int n, const hipblasComplex *alpha, const hipblasComplex *const x[], int incx, const hipblasComplex *const y[], int incy, hipblasComplex *const AP[], int