--fill0 [std::vector<std::string>]

Fill the named parameters with 0s.

--fill1 [std::vector<std::string>]

Fill the named parameters with 1s.
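As a rough illustration of what the fill options do (`--fill0` uses the value 0, `--fill1` the value 1), the Python sketch below assumes each model parameter is identified by name and has a known shape, and builds a constant buffer for every name passed in. The function `fill_parameters` and its arguments are hypothetical and are not part of the driver.

```python
from math import prod


def fill_parameters(param_shapes, names, value):
    """Hypothetical helper: build a flat buffer of `value` for each named parameter.

    param_shapes maps a parameter name to its shape (a tuple of dimensions).
    """
    filled = {}
    for name in names:
        if name not in param_shapes:
            raise KeyError(f"unknown parameter: {name}")
        # One element per entry in the tensor: the product of its dimensions.
        filled[name] = [value] * prod(param_shapes[name])
    return filled
```

For example, `fill_parameters({"x": (2, 3), "y": (4,)}, ["x"], 0)` returns a six-element buffer of zeros for `x` and leaves `y` untouched.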

--gpu

Compile for the GPU target.

--cpu

Compile for the CPU target.

--ref

Compile for the reference implementation.

--gpu-arch [std::string]

Cross-compile for the given GPU architecture (e.g. gfx942) without requiring a physical device. Applies only to the GPU target.

--gpu-num-cus [std::size_t] (Default: 120)

Number of compute units to assume for cross-compilation. Only used when --gpu-arch is set.

--gpu-num-chiplets [std::size_t] (Default: 1)

Number of chiplets (XCCs) to assume for cross-compilation. Only used when --gpu-arch is set.

--enable-offload-copy

Enable implicit offload copying of inputs and outputs between host and device.

--disable-fast-math

Disable fast-math optimizations.

--exhaustive-tune

Perform an exhaustive search to find the fastest version of the generated kernels for the selected backend.
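The idea behind exhaustive tuning can be sketched as a timing loop: run every candidate kernel, measure it, and keep the fastest. This Python sketch assumes candidates are plain zero-argument callables; the name `exhaustive_tune` and the warm-up and repeat policy are illustrative assumptions, not a description of how the driver actually benchmarks kernels.

```python
import time
from typing import Callable, Sequence


def exhaustive_tune(candidates: Sequence[Callable[[], None]], repeats: int = 5) -> int:
    """Time every candidate kernel and return the index of the fastest one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    best_index, best_time = 0, float("inf")
    for index, kernel in enumerate(candidates):
        kernel()  # warm-up run, excluded from the measurement
        start = time.perf_counter()
        for _ in range(repeats):
            kernel()
        average = (time.perf_counter() - start) / repeats
        if average < best_time:
            best_index, best_time = index, average
    return best_index
```

Because every candidate is measured, the cost grows linearly with the size of the search space, which is why exhaustive tuning is opt-in rather than the default.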

--fp16

Quantize to fp16.

--bf16

Quantize to bf16.
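fp16 (IEEE half precision) has 5 exponent bits and 10 mantissa bits, while bf16 keeps the 8 exponent bits of fp32 with only 7 mantissa bits, so bf16 trades precision for range. The hypothetical helpers below emulate rounding a Python float to each format (bf16 via round-to-nearest-even on the fp32 bit pattern, finite inputs assumed) to make the difference concrete; they are a sketch, not the driver's conversion code.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to bfloat16 (round-to-nearest-even on the fp32 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit, so ties round to an even mantissa.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def to_fp16(x: float) -> float:
    """Round a float to IEEE half precision; raises OverflowError if out of range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

For example, 70000.0 overflows fp16 (largest finite value 65504) but rounds to 70144.0 in bf16, whereas 1 + 2**-9 is exact in fp16 and rounds to 1.0 in bf16.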

--int8

Quantize to int8.
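A common int8 scheme is symmetric per-tensor quantization: pick a scale, divide, round, and clamp to [-127, 127]. The sketch below derives the scale from the largest absolute value; that choice and the helper names `quantize_int8` and `dequantize_int8` are assumptions for illustration, since the option description does not say how the driver chooses scales (real deployments often calibrate them on sample data).

```python
def quantize_int8(values, scale=None):
    """Symmetric per-tensor int8 quantization; returns (quantized, scale).

    Assumption: when no scale is given, it is derived from max(|v|) / 127.
    """
    if scale is None:
        max_abs = max((abs(v) for v in values), default=0.0)
        scale = max_abs / 127 if max_abs > 0 else 1.0
    # Python's round() rounds halves to even; results are clamped to [-127, 127].
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map int8 codes back to approximate real values."""
    return [q * scale for q in quantized]
```

Values whose magnitude exceeds `127 * scale` saturate at the int8 limits, which is the main source of error besides rounding.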

--fp8

Quantize to the Float8E4M3FNUZ (fp8) type.
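Float8E4M3FNUZ packs 1 sign bit, 4 exponent bits (bias 8) and 3 mantissa bits into one byte; "FN" means finite only (no infinities) and "UZ" means there is no negative zero, with the byte 0x80 reserved for NaN. The decoder below is an illustrative sketch of that encoding, not code from the driver; it shows, for instance, that the largest finite value is 240.

```python
def decode_fp8_e4m3fnuz(byte: int) -> float:
    """Decode one Float8E4M3FNUZ byte: 1 sign, 4 exponent (bias 8), 3 mantissa bits."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in [0, 255]")
    if byte == 0x80:  # the would-be negative zero encodes NaN instead
        return float("nan")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0:  # subnormal: no implicit leading 1
        return sign * (mantissa / 8) * 2.0 ** (1 - 8)
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - 8)
```

With only 3 mantissa bits, neighbouring values near 1.0 are 0.125 apart, so fp8 quantization is considerably coarser than fp16 or bf16.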