
Device properties to assume for cross-compilation, as a JSON object (e.g. "{arch:gfx942, num_cu:120, num_chiplets:1, max_threads_per_cu:2048, max_threads_per_block:1024}"). Overrides --gpu-arch, --gpu-num-cus and --gpu-num-chiplets for any keys present. Specifying arch here is sufficient to enable cross-compilation without --gpu-arch.

--enable-offload-copy

Enable implicit offload copying

--disable-fast-math

Disable fast math optimization

--exhaustive-tune

Perform an exhaustive search to find the fastest version of the generated kernels for the selected backend

--fp16

Quantize for fp16

--bf16

Quantize for bf16

--int8

Quantize for int8

--fp8

Quantize for the Float8E4M3FNUZ type
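
As a rough illustration of how the flags above might be combined, the following Python sketch assembles a command line without running it. It assumes the driver binary is named `migraphx-driver` and takes a `compile` subcommand followed by a model path, neither of which is stated in this section. The helper `build_driver_args`, its parameters, and the file name `model.onnx` are hypothetical. Allowing only one quantization flag per call is a simplification of the sketch, not a documented restriction.

```python
import shlex

# Quantization flags as listed in this section.
PRECISION_FLAGS = {
    "fp16": "--fp16",
    "bf16": "--bf16",
    "int8": "--int8",
    "fp8": "--fp8",
}


def build_driver_args(model_path, precision=None, exhaustive_tune=False,
                      offload_copy=False, fast_math=True):
    """Return an argv list for an assumed `migraphx-driver compile` call.

    Hypothetical helper: the binary name and subcommand are assumptions.
    """
    args = ["migraphx-driver", "compile", model_path]
    if offload_copy:
        args.append("--enable-offload-copy")
    if not fast_math:
        args.append("--disable-fast-math")
    if exhaustive_tune:
        args.append("--exhaustive-tune")
    if precision is not None:
        # Raises KeyError for a precision this section does not list.
        args.append(PRECISION_FLAGS[precision])
    return args


if __name__ == "__main__":
    argv = build_driver_args("model.onnx", precision="fp16",
                             exhaustive_tune=True)
    # Print a shell-quoted command line for inspection only.
    print(shlex.join(argv))
```

Building the argument list separately from execution keeps the flag selection easy to inspect and test before any compilation is attempted.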
