ROCm compiler reference#
ROCm includes compilers optimized for high-performance computing on AMD GPUs and CPUs supporting various heterogeneous programming models such as Heterogeneous-computing Interface for Portability (HIP), OpenMP, and OpenCL.
Important
The ROCm compilers only build the x86 and AMDGPU targets. Functionality and options that are relevant to other targets are not included within the ROCm compilers.
For more details, see:
Releases and source: ROCm/llvm-project
ROCm compiler interfaces#
ROCm provides two compiler interfaces for compiling HIP programs:
/opt/rocm/bin/amdclang++
/opt/rocm/bin/hipcc
The ROCm compilers leverage the same LLVM compiler technology with the AMD GCN GPU support;
however, they offer a slightly different user experience. The hipcc
command-line
interface provides a more familiar user interface to users who are
experienced in CUDA but relatively new to the ROCm/HIP development environment.
On the other hand, amdclang++
provides a user interface identical to the clang++
compiler. It is more suitable for experienced developers who want to directly
interact with the clang compiler and gain full control of the application
build process.
The major differences between hipcc
and amdclang++
are listed below:
Feature |
|
|
---|---|---|
Compiling HIP source files |
Treats all source files as HIP language source files |
Enables the HIP language support for files with the |
Detecting GPU architecture |
Auto-detects the GPUs available on the system and generates code for those devices when no GPU architecture is specified |
Has AMD GCN gfx803 as the default GPU architecture. The |
Finding a HIP installation |
Finds the HIP installation based on its own location and its knowledge about the ROCm directory structure |
First looks for HIP under the same parent directory as its own LLVM directory and then falls back on |
Linking to the HIP runtime library |
Is configured to automatically link to the HIP runtime from the detected HIP installation |
Requires the |
Device function inlining |
Inlines all GPU device functions, which provide greater performance and compatibility for codes that contain file scoped or device function scoped |
Relies on inlining heuristics to control inlining. Users experiencing performance or compilation issues with code using file scoped or device function scoped |
Source code location: |
Compiler options and features#
This section discusses compiler options and features.
AMD GPU compilation#
This section outlines commonly used compiler flags for hipcc
and amdclang++
.
Options |
Description |
|
Compiles the source file as a HIP program. |
|
Enables the OpenMP support. |
|
Enables the OpenMP target offload support of the specified GPU architecture, where |
|
Sets the default limit of threads per block, also referred to as the launch bounds, where |
|
Enables unsafe floating point atomic instructions (AMDGPU only). |
|
Allows aggressive, lossy floating-point optimizations. |
|
Sets wavefront size to be 64 or 32 on RDNA architectures. |
|
Switches between CU and WGP modes on RDNA architectures. |
|
HIP offloading target ID. |
|
Generates source-level debug information. |
|
Generates relocatable device code, also known as separate compilation mode. |
Inline ASM statements#
Inline assembly (ASM) statements allow a developer to include assembly instructions directly in either host or device code. The ROCm compiler supports ASM statements, however you should not use them for the following reasons:
The compiler’s ability to produce both correct code and optimize surrounding code is impeded.
The compiler does not parse the content of the ASM statements and cannot examine its contents.
The compiler must make conservative assumptions in an effort to retain correctness.
The conservative assumptions may yield code that is less performant compared to code without ASM statements. It is possible that a syntactically correct ASM statement may cause incorrect runtime behavior.
ASM statements are often ASIC-specific; code containing them is less portable and adds a maintenance burden for the developer if different ASICs are targeted.
Writing correct ASM statements is often difficult; thorough testing of any ASM statements is strongly encouraged.
Note
For developers who include ASM statements in the code, AMD is interested in understanding the use case and appreciates your feedback at ROCm/ROCm#issues
Miscellaneous OpenMP compiler features#
This section discusses features that have been added or enhanced in the OpenMP compiler.
Offload-arch tool#
An LLVM library and tool that is used to query the execution capability of the current system as well as to query requirements of a binary file. It is used by OpenMP device runtime to ensure compatibility of an image with the current system while loading it. It is compatible with target ID support and multi-image fat binary support.
Usage:
offload-arch [Options] [Optional lookup-value]
When used without an option, offload-arch
prints the value of the first offload
architecture found in the underlying system. This can be used by various clang
front ends. For example, to compile for OpenMP offloading on your current system,
invoke clang with the following command:
clang -fopenmp -fopenmp-targets=``offload-arch`` foo.c
If an optional lookup-value is specified, offload-arch
will check if the value
is either a valid offload-arch
or a codename and look up requested additional
information.
The following command provides all the information for offload-arch gfx906:
offload-arch gfx906 - V
The options are listed below:
Options |
Description |
|
Prints the help message. |
|
Prints values for all devices. Do not stop at the first device found. |
|
Prints device code name (often found in |
|
Prints numeric |
|
Prints clang offload triple to use for the offload arch. |
|
Verbose. Implies the following options |
|
Prints offload requirements including offload-arch for each compiled offload image built into an application binary file. |
|
Prints offload capabilities of the underlying system. This option is used by the language runtime to select an image when multiple images are available. A capability must exist for each requirement of the selected image. |
There are symbolic link aliases amdgpu-offload-arch
and nvidia-arch
for
offload-arch
. These aliases return 1 if no AMD GCN GPU or CUDA GPU is found.
These aliases are useful in determining whether architecture-specific tests
should be run or to conditionally load architecture-specific software.
Command-line simplification using offload-arch flag#
Legacy mechanism of specifying offloading target for OpenMP involves using three
flags, -fopenmp-targets
, -Xopenmp-target
, and -march
. The first two flags
take a target triple (like amdgcn-amd-amdhsa
or nvptx64-nvidia-cuda
), while
the last flag takes device name (like gfx908
or sm_70
) as input.
Alternatively, users of the ROCm compiler can simply use the flag -offload-arch
for a
combined effect of the preceding three flags.
Example:
# Legacy mechanism
clang -fopenmp -target x86_64-linux-gnu \
-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa \
-march=gfx906 helloworld.c -o helloworld
Example:
# Using offload-arch flag
clang -fopenmp -target x86_64-linux-gnu \
--offload-arch=gfx906 helloworld.c -o helloworld.
To ensure backward compatibility, both styles are supported. This option is compatible with target ID support and multi-image fat binaries.
Target ID support for OpenMP#
The ROCm compiler supports specification of target features along with the GPU
name while specifying a target offload device in the command line, using
-march
or --offload-arch
options. The compiled image in such cases is
specialized for a given configuration of device and target features (target ID).
Example:
# compiling for a gfx908 device with XNACK paging support turned ON
clang -fopenmp -target x86_64-linux-gnu \
-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa \
-march=gfx908:xnack+ helloworld.c -o helloworld
Example:
# compiling for a gfx908 device with SRAMECC support turned OFF
clang -fopenmp -target x86_64-linux-gnu \
-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa \
-march=gfx908:sramecc- helloworld.c -o helloworld
Example:
# compiling for a gfx908 device with SRAMECC support turned ON and XNACK paging support turned OFF
clang -fopenmp -target x86_64-linux-gnu \
-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa \
-march=gfx908:sramecc+:xnack- helloworld.c -o helloworld
The target ID specified on the command line is passed to the clang driver using
target-feature
flag, to the LLVM optimizer and back end using -mattr
flag, and
to linker using -plugin-opt=-mattr
flag. This feature is compatible with
offload-arch
command-line option and multi-image binaries for multiple
architectures.
Multi-image fat binary for OpenMP#
The ROCm compiler is enhanced to generate binaries that can contain heterogenous images. This heterogeneity could be in terms of:
Images of different architectures, like AMD GCN and NVPTX
Images of same architectures but for different GPUs, like gfx906 and gfx908
Images of same architecture and same GPU but for different target features, like
gfx908:xnack+
andgfx908:xnack-
An appropriate image is selected by the OpenMP device runtime for execution
depending on the capability of the current system. This feature is compatible
with target ID support and offload-arch
command-line options and uses
offload-arch tool to determine capability of the current system.
Example:
clang -fopenmp -target x86_64-linux-gnu \
-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa \
-Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 \
-Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908 \
helloworld.c -o helloworld
Example:
clang -fopenmp -target x86_64-linux-gnu \
--offload-arch=gfx906 \
--offload-arch=gfx908 \
helloworld.c -o helloworld
Example:
clang -fopenmp -target x86_64-linux-gnu \
-fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa,amdgcn-amd-amdhsa,amdgcn-amd-amdhsa \
-Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908:sramecc+:xnack+ \
-Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908:sramecc-:xnack+ \
-Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908:sramecc+:xnack- \
-Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908:sramecc-:xnack- \
helloworld.c -o helloworld
ROCmCC compilers create an instance of toolchain for each unique combination
of target triple and the target GPU (along with the associated target features).
clang-offload-wrapper
tool is modified to insert a new structure
__tgt_image_info
along with each image in the binary. Device runtime is also
modified to query this structure to identify a compatible image based on the
capability of the current system.