ROCm Systems Profiler Glossary

ROCm Systems Profiler Glossary#

This topic explains the terminology necessary to use ROCm Systems Profiler. The list below provides a basic glossary for those who are new to binary instrumentation. It also clarifies ambiguities when certain terms have different contextual meanings, for example, the ROCm Systems Profiler meaning of the term “module” when instrumenting Python.

Binary

A file written in the Executable and Linkable Format (ELF). This is the standard file format for executable files, shared libraries, etc.

Binary instrumentation

Inserting callbacks to instrumentation into an existing binary. This can be performed statically or dynamically.

Static binary instrumentation

Loads an existing binary, determines instrumentation points, and generates a new binary with instrumentation directly embedded. It is applicable to executables and libraries but limited to only the functions defined in the binary. This is also known as Binary rewrite.

Dynamic binary instrumentation

Loads an existing binary into memory, inserts instrumentation, and runs the binary. It is limited to executables but is capable of instrumenting linked libraries. This is also known as Runtime instrumentation.

Statistical sampling

At periodic intervals, the application is paused and the current call-stack of the CPU is recorded along with various other metrics. It uses timers that measure either (A) real clock time or (B) the CPU time used by the current thread and the CPU time expended on behalf of the thread by the system. This is also known as simply sampling.

Sampling rate
  • The period at which (A) or (B) are triggered (in units of # interrupts / second)

  • Higher values increase the number of samples

Sampling delay
  • How long to wait before (A) and (B) begin triggering at their designated rate

Sampling duration
  • The amount of time (in real-time) after the start of the application to record samples.

  • After this time limit has been reached, no more samples are recorded.

Process sampling

At periodic (real-time) intervals, a background thread records global metrics without interrupting the current process. These metrics include, but are not limited to: CPU frequency, CPU memory high-water mark (i.e. peak memory usage), GPU temperature, and GPU power usage.

Sampling rate
  • The real-time period for recording metrics (in units of # measurements / second)

  • Higher values increase the number of samples

Sampling delay
  • How long to wait (in real-time) before recording samples

Sampling duration
  • The amount of time (in real-time) after the start of the application to record samples.

  • After this time limit has been reached, no more samples are recorded.

Module

With respect to binary instrumentation, a module is defined as either the filename (such as foo.c) or library name (libfoo.so) which contains the definition of one or more functions.

With respect to Python instrumentation, a module is defined as the file which contains the definition of one or more functions. The full path to this file typically contains the name of the “Python module”.

Basic block

A straight-line code sequence with no branches in (except for the entry) and no branches out (except for the exit).

Address range

The instructions for a function in a binary start at certain address with the ELF file and end at a certain address. The range is end - start.

The address range is a decent approximation for the “cost” of a function. For example, a larger address range approximately equates to more instructions.

Instrumentation traps

On the x86 architecture, because instructions are of variable size, an instruction might be too small for Dyninst to replace it with the normal code sequence used to call instrumentation. When instrumentation is placed at points other than subroutine entry, exit, or call points, traps may be used to ensure the instrumentation fits. (By default, rocprof-sys-instrument avoids instrumentation which requires a trap.)

Overlapping functions

Due to language constructs or compiler optimizations, it might be possible for multiple functions to overlap (that is, share part of the same function body) or for a single function to have multiple entry points. In practice, it’s impossible to determine the difference between multiple overlapping functions and a single function with multiple entry points. (By default, rocprof-sys-instrument avoids instrumenting overlapping functions.)