VCN and JPEG activity sampling and tracing

ROCm Systems Profiler supports sampling of VCN and JPEG engine activity. Sampling lets you gather key performance metrics for VCN utilization and visualize engine usage, which helps when optimizing media and video workloads. The profiler also supports tracing of the rocDecode APIs, the rocJPEG APIs, and the Video Acceleration APIs (VA-API). Tracing these APIs shows how the different components of a video encoding or decoding workload interact with the VCN engine.

Sampling support

Sampling of VCN and JPEG engine activity relies on AMD SMI, which provides the interface for GPU metric collection.

  1. Set the ROCPROFSYS_USE_AMD_SMI environment variable to enable GPU metric collection:

export ROCPROFSYS_USE_AMD_SMI=true

  2. Update the ROCPROFSYS_AMD_SMI_METRICS variable to collect the VCN and JPEG activity metrics. The default value is:

ROCPROFSYS_AMD_SMI_METRICS=busy,temp,power,mem_usage

To include VCN and JPEG activity metrics, update it to:

ROCPROFSYS_AMD_SMI_METRICS=busy,temp,power,mem_usage,vcn_activity,jpeg_activity

Alternatively, you can use the following to collect all available GPU metrics:

ROCPROFSYS_AMD_SMI_METRICS=all
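The two settings above can be combined in one shell snippet. The following is a minimal sketch, assuming a POSIX-compatible shell. It appends the two metrics to whatever list is already exported (falling back to the default shown above) and leaves an `all` setting untouched. The `metrics` and `m` variables are local helper names, not profiler settings.

```shell
# Enable GPU metric collection through AMD SMI.
export ROCPROFSYS_USE_AMD_SMI=true

# Start from the current value, or the documented default if it is unset.
metrics="${ROCPROFSYS_AMD_SMI_METRICS:-busy,temp,power,mem_usage}"

# "all" already covers every metric, so only extend explicit lists.
if [ "${metrics}" != "all" ]; then
  for m in vcn_activity jpeg_activity; do
    case ",${metrics}," in
      *",${m},"*) ;;                     # already present, nothing to do
      *) metrics="${metrics},${m}" ;;    # append the missing metric
    esac
  done
fi

export ROCPROFSYS_AMD_SMI_METRICS="${metrics}"
echo "ROCPROFSYS_AMD_SMI_METRICS=${ROCPROFSYS_AMD_SMI_METRICS}"
```

Running the snippet twice does not duplicate entries, because each metric is added only when it is missing from the list.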

API tracing support

Tracing of the rocDecode and rocJPEG APIs relies on ROCprofiler-SDK, which provides runtime-independent APIs for tracing runtime calls and the asynchronous activity associated with decode workloads on the VCN and JPEG engines.

To enable tracing for the rocDecode and rocJPEG APIs, update the ROCPROFSYS_ROCM_DOMAINS variable. The default value is:

ROCPROFSYS_ROCM_DOMAINS=hip_runtime_api,marker_api,kernel_dispatch,memory_copy,scratch_memory,page_migration

Add rocdecode_api and rocjpeg_api to include tracing for rocDecode and rocJPEG APIs:

ROCPROFSYS_ROCM_DOMAINS=hip_runtime_api,marker_api,kernel_dispatch,memory_copy,scratch_memory,page_migration,rocdecode_api,rocjpeg_api
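If the extra domains should apply to a single profiling run rather than the whole shell session, the variable can be scoped to one command. The following is a minimal sketch, assuming a POSIX-compatible shell with the standard `env` utility. The `with_media_tracing` helper is a hypothetical name introduced here for illustration and is not part of ROCm Systems Profiler.

```shell
# Hypothetical helper: run one command with the rocDecode and rocJPEG tracing
# domains added, without changing the calling shell's environment.
# The ( ... ) body runs in a subshell, so "base" does not leak to the caller.
with_media_tracing() (
  # Start from the current domain list, or the documented default if unset.
  base="${ROCPROFSYS_ROCM_DOMAINS:-hip_runtime_api,marker_api,kernel_dispatch,memory_copy,scratch_memory,page_migration}"
  # env scopes the variable to the single command passed as arguments.
  # Note: this sketch does not de-duplicate domains already present in base.
  env ROCPROFSYS_ROCM_DOMAINS="${base},rocdecode_api,rocjpeg_api" "$@"
)

# Demonstration with a stand-in command that prints the variable it receives.
with_media_tracing sh -c 'echo "${ROCPROFSYS_ROCM_DOMAINS}"'
```

In practice the stand-in command would be the profiler invocation, for example `with_media_tracing rocprof-sys-sample -PTHD -- ./videodecodebatch -i /opt/rocm/share/rocdecode/video/`.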

Note

By default, enabling rocdecode_api or rocjpeg_api also enables VA-API tracing.

To explore all supported tracing domains, use the command:

rocprof-sys-avail -bd -r ROCM_DOMAINS

For more details on the APIs, refer to the ROCprofiler-SDK developer documentation.

Using rocDecode and rocJPEG samples

For testing purposes, you can use the rocDecode samples and rocJPEG samples. To generate sufficient load on the VCN and JPEG engines, use the following samples:

For video decoding: the videodecodebatch sample
For JPEG decoding: the jpegdecodeperf sample

After completing the build steps mentioned in the sample documentation, proceed with the following steps:

  1. Source the ROCm Systems Profiler Environment using:

source /opt/rocprofiler-systems/share/rocprofiler-systems/setup-env.sh

Alternatively, if you are using modules, use:

module use /opt/rocprofiler-systems/share/modulefiles

  2. Generate and configure the profiler configuration file:

rocprof-sys-avail -G $HOME/.rocprofsys.cfg -F txt
export ROCPROFSYS_CONFIG_FILE=$HOME/.rocprofsys.cfg

Edit .rocprofsys.cfg with the following settings:

ROCPROFSYS_USE_AMD_SMI     = true
ROCPROFSYS_AMD_SMI_METRICS = busy,temp,power,mem_usage,vcn_activity,jpeg_activity
ROCPROFSYS_ROCM_DOMAINS    = hip_runtime_api,marker_api,kernel_dispatch,memory_copy,scratch_memory,page_migration,rocdecode_api,rocjpeg_api

  3. Profile the rocDecode sample:

rocprof-sys-sample -PTHD -- ./videodecodebatch -i /opt/rocm/share/rocdecode/video/

Note

If the rocdecode-dev package is installed, the sample videos are located in /opt/rocm/share/rocdecode/video by default.

At the end of the run, a message similar to the following appears:

[rocprofiler-systems][964294][perfetto]> Outputting '/home/demo/rocprofsys-videodecodebatch-output/2025-04-25_15.52/perfetto-trace-964294.proto'
(2792.91 KB / 2.79 MB / 0.00 GB)... Done

To view the generated .proto file in the browser, open the Perfetto UI page, click Open trace file, and select the .proto file. The browser then displays a visualization similar to the following.

[Figure: Visualization of a performance graph in Perfetto with VCN Activity tracks]
[Figure: Visualization of a performance graph in Perfetto with rocdecode and VA-API traces]

  4. To profile the rocJPEG sample, use:

rocprof-sys-sample -v 2 -PTHD -- ./jpegdecodeperf -i /opt/rocm/share/rocjpeg/image/

Note

If the rocjpeg-dev package is installed, the sample images are located in the /opt/rocm/share/rocjpeg/image/ directory. Duplicate the images to generate enough workload to see activity in the trace.

[Figure: Visualization of a performance graph in Perfetto with JPEG Activity tracks]
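The two manual parts of the steps above, editing the configuration file and duplicating the sample images, can also be scripted. The following is a minimal sketch, assuming a POSIX-compatible shell and the `KEY = value` line format shown in the settings above. Both helper names (`set_cfg_key`, `duplicate_images`) are hypothetical, and the `.jpg` extension of the sample images is an assumption that should be checked against the actual image directory.

```shell
# set_cfg_key FILE KEY VALUE (hypothetical helper)
# Replace an existing "KEY = ..." line in FILE, or append one if it is missing.
set_cfg_key() {
  file="$1"; key="$2"; value="$3"
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "${file}" 2>/dev/null; then
    # Rewrite through a temporary file to stay portable (no sed -i),
    # then copy the contents back so the file keeps its permissions.
    tmp="$(mktemp)"
    sed -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "${file}" > "${tmp}"
    cat "${tmp}" > "${file}"
    rm -f "${tmp}"
  else
    printf '%s = %s\n' "${key}" "${value}" >> "${file}"
  fi
}

# duplicate_images SRC_DIR DST_DIR COPIES (hypothetical helper)
# Copy every .jpg file (assumed extension) in SRC_DIR into DST_DIR COPIES times.
duplicate_images() {
  src="$1"; dst="$2"; copies="$3"
  mkdir -p "${dst}"
  for img in "${src}"/*.jpg; do
    [ -e "${img}" ] || continue          # skip the literal pattern when nothing matches
    name="$(basename "${img}" .jpg)"
    i=1
    while [ "${i}" -le "${copies}" ]; do
      cp "${img}" "${dst}/${name}_copy${i}.jpg"
      i=$((i + 1))
    done
  done
}

# Example usage (the destination directory and copy count are arbitrary choices):
#   set_cfg_key "$HOME/.rocprofsys.cfg" ROCPROFSYS_USE_AMD_SMI true
#   set_cfg_key "$HOME/.rocprofsys.cfg" ROCPROFSYS_AMD_SMI_METRICS busy,temp,power,mem_usage,vcn_activity,jpeg_activity
#   duplicate_images /opt/rocm/share/rocjpeg/image "$HOME/jpeg-load" 20
#   rocprof-sys-sample -v 2 -PTHD -- ./jpegdecodeperf -i "$HOME/jpeg-load"
```

Copying the images into a separate directory avoids writing under /opt/rocm, which is often not writable by regular users.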