Using rocsys

rocsys is a command-line utility used to control a profiling session (launch/start/stop/exit) on an application being traced or profiled. It is especially useful for selective profiling of applications with long-running workloads (such as DNN training), because it lets you turn profiling on and off while the application is running. You can also launch the session from one terminal and control it using rocsys from another terminal.

To see all the rocsys options, run:

rocsys -help

rocsys: launch must be preceded by --session <name>
e.g. rocsys --session <SESSION_NAME> launch <MPI_COMMAND> <MPI_ARGUMENTS> rocprofv2
<ROCPROFV2_OPTIONS> <APP_EXEC>

where all mpiexec options must come before rocsys
rocsys: start must be preceded by --session <name>
   rocsys --session <name> start
rocsys: stop must be preceded by --session <name>
   rocsys --session <name> stop
rocsys: exit must be preceded by --session <name>
   rocsys --session <name> exit

The following are the session management options used with rocsys in the given order to achieve selective profiling on the rocprofv2 run:

  1. Launch - Creates a session. After launch, the application halts until the session is started, as shown in step 2.

/opt/rocm/bin/rocsys --session session1 launch rocprofv2 -i ../samples/input.txt <long_running_app>
ROCSYS:: Session ID: 2109
ROCSYS Session Created!
ROCProfilerV2: Collecting the following counters:
- SQ_WAVES
- GRBM_COUNT
- GRBM_GUI_ACTIVE
- SQ_INSTS_VALU
- FETCH_SIZE
Enabling Counter Collection
  2. Start - Starts the session that was halted after launch, from the same or another terminal, and begins dumping kernel profiling information. The start command also triggers the halted application to run.

/opt/rocm/bin/rocsys --session session1 start
ROCSYS:: Starting Tools Session...
Dispatch_ID(1), GPU_ID(1), ... // All the metrics of a kernel
Dispatch_ID(2), GPU_ID(1), ... // All the metrics of a kernel
Dispatch_ID(3), GPU_ID(1), ... // All the metrics of a kernel
  3. Stop - Stops the session. The information displayed on the terminal is the kernel profiling output collected between the previous rocsys command and this one. Note that this command stops only the profiling session; the running application is not affected.

/opt/rocm/bin/rocsys --session session1 stop
ROCSYS:: Stopping Tools Session...
Dispatch_ID(22397), GPU_ID(1), ... // All the metrics of a kernel
Dispatch_ID(22398), GPU_ID(1), ... // All the metrics of a kernel
Dispatch_ID(22399), GPU_ID(1), ... // All the metrics of a kernel
  4. Start (to restart) - Once the session is created, rocsys lets you start and stop it any number of times. This helps in analyzing kernel profiling information in batches.

/opt/rocm/bin/rocsys --session session1 start
ROCSYS:: Starting Tools Session...
Dispatch_ID(22400), GPU_ID(1), ... // All the metrics of a kernel
Dispatch_ID(22401), GPU_ID(1), ... // All the metrics of a kernel
  5. Exit - Exits the profiling session. Once the session is exited, it cannot be restarted.

/opt/rocm/bin/rocsys --session session1 exit
Dispatch_ID(16828), GPU_ID(1), ... // All the metrics of a kernel
Dispatch_ID(16829), GPU_ID(1), ... // All the metrics of a kernel
ROCSYS:: Exiting Tools Session...Application might still be finishing up..

Note

Exiting the session only stops profiling. The application could continue running to completion in the background. If you don’t want to wait for the application to finish, use CTRL+C to stop the application after exit.
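
The start/stop/exit sequence described above can also be driven from a small script in the controlling terminal. The following is a minimal sketch, not part of rocsys itself: the helper names (`rocsys_ctl`, `pause`, `profile_windows`), the `ROCSYS` and `DRY_RUN` variables, and the window timings are all hypothetical. It assumes the session was already created with `launch` in another terminal, and it uses only the rocsys subcommands shown in this section. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: drive an already-launched rocsys session through
# several start/stop profiling windows, then exit the session.
# Assumption: "rocsys --session <name> launch ..." was run in another terminal.

ROCSYS="${ROCSYS:-/opt/rocm/bin/rocsys}"   # path as used in the examples above
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print commands (default)

# Run (or, in dry-run mode, just print) one rocsys session command.
rocsys_ctl() {
    session="$1"; action="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$ROCSYS --session $session $action"
    else
        "$ROCSYS" --session "$session" "$action"
    fi
}

# Wait the given number of seconds; skipped in dry-run mode.
pause() {
    [ "$DRY_RUN" = "1" ] || sleep "$1"
}

# profile_windows <session> <windows> <on_seconds> <off_seconds>
# Profiles for <on_seconds>, idles for <off_seconds>, <windows> times,
# then exits the session (which cannot be restarted afterwards).
profile_windows() {
    session="$1"; windows="$2"; on="$3"; off="$4"
    i=1
    while [ "$i" -le "$windows" ]; do
        rocsys_ctl "$session" start
        pause "$on"
        rocsys_ctl "$session" stop
        pause "$off"
        i=$((i + 1))
    done
    rocsys_ctl "$session" exit
}

# Example: two 30-second profiling windows, 5 minutes apart (illustrative values).
profile_windows session1 2 30 300
```

Running the script with `DRY_RUN=0` executes the commands for real. As noted above, the application may keep running after the final `exit`.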