ROCgdb commands for key operations#
This topic summarizes the ROCgdb commands for key operations.
Inspecting kernel state#
Here are the commands used to inspect the kernel state:
View kernel code#
(gdb) list
Sample output:
1 #include <hip/hip_runtime.h>
2 #include <algorithm>
3 #include <iostream>
4 #include <numeric>
5 #include <vector>
6 #include <cstddef>
8 __global__ void saxpy_kernel(const float a, const float* d_x, float* d_y, const unsigned int size)
9 {
10 // Compute the current thread's index in the grid.
11 const unsigned int global_idx = blockIdx.x * blockDim.x + threadIdx.x;
12 // The grid can be larger than the number of items in the vectors. Avoid out-of-bounds addressing.
13 if(global_idx < size)
14 {
15 d_y[global_idx] = a * d_x[global_idx] + d_y[global_idx];
16 }
17 }
18 int main()
19 {
....
99 }
View disassembly#
(gdb) disassemble
Sample output:
Dump of assembler code for function _ZL3bari:
0x00007ffff608e2b0 <+0>: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
0x00007ffff608e2b4 <+4>: s_mov_b32 s25, s33
0x00007ffff608e2b8 <+8>: s_mov_b32 s33, s32
0x00007ffff608e2bc <+12>: s_xor_saveexec_b64 s[16:17], -1
0x00007ffff608e2c0 <+16>: buffer_store_dword v36, off, s[0:3], s33 offset:52
.....
0x00007ffff608e92c <+1660>: s_mov_b64 exec, s[4:5]
0x00007ffff608e930 <+1664>: s_mov_b32 s33, s25
0x00007ffff608e934 <+1668>: s_waitcnt vmcnt(0)
0x00007ffff608e938 <+1672>: s_setpc_b64 s[30:31]
End of assembler dump.
View system information#
The following commands are related to heterogeneous debugging:
Agents:
The following command lists the information shown in the sample output for each heterogeneous agent:
(gdb) info agents
Sample output:
Id State Target Id Architecture Device Name Cores Threads Location * 1 A AMDGPU Agent (GPUID 35090) gfx90a AMD Instinct MI210 416 3328 0000:4a:00.0 2 A AMDGPU Agent (GPUID 34915) gfx90a AMD Instinct MI210 416 3328 0000:09:00.0 3 A AMDGPU Agent (GPUID 56224) gfx90a AMD Instinct MI210 416 3328 0000:0c:00.0 4 A AMDGPU Agent (GPUID 33385) gfx90a AMD Instinct MI210 416 3328 0000:11:00.0
For more information, see info agents command.
Queues:
The following command lists the information shown in the sample output for each heterogeneous queue:
(gdb) info queues
Sample output:
Id Target Id Type Read Write Size Address 1 AMDGPU Queue 1:1 (QID 0) HSA 2 2 4096 0x00007ffff626e000 * 2 AMDGPU Queue 1:2 (QID 1) HSA 0 2 1048576 0x00007fffe5800000
For more information, see info queues command.
Dispatches:
The following command lists the information shown in the sample output for each heterogeneous dispatch:
(gdb) info dispatches
Sample output:
Id Target Id Grid Workgroup Fence Kernel Function * 1 AMDGPU Dispatch 1:2:1 (PKID 0) [1,1,1] [1,1,1] B|Aa|Ra kern()
For more information, see info dispatches command.
Threads:
In some operating systems where a single program might have more than one thread of execution, the threads are akin to multiple processes with a shared address space but individual registers, execution stack, and perhaps private memory.
To facilitate debugging such multi-thread programs, the following command lists the threads created on all heterogeneous agents:
(gdb) info threads
Sample output:
Id Target Id Frame 1 Thread 0x7ffff6288180 (LWP 645917) "nosimple" 0x00007ffff207d586 in ?? () from /opt/rocm-7.1.0/lib/libhsa-runtime64.so.1 2 Thread 0x7fffe81ff6c0 (LWP 645924) "nosimple" __GI___ioctl (fd=3, request=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36 4 Thread 0x7fffe61ff6c0 (LWP 645926) "nosimple" __GI___ioctl (fd=3, request=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36 6 Thread 0x7ffff5fff6c0 (LWP 645930) "nosimple" __GI___ioctl (fd=3, request=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36 * 7 AMDGPU Wave 1:2:1:1 (0,0,0)/0 "saxpy" kern () at /home/user/saxpy.cpp:7
For more information, see Debugging programs with multiple threads.
Lanes:
On some heterogeneous systems there can be heterogeneous agents that support Single Instruction Multiple Data (SIMD) or Single Instruction Multiple Threads (STMT) machine instructions. On these target architectures, a single machine instruction can operate in parallel on multiple heterogeneous lanes.
To facilitate debugging heterogeneous programs, the following command displays information about individual source language threads of execution that are mapped to SIMD-like lanes of a thread.
(gdb) info lanes
Sample output:
Id State Target Id Frame * 0 A AMDGPU Lane 1:2:1:1/0 (0,0,0)[0,0,0] kern () at /home/user/saxpy.cpp:7
For more information, see Debugging heterogeneous programs.
View back trace#
(gdb) backtrace
Sample output:
#0 saxpy (tid=0) at /home/oogunbow/saxpy.cpp:33
#1 0x00007ffff608ee40 in kern () at /home/user/saxpy.cpp:4
View stack frames#
(gdb) info frame
Sample output:
Stack level 0, frame at private_wave#0x800:
pc = 0x7ffff608e3bc in bar (/home/user/saxpy.cpp:33); saved pc = 0x7ffff608ee40
called by frame at private_wave#0x0
source language c++.
Arglist at private_wave#0x800, args: tid=0
Locals at private_wave#0x800, Saved registers:
v36 at private_wave#0x1500, v37 at private_wave#0x1600
View frame arguments#
(gdb) info args
Sample output:
tid = 0
View frame local variables#
(gdb) info locals
Sample output:
No locals.
View GPU registers#
(gdb) info registers
This command dumps the content of the current wavefront’s registers.
Sample output:
v0 {0x30 <repeats 64 times>}
....
s41 0x0 0
m0 0x1008 4104
pc 0x7ffff608e3bc 0x7ffff608e3bc <saxpy(int)+268>
exec 0x5555555555555555 6148914691236517205
vcc 0xffffffffffffffff 18446744073709551615
This command dumps only the general-purpose registers, which provide all-inclusive data about the state of the current wavefront.
To get data for all registers, use:
(gdb) info all-registers
View GPU data @ address spaces#
(gdb) x/nfu global#0xdeadbeef
(gdb) x/nfu local#0xdeadbeef
(gdb) x/nfu generic#0xdeadbeef
(gdb) x/nfu private_wave#0xdeadbeef
(gdb) x/nfu private_lane#0xdeadbeef
For more information, see AMD GPU address spaces.
View CPU/GPU threads#
(gdb) info threads
Sample output:
Id Target Id Frame
1 Thread 0x7ffff648bf80 (LWP 1981864) "saxpy" 0x00007ffff5a6c9ef in ?? () from /opt/rocm-7.1.0/lib/libhsa-runtime64.so.1
2 Thread 0x7ffff55ff6c0 (LWP 1981871) "saxpy" __GI___ioctl (fd=3, request=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
4 Thread 0x7fffeffff6c0 (LWP 1981873) "saxpy" __GI___ioctl (fd=3, request=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
6 Thread 0x7ffff5dff6c0 (LWP 1981877) "saxpy" __GI___ioctl (fd=3, request=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
* 7 AMDGPU Wave 1:2:1:1 (0,0,0)/0 "saxpy" saxpy_kernel () at saxpy.cpp:8
Switch threads#
(gdb) thread <id>
Printing kernel data#
Commands to print the kernel data:
Print variable#
(gdb) print foo
Print array#
(gdb) print *foo[2]@8 -- i.e. (gdb) print *<address>@<count>
Print expressions#
(gdb) print foo[4] >> 1
Print formats#
// (gdb) p/<code> <value>
(gdb) p/x foo[1]
The values for <code> are:
x - hexadecimal
d - decimal
u - unsigned decimal
o - octal
t - binary (two)
a - address (hex + offset)
c - character
f - float
s - string
z - hexadecimal with leading zeros
r - raw (skips pretty printing)
Modifying kernel data#
The commands to modify the kernel data:
Using set command#
Use the set command to modify kernel data directly.
(gdb) set var foo[1]=45
Using print command#
The print command is an indirect way to modify the kernel data.
(gdb) print foo[3]=3
Changing kernel focus#
Commands to change the kernel thread, lane, or frame:
Change thread#
(gdb) thread 9
Change lane#
(gdb) lane 5
Change frame#
(gdb) frame <index>
(gdb) up <count>
(gdb) down <count>
Controlling kernel execution#
Commands to control kernel execution:
Set breakpoints#
(gdb) break saxpy.cpp:47
(gdb) break func_foo
(gdb) break *0x01234567
Set temporary breakpoints#
(gdb) tbreak saxpy.cpp:47
(gdb) tbreak func_foo
(gdb) tbreak +24
Set conditional breakpoints#
(gdb) break func_foo if idx == 9
(gdb) break func_foo if $_agent == 2
(gdb) break func_foo if $_queue == 1
(gdb) break func_foo if $_dispatch == 6
(gdb) break func_foo if $_thread == 7
(gdb) break func_foo if $_lane == 15
(gdb) break func_foo if $_thread_workgroup_pos == 3
(gdb) break func_foo if $_lane_workgroup_pos == "[0,0,0]"
Set watchpoints#
(gdb) watch foo[4]
Set catchpoints#
(gdb) catch load -- Catch loads of shared libraries (debug dynamic linking).
(gdb) catch unload -- Catch unloads of shared libraries (track cleanup/unloading).
(gdb) catch rethrow -- Catch an exception, when rethrown (trace exception propagation).
(gdb) catch signal SIGSEGV -- Catch signals by their names and/or numbers (debug crashes or signals).
(gdb) catch syscall open -- Catch system calls by names, groups, or numbers (trace system-level calls).
(gdb) catch throw -- Catch an exception, when thrown (trace exception origins).
(gdb) catch vfork -- Catch calls to vfork (monitor child process creation).
Set scheduler locking (waves)#
(gdb) set scheduler-locking on
For more information, see Scheduler locking mode.
Set scheduler non-stop (waves)#
set non-stop non
For more information, see Non-stop mode.
Set scheduler all-stop (waves)#
set non-stop off
For more information, see All-stop mode.
Disable breakpoint, watchpoint, catchpoint#
disable 4
Enable breakpoint, watchpoint, catchpoint#
enable 4
Delete breakpoint, watchpoint, catchpoint#
// delete <list>
delete 4
Step execution (source line)#
(gdb) step
(gdb) next
Step execution (multiple source lines)#
(gdb) step 3
(gdb) next 3
Step execution (stack frame)#
(gdb) until
(gdb) until 0x0000ffffdeadbeef
(gdb) finish
Step execution (machine instruction)#
(gdb) stepi
(gdb) nexti
Resume execution#
(gdb) continue
Command sequence:
(gdb) break saxpy.cpp:47
command BREAKPOINT_NUMBER
continue
end