Implementing process attachment tools

Overview

This topic provides the technical details needed to implement a process attachment tool similar to rocprofv3 --attach. Process attachment allows profiling tools to attach dynamically to running GPU applications without restarting them. Implementations can use either the Python function provided by the rocprof-attach script or the C functions exported by librocprofiler-sdk-rocattach.so.

Direct Python execution

The Python script rocprof-attach can be called directly to attach to a specific process ID (PID) and load custom tools into the target process.

$ rocprof-attach -p 12345 -t path/to/your-tool-library.so -d 5000

In the preceding example, rocprof-attach attaches to the process with PID 12345, and ROCprofiler-SDK loads the library path/to/your-tool-library.so from within that process. Detachment is triggered after 5000 milliseconds (-d 5000), and rocprof-attach exits when detachment is complete.

By default, rocprof-attach attaches to the target process and all of its descendant processes. To attach only to the specified PID, pass --attach-children=false:

$ rocprof-attach -p 12345 -t path/to/your-tool-library.so --attach-children=false

For more information, run rocprof-attach -h.
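A C++ tool that prefers to drive the rocprof-attach script rather than link against the C library could assemble the same arguments and launch the script as a child process. The following is a minimal sketch under stated assumptions: it uses only the flags shown above, it assumes rocprof-attach is on PATH, and the helper names build_rocprof_attach_args and run_rocprof_attach are hypothetical.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

#include <cstdint>
#include <optional>
#include <string>
#include <vector>

extern char** environ;

// Build the rocprof-attach command line from the flags documented above:
// -p <PID>, -t <tool libraries>, -d <milliseconds>, and --attach-children=false.
std::vector<std::string> build_rocprof_attach_args(pid_t pid,
                                                   const std::string& tool_libraries,
                                                   std::optional<uint32_t> duration_ms,
                                                   bool attach_children)
{
    std::vector<std::string> args = {"rocprof-attach", "-p", std::to_string(pid),
                                     "-t", tool_libraries};
    if (duration_ms) {
        args.push_back("-d");
        args.push_back(std::to_string(*duration_ms));
    }
    // Attaching to descendants is the default, so the flag is only needed to disable it.
    if (!attach_children) {
        args.push_back("--attach-children=false");
    }
    return args;
}

// Launch rocprof-attach (looked up through PATH) and wait for it to exit.
// Returns its exit code, or -1 if the launch failed or the child was killed by a signal.
int run_rocprof_attach(const std::vector<std::string>& args)
{
    std::vector<char*> argv;
    for (const auto& arg : args) {
        argv.push_back(const_cast<char*>(arg.c_str()));
    }
    argv.push_back(nullptr);

    pid_t child = 0;
    if (posix_spawnp(&child, argv[0], nullptr, nullptr, argv.data(), environ) != 0) {
        return -1;
    }
    int status = 0;
    if (waitpid(child, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_rocprof_attach(build_rocprof_attach_args(12345, "path/to/your-tool-library.so", 5000, true)) would reproduce the first command above. Blocking in waitpid keeps the sketch simple; a tool that needs to keep working while attached could poll the child instead.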

Python functions

The Python script rocprof-attach defines an attach function that can be called to perform attachment:

def attach(
  pid,
  attach_tool_library,
  attach_duration_msec,
  attach_library=ROCPROF_ATTACH_LIBRARY,
  attach_children=True,
):

Function details

The attach function performs the entire attachment workflow, from attachment through detachment, and loads custom tools through the attach_tool_library parameter. Its control flow is simple and intended for direct calls from Python. For finer-grained control, use the explicit attach and detach functions exported by the librocprofiler-sdk-rocattach.so library instead.

Parameters

  • pid: Required - PID of process to attach to.

  • attach_tool_library: Colon-delimited list of tool libraries to load into the target process.

  • attach_duration_msec: Optional - Profiling duration in milliseconds.
    • If unspecified, attachment runs until Enter is pressed or SIGINT (Ctrl+C) is received.

  • attach_library: Optional - Library that implements attachment and detachment.
    • If unspecified, defaults to the absolute path of librocprofiler-sdk-rocattach.so.

    • The default works for nearly all applications.

  • attach_children: Optional - Specifies whether to attach to the target process and all of its descendant processes.
    • Defaults to True; pass False to attach only to the specified PID.
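A C or C++ attach tool that wants the same stopping behavior as attach_duration_msec needs to wait either for a fixed duration or, when none is given, for Enter or SIGINT before detaching. The sketch below shows one way to do that with POSIX poll and a SIGINT flag; it is not taken from rocprof-attach, and wait_for_detach and DetachReason are hypothetical names.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cerrno>
#include <chrono>
#include <csignal>
#include <cstdint>
#include <optional>
#include <thread>

enum class DetachReason { Timeout, EnterPressed, Interrupted };

namespace {
// Set from the SIGINT handler; sig_atomic_t is safe to write from a signal handler.
volatile std::sig_atomic_t g_sigint_received = 0;

void on_sigint(int) { g_sigint_received = 1; }
}  // namespace

// Sleep for duration_ms if it is given; otherwise wait until Enter is pressed
// or SIGINT (Ctrl+C) is received.
DetachReason wait_for_detach(std::optional<uint32_t> duration_ms)
{
    if (duration_ms) {
        std::this_thread::sleep_for(std::chrono::milliseconds(*duration_ms));
        return DetachReason::Timeout;
    }

    g_sigint_received = 0;
    auto previous_handler = std::signal(SIGINT, on_sigint);

    bool watch_stdin = true;
    DetachReason reason = DetachReason::Interrupted;
    while (!g_sigint_received) {
        if (!watch_stdin) {
            // stdin is closed or at EOF (for example, /dev/null): wait for SIGINT only.
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            continue;
        }
        pollfd pfd{STDIN_FILENO, POLLIN, 0};
        // A 100 ms timeout means the SIGINT flag is rechecked regularly.
        if (poll(&pfd, 1, 100) > 0) {
            char c = 0;
            ssize_t n = read(STDIN_FILENO, &c, 1);
            if (n == 0 || (n < 0 && errno != EINTR)) {
                watch_stdin = false;  // EOF or read error: stop watching stdin
            } else if (n > 0 && c == '\n') {
                reason = DetachReason::EnterPressed;
                break;
            }
        }
    }

    // Restore whatever SIGINT handler the caller had installed.
    if (previous_handler != SIG_ERR) {
        std::signal(SIGINT, previous_handler);
    }
    return reason;
}
```

Polling with a short timeout keeps the signal handler trivial (it only sets a flag) while still reacting promptly. After wait_for_detach returns, the tool would call its detach function.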

C functions

The C library librocprofiler-sdk-rocattach.so exports the following attach and detach functions:

extern "C" {
    // Attach to a process and all of its descendant processes
    rocattach_status_t rocattach_attach_tree(int pid) ROCATTACH_API;

    // Attach to a single process only
    rocattach_status_t rocattach_attach(int pid) ROCATTACH_API;

    // Detach from a process and all of its descendant processes
    rocattach_status_t rocattach_detach_tree(int pid) ROCATTACH_API;

    // Detach from a single process (or all sessions when pid=0)
    rocattach_status_t rocattach_detach(int pid) ROCATTACH_API;
}

Function details

  • rocattach_attach_tree(int pid): Attaches to a process and all of its descendants.
    • Enumerates the full process tree rooted at pid via /proc before attaching.

    • Attachment proceeds in breadth-first order from the root.

    • If attachment to an individual child process fails, the error is logged and attachment continues with the remaining processes; the return status reflects the last error seen.

    • The process tree is snapshotted at the time of the call; processes spawned after this point are not included.

  • rocattach_attach(int pid): Attaches to a single process only.
    • Takes the target process ID as a parameter.

    • Doesn’t attach to child processes.

    • When profiling applications that spawn child processes, use rocattach_attach_tree instead.

  • rocattach_detach_tree(int pid): Detaches from a process and all of its descendants.
    • Enumerates the process tree rooted at pid via /proc at the time of the call.

    • Only processes with an active attachment session are detached; others are silently skipped.

    • Symmetric counterpart to rocattach_attach_tree; use these two together.

    • Reentrant: the sessions lock is acquired and released per-process and isn’t held across the /proc traversal, so concurrent calls from multiple threads are safe.

  • rocattach_detach(int pid): Detaches from a single process.
    • Takes the target process ID as a parameter.

    • Cleans up attachment resources and terminates profiling.

    • Passing a PID of 0 detaches from all current sessions.
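The tree semantics described above (a /proc snapshot, breadth-first order, continuing past per-process failures, and reporting the last error) can be illustrated with the following sketch. It is not the rocattach source; snapshot_process_tree, read_parent_pid, attach_each, and the convention that a status of 0 means success are assumptions made for illustration. It builds a parent-to-children map from the ppid field of /proc/<pid>/stat and walks that map breadth-first from the root.

```cpp
#include <dirent.h>
#include <sys/types.h>
#include <unistd.h>

#include <algorithm>
#include <fstream>
#include <functional>
#include <iostream>
#include <map>
#include <queue>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Read the parent PID from /proc/<pid>/stat ("<pid> (<comm>) <state> <ppid> ...").
// Returns -1 if the process has exited or the file can't be parsed.
pid_t read_parent_pid(pid_t pid)
{
    std::ifstream stat_file("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    if (!std::getline(stat_file, line)) return -1;
    // comm may contain spaces or ')', so parse from the last ')'.
    auto close_paren = line.rfind(')');
    if (close_paren == std::string::npos) return -1;
    std::istringstream rest(line.substr(close_paren + 1));
    char state = 0;
    long ppid = -1;
    if (!(rest >> state >> ppid)) return -1;
    return static_cast<pid_t>(ppid);
}

// Snapshot the tree rooted at `root` and return its PIDs in breadth-first order.
// Processes created after the /proc scan are not included.
std::vector<pid_t> snapshot_process_tree(pid_t root)
{
    std::map<pid_t, std::vector<pid_t>> children;
    if (DIR* proc = opendir("/proc")) {
        while (dirent* entry = readdir(proc)) {
            std::string name = entry->d_name;
            bool numeric = !name.empty() &&
                std::all_of(name.begin(), name.end(), [](unsigned char ch) { return ch >= '0' && ch <= '9'; });
            if (!numeric) continue;  // skip entries such as "self" or "cpuinfo"
            pid_t pid = static_cast<pid_t>(std::stol(name));
            pid_t ppid = read_parent_pid(pid);
            if (ppid > 0) children[ppid].push_back(pid);
        }
        closedir(proc);
    }

    std::vector<pid_t> order;
    std::set<pid_t> visited;
    std::queue<pid_t> pending;
    pending.push(root);
    while (!pending.empty()) {
        pid_t current = pending.front();
        pending.pop();
        if (!visited.insert(current).second) continue;  // guard against inconsistent snapshots
        order.push_back(current);
        for (pid_t child : children[current]) pending.push(child);
    }
    return order;
}

// Call `attach_one` for every PID, logging failures and continuing; return the last error.
int attach_each(const std::vector<pid_t>& pids, const std::function<int(pid_t)>& attach_one)
{
    int last_error = 0;  // assumption: 0 means success
    for (pid_t pid : pids) {
        int status = attach_one(pid);
        if (status != 0) {
            std::cerr << "attach to PID " << pid << " failed with status " << status << '\n';
            last_error = status;
        }
    }
    return last_error;
}
```

Combining the two, attach_each(snapshot_process_tree(pid), attach_one) approximates the documented behavior of rocattach_attach_tree, where attach_one is a hypothetical single-process attach such as a wrapper around rocattach_attach.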

Function call sequence

Initial attachment sequence

The initial attachment process roughly follows this sequence:

  1. rocattach_attach(pid) ← Your tool calls this

  2. ptrace calls rocprofiler_register_attach(env_buffer)

  3. tool_library::rocprofiler_configure(…)

  4. tool_library::rocprofiler_configure_attach(…)

  5. tool_library::tool_init(…)

  6. tool_library::tool_attach(…)

  7. [Profiling and data collection…]

  8. rocattach_detach(pid) ← Your tool calls this

  9. ptrace calls rocprofiler_register_detach()

  10. tool_library::tool_detach(…)

  11. [Program ends]

  12. tool_library::tool_fini(…)

Reattachment sequence

When reattaching to a previously attached process, the configuration and tool_init steps are skipped:

  1. rocattach_attach(pid) ← Your tool calls this again

  2. ptrace calls rocprofiler_register_attach(env_buffer)

  3. tool_library::tool_attach(…)

  4. [Continued profiling and data collection…]

  5. rocattach_detach(pid) ← Your tool calls this

  6. ptrace calls rocprofiler_register_detach()

  7. tool_library::tool_detach(…)
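Comparing the two sequences, rocprofiler_configure, rocprofiler_configure_attach, and tool_init run only on the first attachment, while tool_attach and tool_detach run on every attach and detach. The toy model below records that ordering so it can be checked; the class and its methods are hypothetical and do not correspond to any rocprofiler-sdk API.

```cpp
#include <string>
#include <vector>

// Toy model of the tool-library callback order in the sequences above.
// It only records which callbacks would fire; it is NOT rocprofiler-sdk code.
class ToolLifecycleModel {
public:
    // Models rocattach_attach(pid) -> rocprofiler_register_attach(env_buffer) in the target.
    void on_register_attach() {
        if (!initialized_) {
            // First attachment only: configure and initialize the tool library.
            log_.push_back("rocprofiler_configure");
            log_.push_back("rocprofiler_configure_attach");
            log_.push_back("tool_init");
            initialized_ = true;
        }
        log_.push_back("tool_attach");  // every attachment, including reattachment
    }

    // Models rocattach_detach(pid) -> rocprofiler_register_detach() in the target.
    void on_register_detach() { log_.push_back("tool_detach"); }

    // Models the target program ending after it was attached at least once.
    void on_program_exit() {
        if (initialized_) log_.push_back("tool_fini");
    }

    const std::vector<std::string>& log() const { return log_; }

private:
    bool initialized_ = false;
    std::vector<std::string> log_;
};
```

This ordering suggests that one-time setup belongs in tool_init, while work that should be repeated per session, such as enabling and disabling data collection, belongs in tool_attach and tool_detach.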

Environment variable configuration

This section describes the environment variables and build options that control process attachment.

Required variables

The target process must have ROCP_TOOL_ATTACH=1 set, or be using a version of rocprofiler-register configured with the CMake flag ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT=ON.

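# Option 1: set this in the target process's environment before launching it
export ROCP_TOOL_ATTACH=1

# Option 2: build rocprofiler-register with attachment enabled by default
cmake /path/to/rocprofiler-register -DROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT=ON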
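Before attaching, a tool can check whether the target was started with ROCP_TOOL_ATTACH=1 by reading /proc/<pid>/environ, which holds the process's initial environment as NUL-separated KEY=VALUE entries. The sketch below assumes Linux procfs, and its function names are hypothetical. A false result is not conclusive, because a target that uses rocprofiler-register built with ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT=ON doesn't need the variable.

```cpp
#include <sys/types.h>

#include <fstream>
#include <iterator>
#include <string>

// Return true if a NUL-separated environment block contains exactly ROCP_TOOL_ATTACH=1.
bool environ_has_attach_flag(const std::string& environ_block)
{
    const std::string wanted = "ROCP_TOOL_ATTACH=1";
    std::size_t start = 0;
    while (start < environ_block.size()) {
        std::size_t end = environ_block.find('\0', start);
        if (end == std::string::npos) end = environ_block.size();
        if (environ_block.compare(start, end - start, wanted) == 0) return true;
        start = end + 1;
    }
    return false;
}

// Read /proc/<pid>/environ for the target. Later setenv calls inside the target are
// generally not reflected here, and reading another process's environ file is
// subject to the kernel's ptrace access checks.
bool target_started_with_attach_flag(pid_t pid)
{
    std::ifstream file("/proc/" + std::to_string(pid) + "/environ", std::ios::binary);
    std::string block((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
    return environ_has_attach_flag(block);
}
```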

Tool library configuration

The attachment system can load any tool library. Set ROCPROF_ATTACH_TOOL_LIBRARY to a colon-delimited list of tool libraries; if the variable isn't set, librocprofiler-sdk-tool.so is used.

// Attachment libraries to be used
setenv("ROCPROF_ATTACH_TOOL_LIBRARY", "example-tool-1.so:example-tool-2.so", 1);
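Because the value is a colon-delimited list, a tool that collects library paths programmatically can join them before calling setenv. A minimal sketch follows; join_tool_libraries and set_attach_tool_libraries are hypothetical helpers, and the sketch assumes, as in the snippet above, that the variable is set in the attaching process before the attach function is called. Paths that themselves contain ':' can't be represented in this format.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Join tool library paths into the colon-delimited form used by
// ROCPROF_ATTACH_TOOL_LIBRARY (and by the attach_tool_library parameter).
std::string join_tool_libraries(const std::vector<std::string>& libraries)
{
    std::string joined;
    for (const auto& library : libraries) {
        if (library.empty()) continue;  // skip empty entries so no stray ':' appears
        if (!joined.empty()) joined += ':';
        joined += library;
    }
    return joined;
}

// Set ROCPROF_ATTACH_TOOL_LIBRARY in the attaching process; returns false if setenv fails.
// An empty list leaves the variable untouched, so the default tool library applies
// unless the variable was already set.
bool set_attach_tool_libraries(const std::vector<std::string>& libraries)
{
    std::string value = join_tool_libraries(libraries);
    if (value.empty()) return true;
    return setenv("ROCPROF_ATTACH_TOOL_LIBRARY", value.c_str(), 1) == 0;
}
```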

Using the attachment functions

The following simplified example shows how to use these functions in your own attachment tool.

Basic attachment implementation

#include <rocattach.h>  // declares rocattach_status_t

#include <dlfcn.h>
#include <signal.h>     // kill
#include <sys/types.h>  // pid_t

#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

class ROCprofilerAttachmentTool {
private:
    using status_fn_t = rocattach_status_t (*)(int);

    void* attach_lib_handle = nullptr;
    status_fn_t attach_func = nullptr;
    status_fn_t detach_func = nullptr;

public:
    ROCprofilerAttachmentTool() = default;
    // The class owns a dlopen handle, so copying it would cause a double dlclose.
    ROCprofilerAttachmentTool(const ROCprofilerAttachmentTool&) = delete;
    ROCprofilerAttachmentTool& operator=(const ROCprofilerAttachmentTool&) = delete;

    bool initialize() {
        // Load the rocprofiler attach library
        attach_lib_handle = dlopen("librocprofiler-sdk-rocattach.so", RTLD_NOW);
        if (!attach_lib_handle) {
            std::cerr << "Failed to load librocprofiler-sdk-rocattach.so: " << dlerror() << std::endl;
            return false;
        }

        // Get the attachment function pointers.
        // Use rocattach_attach_tree/rocattach_detach_tree to attach to the process and all
        // its descendants, or rocattach_attach/rocattach_detach for a single process only.
        attach_func = reinterpret_cast<status_fn_t>(dlsym(attach_lib_handle, "rocattach_attach_tree"));
        detach_func = reinterpret_cast<status_fn_t>(dlsym(attach_lib_handle, "rocattach_detach_tree"));

        if (!attach_func || !detach_func) {
            std::cerr << "Failed to find attachment functions" << std::endl;
            return false;
        }

        return true;
    }

    bool attach_to_process(pid_t pid, uint32_t duration_ms) {
        // Check that the target process exists and that this process may signal it
        if (kill(pid, 0) != 0) {
            std::cerr << "Target process " << pid << " is not accessible" << std::endl;
            return false;
        }

        std::cout << "Attaching to process " << pid << std::endl;

        // Start attachment; the library handles all ptrace operations.
        // Assumption: as with other ROCm status enums, 0 means success. Prefer the
        // named success enumerator from rocattach.h in real code.
        rocattach_status_t status = attach_func(pid);
        if (static_cast<int>(status) != 0) {
            std::cerr << "Attachment failed with status " << static_cast<int>(status) << std::endl;
            return false;
        }

        // Profile for the specified duration
        std::cout << "Profiling for " << duration_ms << " milliseconds..." << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(duration_ms));

        // Stop profiling
        status = detach_func(pid);
        if (static_cast<int>(status) != 0) {
            std::cerr << "Detachment failed with status " << static_cast<int>(status) << std::endl;
            return false;
        }

        std::cout << "Profiling completed" << std::endl;
        return true;
    }

    ~ROCprofilerAttachmentTool() {
        if (attach_lib_handle) {
            dlclose(attach_lib_handle);
        }
    }
};
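If a tool does more work between attach and detach than attach_to_process does, an early return or exception could skip the detach call and leave the target attached. One way to guard against that is a small RAII wrapper that detaches in its destructor. The sketch below is hypothetical (ScopedAttachment is not part of rocprofiler-sdk); it takes plain callables so it can wrap the dlsym'd function pointers, and it assumes that a status of 0 means success and that rocattach_status_t converts implicitly to int.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII helper: attaches on construction and detaches on destruction,
// so the target is released even if the profiling window ends early.
class ScopedAttachment {
public:
    using StatusFn = std::function<int(int)>;

    ScopedAttachment(int pid, const StatusFn& attach, StatusFn detach)
        : pid_(pid), detach_(std::move(detach)) {
        attached_ = (attach(pid_) == 0);  // assumption: 0 means success
    }

    ~ScopedAttachment() {
        if (attached_) detach_(pid_);  // only detach what was actually attached
    }

    // Copying would detach the same PID twice.
    ScopedAttachment(const ScopedAttachment&) = delete;
    ScopedAttachment& operator=(const ScopedAttachment&) = delete;

    bool attached() const { return attached_; }

private:
    int pid_;
    StatusFn detach_;
    bool attached_ = false;
};
```

Inside attach_to_process, for example, the explicit attach and detach calls could be replaced by a ScopedAttachment constructed from attach_func and detach_func, followed by a check of attached() and the sleep.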

Main implementation

#include <sys/types.h>

#include <cstdint>
#include <cstdlib>
#include <exception>
#include <iostream>
#include <string>

// This listing assumes ROCprofilerAttachmentTool from the previous listing
// is defined in the same source file.

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <PID> [duration_ms]" << std::endl;
        std::cerr << "  PID: Process ID to attach to" << std::endl;
        std::cerr << "  duration_ms: Optional profiling duration in milliseconds (default: 1000)" << std::endl;
        return 1;
    }

    pid_t target_pid = 0;
    uint32_t duration = 1000;
    try {
        target_pid = static_cast<pid_t>(std::stoi(argv[1]));
        if (argc > 2) {
            duration = static_cast<uint32_t>(std::stoul(argv[2]));
        }
    } catch (const std::exception& e) {
        std::cerr << "Invalid argument: " << e.what() << std::endl;
        return 1;
    }

    // 0 and negative values refer to process groups for kill(), and a PID of 0 means
    // "all sessions" for rocattach_detach, so accept only positive PIDs.
    if (target_pid <= 0) {
        std::cerr << "PID must be a positive integer" << std::endl;
        return 1;
    }

    // For this example, the tool library "librocprofiler-sdk-tool.so" is used by
    // default because ROCPROF_ATTACH_TOOL_LIBRARY is not set. These environment
    // variables are used to communicate profiling options to rocprofiler-sdk-tool.
    setenv("ROCPROF_HIP_RUNTIME_API_TRACE", "1", 1);
    setenv("ROCPROF_KERNEL_TRACE", "1", 1);
    setenv("ROCPROF_MEMORY_COPY_TRACE", "1", 1);
    setenv("ROCPROF_OUTPUT_PATH", "./attachment-output", 1);
    setenv("ROCPROF_OUTPUT_FILE_NAME", "attached_profile", 1);

    // Initialize and run the attachment tool
    ROCprofilerAttachmentTool tool;
    if (!tool.initialize()) {
        std::cerr << "Failed to initialize attachment tool" << std::endl;
        return 1;
    }

    if (!tool.attach_to_process(target_pid, duration)) {
        std::cerr << "Attachment failed" << std::endl;
        return 1;
    }

    std::cout << "Attachment completed successfully" << std::endl;
    return 0;
}