This example demonstrates how to use the rocProfiler SDK's PC (Program Counter) sampling service. PC sampling periodically interrupts executing GPU wavefronts to record their program counter and other hardware state. This provides a statistical profile of where time is being spent within a kernel, which is invaluable for identifying performance bottlenecks. The tool queries for agents that support PC sampling, configures the service, and processes the resulting sample records.
- Tool Loading and Initialization:
  - The rocProfiler runtime loads the tool's shared library and calls the `rocprofiler_configure` entry point.
  - The tool registers its `tool_init` and `tool_fini` functions.
  - The `tool_init` function discovers all GPU agents that support PC sampling and queries their available configurations.
- Context, Buffer, and Service Configuration:
  - A single rocProfiler context is created for the tool.
  - A separate buffer is created for each supported GPU agent.
  - For each agent, the PC sampling service is configured, preferably with the stochastic method.
- Context Activation:
  - The rocProfiler context is started, which activates the PC sampling service on all configured agents.
- Workload Execution:
  - The `main` function in `main.cpp` launches a multi-threaded workload with various kernels to generate GPU activity.
- Data Collection and Processing:
  - As the PC sampling hardware generates data, it is collected into the agent-specific buffers.
  - When a buffer's watermark is reached, the `rocprofiler_pc_sampling_callback` is invoked.
  - The callback processes the records and prints detailed information, including the program counter, workgroup ID, and hardware-specific stall information.
- Finalization:
  - After the workload is complete, the `tool_fini` function is called, which stops the context, flushes and destroys the buffers, and cleans up resources.
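The configuration step in the flow above (prefer the stochastic method for each agent) can be illustrated with a small, SDK-free sketch. Everything here is a hypothetical stand-in: `SamplingMethod`, `SamplingConfig`, `pick_config`, and `clamp_interval` are names invented for illustration and are not part of the rocProfiler API. The fallback to host trap when stochastic is unavailable is also an assumption about how such a tool might behave, not a statement of what this example's source does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the SDK's sampling-method enum and a per-agent
// configuration entry; the real types carry more fields.
enum class SamplingMethod
{
    Stochastic,
    HostTrap
};

struct SamplingConfig
{
    SamplingMethod method;
    uint64_t       min_interval;
    uint64_t       max_interval;
};

// Return the stochastic entry if the agent offers one, otherwise the first
// host-trap entry, otherwise nullptr. The returned pointer refers into
// `available` and is only valid while that vector is alive and unmodified.
inline const SamplingConfig*
pick_config(const std::vector<SamplingConfig>& available)
{
    const SamplingConfig* fallback = nullptr;
    for(const auto& cfg : available)
    {
        if(cfg.method == SamplingMethod::Stochastic) return &cfg;
        if(cfg.method == SamplingMethod::HostTrap && fallback == nullptr) fallback = &cfg;
    }
    return fallback;
}

// Clamp a requested sampling interval into the range the chosen entry allows.
inline uint64_t
clamp_interval(const SamplingConfig& cfg, uint64_t requested)
{
    assert(cfg.min_interval <= cfg.max_interval);
    return std::min(std::max(requested, cfg.min_interval), cfg.max_interval);
}
```

In a real tool, the list passed to a function like `pick_config` would come from the configuration query for one agent, and the chosen method and interval would then be handed to the PC sampling service configuration call.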
- PC Sampling Service:
  - `rocprofiler_configure_pc_sampling_service()`: The core of this example. This service configures the hardware to perform PC sampling on a given agent.
- Sampling Methods:
  - The tool demonstrates how to query for and select between different sampling methods:
    - `ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`: A low-overhead method where the hardware randomly samples wavefronts.
    - `ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP`: A method that involves the host, which can have higher overhead.
- Configuration Query:
  - `rocprofiler_query_pc_sampling_agent_configurations()`: Used to discover the supported sampling methods and their valid interval ranges for a given agent.
- PC Sample Records:
  - The callback processes different types of PC sampling records, such as `rocprofiler_pc_sampling_record_stochastic_v0_t`, which contains detailed information about the wavefront state, including whether an instruction was issued and, if not, the reason for the stall.
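To make the idea of a statistical profile concrete, the sketch below aggregates a batch of samples by program counter, separating samples where an instruction was issued from samples where the wavefront was stalled. It does not use the SDK: `Sample`, `StallReason`, `PcStats`, and `summarize` are hypothetical names, and the stall reasons listed are illustrative rather than the enumerators the real stochastic record defines.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for a stochastic PC sample; the stall
// reasons here are illustrative, not the SDK's actual enumerators.
enum class StallReason
{
    None,
    WaitingOnMemory,
    WaitingOnDependency
};

struct Sample
{
    uint64_t    pc;
    bool        issued;  // true if the wavefront issued an instruction
    StallReason stall;   // meaningful only when issued is false
};

struct PcStats
{
    uint64_t                        samples = 0;
    uint64_t                        issued  = 0;
    std::map<StallReason, uint64_t> stalls;
};

// Aggregate a batch of samples into per-PC counts: total samples, samples
// that issued an instruction, and a histogram of stall reasons otherwise.
inline std::map<uint64_t, PcStats>
summarize(const std::vector<Sample>& batch)
{
    std::map<uint64_t, PcStats> by_pc;
    for(const auto& s : batch)
    {
        PcStats& stats = by_pc[s.pc];
        ++stats.samples;
        if(s.issued)
            ++stats.issued;
        else
            ++stats.stalls[s.stall];
    }
    return by_pc;
}
```

A program counter that collects many samples, most of them stalled for the same reason, is a natural first place to look for a bottleneck.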
- rocProfiler SDK functions used:
  - `rocprofiler_assign_callback_thread`
  - `rocprofiler_configure_pc_sampling_service`
  - `rocprofiler_create_buffer`
  - `rocprofiler_create_callback_thread`
  - `rocprofiler_create_context`
  - `rocprofiler_destroy_buffer`
  - `rocprofiler_flush_buffer`
  - `rocprofiler_query_available_agents`
  - `rocprofiler_query_pc_sampling_agent_configurations`
  - `rocprofiler_start_context`
- HIP functions used:
  - `hipDeviceReset`
  - `hipDeviceSynchronize`
  - `hipFree`
  - `hipGetDeviceCount`
  - `hipHostRegister`
  - `hipHostUnregister`
  - `hipLaunchKernelGGL`
  - `hipMalloc`
  - `hipMemcpyAsync`
  - `hipMemsetAsync`
  - `hipSetDevice`
  - `hipStreamCreate`
  - `hipStreamDestroy`
  - `hipStreamSynchronize`
- rocProfiler SDK types used:
  - `rocprofiler_buffer_id_t`
  - `rocprofiler_callback_thread_t`
  - `rocprofiler_context_id_t`
  - `rocprofiler_pc_sampling_record_header_t`