This example demonstrates how to use the rocProfiler SDK for asynchronous, device-level hardware counter collection. Unlike dispatch-specific profiling, this method samples counters periodically on a separate thread. The application configures a device counting service, which runs in the background and collects data from a specified GPU agent.
- Tool Loading and Initialization:
  - The rocProfiler runtime loads the tool's shared library and calls the `rocprofiler_configure` entry point.
  - The tool registers its `tool_init` and `tool_fini` functions.
  - The `tool_init` function creates a rocProfiler context and a buffer for the counter data, and discovers the first available GPU agent.
- Profile and Service Configuration:
  - A counter profile for `SQ_WAVES` is built and cached for the selected agent.
  - The device counting service is configured with `rocprofiler_configure_device_counting_service`. It uses a `set_profile` callback to provide the counter configuration to the service.
- Asynchronous Sampling:
  - A detached thread is launched, which starts the rocProfiler context.
  - Inside this thread, `rocprofiler_sample_device_counting_service` is called in a loop to trigger counter collection periodically (every 50 ms).
- Workload Execution:
  - Concurrently, the `main` function in `main.cpp` launches a series of HIP kernels (`kernel_a`, `kernel_b`, `kernel_c`) in a loop to generate GPU activity.
- Data Collection and Processing:
  - The sampling thread collects the counter data and stores it in the buffer.
  - When the buffer's watermark is reached, the `buffered_callback` is invoked to process and print the counter values.
- Finalization:
  - After the workload is complete, the `tool_fini` function signals the sampling thread to stop, stops the context, and flushes the buffer to ensure all data is processed.
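The sampling loop and its shutdown signal described in the steps above can be sketched with the standard library alone. This is a minimal sketch, not the example's actual code: `PeriodicSampler` is a hypothetical helper, and `sample_fn` merely stands in for the call to `rocprofiler_sample_device_counting_service`. It also assumes a joined thread rather than the detached one the example uses, so that `stop()` returns only after the last sample has finished.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper (not part of the rocProfiler SDK): calls sample_fn
// repeatedly on a background thread until stop() is requested.
class PeriodicSampler {
public:
    PeriodicSampler(std::function<void()> sample_fn, std::chrono::milliseconds period)
        : sample_fn_(std::move(sample_fn)), period_(period) {}

    ~PeriodicSampler() { stop(); }

    // Launch the sampling thread; a second call while running is a no-op.
    void start() {
        if (running_.exchange(true)) return;
        worker_ = std::thread([this] {
            while (running_.load()) {
                sample_fn_();                          // stand-in for the SDK sample call
                std::this_thread::sleep_for(period_);  // e.g. 50 ms in the example
            }
        });
    }

    // Signal the loop to exit and wait for the thread to finish.
    void stop() {
        running_.store(false);
        if (worker_.joinable()) worker_.join();
    }

private:
    std::function<void()>     sample_fn_;
    std::chrono::milliseconds period_;
    std::atomic<bool>         running_{false};
    std::thread               worker_;
};
```

A tool would typically call `start()` from its initialization path and `stop()` from its finalization path, before stopping the context and flushing the buffer, so that no sample is issued against a stopped context.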
- Device Counting Service:
  - `rocprofiler_configure_device_counting_service()`: Configures a service for profiling an entire GPU device asynchronously.
  - The service is configured for a single `rocprofiler_agent_id_t`, allowing for targeted profiling of one GPU.
- Asynchronous Sampling:
  - `rocprofiler_sample_device_counting_service()`: Explicitly triggers a counter collection sample. In this example, it is called from a dedicated thread to perform periodic, time-based monitoring, independent of kernel dispatches.
- Profile Callback:
  - A `set_profile` callback is used by the service to request the `rocprofiler_counter_config_id_t` for the targeted agent.
- Buffer and Callback:
  - A `rocprofiler_buffer_id_t` is used to store the sampled counter data, which is then processed by a `buffered_callback`.
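The buffer-and-callback pattern above can be illustrated without the SDK. The following is a simplified sketch under stated assumptions: `CounterRecord` and `WatermarkBuffer` are hypothetical types, the watermark is expressed as a record count, and the callback runs synchronously on the pushing thread. The real rocProfiler buffer is created with `rocprofiler_create_buffer` and may deliver records on a separate callback thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical record type standing in for a sampled counter value.
struct CounterRecord {
    std::uint64_t counter_id;
    double        value;
};

// Hypothetical buffer: accumulates records and hands them to a callback
// once the watermark (a record count in this sketch) is reached.
class WatermarkBuffer {
public:
    using Callback = std::function<void(const std::vector<CounterRecord>&)>;

    WatermarkBuffer(std::size_t watermark, Callback callback)
        : watermark_(watermark), callback_(std::move(callback)) {}

    // Store one record; deliver the batch when the watermark is reached.
    void push(CounterRecord record) {
        records_.push_back(record);
        if (records_.size() >= watermark_) flush();
    }

    // Deliver any pending records, as done at finalization.
    void flush() {
        if (records_.empty()) return;
        std::vector<CounterRecord> batch;
        batch.swap(records_);  // leave the buffer empty before invoking the callback
        callback_(batch);
    }

private:
    std::size_t                watermark_;
    Callback                   callback_;
    std::vector<CounterRecord> records_;
};
```

The explicit `flush()` mirrors the finalization step: records that never reached the watermark would otherwise stay in the buffer and never be seen by the callback.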
rocProfiler SDK functions:
- `rocprofiler_assign_callback_thread`
- `rocprofiler_configure_device_counting_service`
- `rocprofiler_create_buffer`
- `rocprofiler_create_callback_thread`
- `rocprofiler_create_context`
- `rocprofiler_create_counter_config`
- `rocprofiler_flush_buffer`
- `rocprofiler_iterate_agent_supported_counters`
- `rocprofiler_query_available_agents`
- `rocprofiler_query_counter_info`
- `rocprofiler_sample_device_counting_service`
- `rocprofiler_start_context`
- `rocprofiler_stop_context`

HIP runtime functions:
- `hipDeviceReset`
- `hipDeviceSynchronize`
- `hipFree`
- `hipGetDeviceCount`
- `hipGetDeviceProperties`
- `hipLaunchKernelGGL`
- `hipMalloc`
- `hipMemcpy`
- `hipSetDevice`

Data types and enums:
- `rocprofiler_buffer_id_t`
- `rocprofiler_callback_thread_t`
- `rocprofiler_context_id_t`
- `rocprofiler_counter_config_id_t`
- `ROCPROFILER_BUFFER_CATEGORY_COUNTERS`
- `ROCPROFILER_BUFFER_POLICY_LOSSLESS`
- `ROCPROFILER_COUNTER_RECORD_VALUE`