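This example demonstrates how to launch global device functions (kernels) with launch bounds. A launch bound declares the maximum number of threads per block that a kernel will be launched with, which lets the compiler tune register usage for that limit. For more information on this topic, refer to the HIP documentation.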
- A device buffer is allocated.
- A kernel is launched which consumes the device buffer.
- The host and the device are synchronized.
- The device memory is freed.
- Use `hipMalloc` to allocate memory in the global memory of the device (GPU). This is typically necessary because kernels running on the device cannot access host (CPU) memory, except for device-accessible pinned host memory (see `hipHostMalloc`). Note that the memory returned by `hipMalloc` is uninitialized.
- Use `hipFree` to deallocate device memory previously allocated with `hipMalloc`. It is important to free memory that is no longer in use to prevent resource leakage.
- Use the triple chevron syntax `kernel_name<<<grid_dim, block_dim, shared_mem_bytes, stream>>>(parameters)` to launch a kernel on the device.
- Use `hipDeviceSynchronize` to synchronize the host and the device. This is a blocking call which only returns once all outstanding device operations have finished.
- Use `hipGetErrorString` to convert a HIP error code into a human-readable string.
- Device symbols: `blockIdx`, `blockDim`, `threadIdx`
- Host symbols: `hipDeviceSynchronize`, `hipFree`, `hipGetErrorString`, `hipMalloc`