This simple program showcases the usage of the device function rocprim::reduce.
- Input data is instantiated in a `std::vector<int>` and the values are printed to the standard output.
- Device storage for input and output data is allocated using `hipMalloc`.
- Input data is copied from the host to the device using `hipMemcpy`.
- The binary operator used in the reduction is instantiated. This example calculates the sum of the elements of the input vector, hence `rocprim::plus<int>` is the appropriate choice.
- The amount of working memory needed by the reduction algorithm is calculated by a first call to `rocprim::reduce`. A `nullptr` is passed as the first argument, so the function calculates the value of `temp_storage_bytes` and returns without launching the GPU kernel.
- `temp_storage_bytes` bytes of memory are allocated on the device.
- A subsequent call to `rocprim::reduce` is made, this time passing the pointer to the working memory. This launches the GPU kernel that performs the calculation.
- The result of the summation is copied back to the host and is printed to the standard output.
- All device memory is freed using `hipFree`.
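The two-call convention in the steps above (query the size with a `nullptr`, allocate, then call again) can be illustrated with a minimal host-only sketch. Everything in it is an assumption made for illustration: `mock_reduce`, `mock_chunk_size` and `sum_with_two_calls` are hypothetical names, the code runs entirely on the CPU, and `std::malloc`/`std::free` merely stand in for `hipMalloc`/`hipFree`. It is not the rocPRIM implementation and only mirrors the shape of the calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical chunk size; one chunk stands in for the work of one GPU block.
constexpr std::size_t mock_chunk_size = 4;

// Hypothetical host-only stand-in that mimics the two-call convention.
// It is NOT rocprim::reduce and launches nothing on a GPU.
// Returns 0 on success and 1 on invalid arguments.
template<class T, class BinaryOp>
int mock_reduce(void*        temporary_storage,
                std::size_t& storage_size,
                const T*     input,
                T*           output,
                std::size_t  size,
                BinaryOp     op)
{
    if(size == 0)
        return 1;

    const std::size_t num_chunks = (size + mock_chunk_size - 1) / mock_chunk_size;
    const std::size_t required   = num_chunks * sizeof(T);

    if(temporary_storage == nullptr)
    {
        // Size query: report the requirement, do no work, leave output untouched.
        storage_size = required;
        return 0;
    }
    if(storage_size < required)
        return 1;

    // Intermediate results: one partial value per chunk lives in the temporary storage.
    T* partials = static_cast<T*>(temporary_storage);
    for(std::size_t c = 0; c < num_chunks; ++c)
    {
        const std::size_t begin = c * mock_chunk_size;
        const std::size_t end   = std::min(begin + mock_chunk_size, size);
        T acc = input[begin];
        for(std::size_t i = begin + 1; i < end; ++i)
            acc = op(acc, input[i]);
        partials[c] = acc;
    }

    // Combine the partial values into the final result.
    T result = partials[0];
    for(std::size_t c = 1; c < num_chunks; ++c)
        result = op(result, partials[c]);
    *output = result;
    return 0;
}

// Hypothetical helper: sums a vector using the query / allocate / run / free sequence.
int sum_with_two_calls(const std::vector<int>& input)
{
    if(input.empty())
        return 0;

    int         output             = 0;
    std::size_t temp_storage_bytes = 0;

    // First call: nullptr as the first argument, so only the size is computed.
    int status = mock_reduce(nullptr, temp_storage_bytes, input.data(), &output,
                             input.size(), std::plus<int>());
    assert(status == 0);

    // Allocate the working memory (hipMalloc in the real example).
    void* temp_storage = std::malloc(temp_storage_bytes);
    assert(temp_storage != nullptr);

    // Second call: same arguments, now with the working memory, so the work is done.
    status = mock_reduce(temp_storage, temp_storage_bytes, input.data(), &output,
                         input.size(), std::plus<int>());
    assert(status == 0);
    (void)status;

    std::free(temp_storage); // hipFree in the real example
    return output;
}
```

Calling `sum_with_two_calls` on a vector holding the integers 1 through 10 goes through both phases: the first call only fills in `temp_storage_bytes`, and the second call writes the sum to `output`.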
- rocPRIM provides HIP parallel primitives on multiple levels of the GPU programming model. This example showcases `rocprim::reduce`, which is a device-level function and can therefore be called from host code.
- The `rocprim::reduce` template function performs a generalized reduction, i.e. it combines a vector of values into a single value using the provided binary operator. Since the order of execution is not determined, the provided operator must be associative. In the example, an addition (`rocprim::plus<int>`) is used, which fulfils this property.
- The device functions of rocPRIM require a temporary device memory location to store the results of intermediate calculations. The required amount of temporary storage can be calculated by invoking the function with the same set of arguments, except that the first argument, `temporary_storage`, must be a `nullptr`. In this case, the GPU kernel is not launched.
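Why the operator has to be associative can be seen in a small host-only sketch that applies the same operator with two different groupings. The function names `sequential_reduce` and `pairwise_reduce` are hypothetical, both assume a non-empty input, and the pairwise grouping is only one possible evaluation order chosen for illustration; it is not a description of how rocPRIM groups elements internally.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Left-to-right fold: ((v0 op v1) op v2) op v3 ...
// Assumes a non-empty input.
template<class T, class BinaryOp>
T sequential_reduce(const std::vector<T>& values, BinaryOp op)
{
    T acc = values[0];
    for(std::size_t i = 1; i < values.size(); ++i)
        acc = op(acc, values[i]);
    return acc;
}

// Tree-shaped grouping: (v0 op v1) op (v2 op v3) ...
// Neighbouring elements are combined level by level, as a parallel
// reduction might do. Assumes a non-empty input.
template<class T, class BinaryOp>
T pairwise_reduce(std::vector<T> values, BinaryOp op)
{
    while(values.size() > 1)
    {
        std::vector<T> next;
        for(std::size_t i = 0; i + 1 < values.size(); i += 2)
            next.push_back(op(values[i], values[i + 1]));
        if(values.size() % 2 == 1)
            next.push_back(values.back()); // odd element is carried to the next level
        values = std::move(next);
    }
    return values[0];
}
```

For the input `{1, 2, 3, 4}`, both functions return 10 with `std::plus<int>`, because addition is associative. With `std::minus<int>`, the left-to-right fold yields -8 while the pairwise grouping yields 0, so a non-associative operator gives a result that depends on the evaluation order.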
- rocPRIM: `rocprim::reduce`, `rocprim::plus`
- HIP runtime: `hipMalloc`, `hipMemcpy`, `hipFree`