This example demonstrates 2D image convolution using HIP, implementing a box blur filter on images. The application uses the stb_image library for image loading and saving, making it easy to work with common image formats like JPEG and PNG.
For more information on HIP programming and stencil operations, please refer to the HIP documentation.
- An input image is loaded from disk using the stb_image library.
- A convolution mask (box blur filter) is initialized on the host.
- Device memory is allocated for the input image, output image, and convolution mask.
- The input image and mask are copied from host to device memory.
- A 2D grid of thread blocks is configured based on the image dimensions.
- The convolution kernel is launched on the GPU.
- Each thread processes one pixel across all color channels:
  - Applies the convolution mask to the neighborhood around the pixel
  - Handles boundary conditions with zero-padding
  - Keeps the resulting pixel values within the 0-255 range
- The kernel launch is checked for errors and the device is synchronized.
- The processed output image is copied back from device to host memory.
- The output image is saved to disk in JPEG format.
- All device memory is freed.
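The grid configuration step above usually comes down to a ceiling division of the image dimensions by the block dimensions. The following is a minimal sketch of that arithmetic in plain C++, not the example's actual source: `Dim2`, `ceilDiv`, and `gridFor` are hypothetical names, and `Dim2` only stands in for HIP's `dim3` so that the snippet compiles without the HIP headers.

```cpp
// Hypothetical stand-in for HIP's dim3 (assumption: only x and y are needed here).
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Number of blocks needed to cover `extent` elements with `block` threads per block.
inline unsigned ceilDiv(unsigned extent, unsigned block) {
    return (extent + block - 1) / block;
}

// Grid that covers a width x height image with one thread per pixel.
inline Dim2 gridFor(unsigned width, unsigned height, Dim2 block) {
    return Dim2{ceilDiv(width, block.x), ceilDiv(height, block.y)};
}
```

With the 16x16 block size used in this example, `gridFor(1920, 1080, Dim2{16, 16})` gives a 120x68 grid. Because the last row and column of blocks can extend past the image, a kernel launched this way needs a bounds check on its pixel coordinates before reading or writing.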
The kernel implements 2D convolution with the following features:
- Parallel Processing: Each thread processes one pixel location
- Multi-channel Support: Handles RGB images by processing each channel independently
- Boundary Handling: Uses zero-padding for pixels near image edges
- Box Blur Filter: Applies a uniform averaging filter (33x33 default)
- Normalized Output: Maintains pixel values in valid 0-255 range
The box blur filter computes the average of all pixels in the mask region, creating a smoothing/blurring effect.
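As an illustration of what each thread computes, here is a hedged CPU reference in plain C++. It is a sketch under stated assumptions rather than the kernel shipped with this example: the function names are hypothetical, the mask size is assumed to be odd, the uniform weights are hard-coded instead of being read from a mask array, and the result is rounded to the nearest integer before clamping.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One output sample: average over a maskSize x maskSize window centred on (x, y)
// in channel c. Neighbours outside the image contribute zero (zero-padding), and
// the divisor is always maskSize * maskSize, so edges come out darker.
inline unsigned char boxBlurSample(const std::vector<unsigned char>& image,
                                   int width, int height, int channels,
                                   int x, int y, int c, int maskSize) {
    const int radius = maskSize / 2;  // assumption: maskSize is odd
    float sum = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            const int nx = x + dx;
            const int ny = y + dy;
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            const std::size_t idx =
                (static_cast<std::size_t>(ny) * width + nx) * channels + c;
            sum += image[idx];
        }
    }
    const float mean = sum / static_cast<float>(maskSize * maskSize);
    // Assumption: round to nearest, then clamp to the 8-bit range.
    const float clamped = std::min(255.0f, std::max(0.0f, mean + 0.5f));
    return static_cast<unsigned char>(clamped);
}

// Whole-image loop; on the GPU each (x, y) iteration would map to one thread.
inline std::vector<unsigned char> boxBlur(const std::vector<unsigned char>& image,
                                          int width, int height, int channels,
                                          int maskSize) {
    std::vector<unsigned char> out(image.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            for (int c = 0; c < channels; ++c)
                out[(static_cast<std::size_t>(y) * width + x) * channels + c] =
                    boxBlurSample(image, width, height, channels, x, y, c, maskSize);
    return out;
}
```

A reference like this can be useful for checking GPU output on small images, allowing for a difference of one intensity level if the kernel truncates instead of rounding.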
- hipMalloc: Allocates device memory
- hipMemcpy: Transfers data between host and device
- hipFree: Frees device memory
- hipGetLastError: Retrieves the last error from a runtime call
- hipDeviceSynchronize: Blocks until all device operations complete
- __global__: Declares a kernel function callable from host
- blockIdx, blockDim, threadIdx: Built-in variables for grid/block indexing
- 2D thread indexing for image processing
The convolution operation is a classic stencil computation where each output element depends on a neighborhood of input elements. Key characteristics:
- Regular access pattern (structured grid)
- Halo region handling (boundary conditions)
- Data reuse opportunities (same input pixels used by multiple output pixels)
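To put rough numbers on the halo and data-reuse points, the sketch below computes how large an input tile one thread block touches and how often each loaded sample is used on average. The helper names are hypothetical, and the calculation assumes a square block, an odd mask size, and one channel.

```cpp
// Side length of the input tile one block reads: its own pixels plus a halo of
// maskSize / 2 pixels on every side (assumption: maskSize is odd).
inline int tileSide(int blockSide, int maskSize) {
    return blockSide + 2 * (maskSize / 2);
}

// Average number of times each sample in that tile is read by the block's
// threads, i.e. the reuse that a per-block cache could exploit.
inline double reuseFactor(int blockSide, int maskSize) {
    const double reads = static_cast<double>(blockSide) * blockSide *
                         static_cast<double>(maskSize) * maskSize;
    const double side = static_cast<double>(tileSide(blockSide, maskSize));
    return reads / (side * side);
}
```

For a 16x16 block and a 33x33 mask, `tileSide(16, 33)` is 48, so the block touches a 48x48 tile (2304 samples per channel), and `reuseFactor(16, 33)` is 121: every sample in the tile is read about 121 times on average.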
- Uses stb_image.h for loading images (JPEG, PNG, BMP, etc.)
- Uses stb_image_write.h for saving images
- Processes images in row-major order with interleaved color channels
- Default input: test.jpg
- Default output: test_out.jpg
- Default mask size: 33x33 (box blur)
- Block size: 16x16 threads
- Command line usage: ./hip_image_convolution [input.jpg] [output.jpg]
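The row-major, channel-interleaved layout mentioned above (the layout stb_image returns) maps a pixel coordinate and channel to a flat buffer offset as sketched below. `pixelIndex` is a hypothetical helper name used only for illustration.

```cpp
#include <cstddef>

// Offset of channel c of pixel (x, y) in a row-major buffer whose pixels are
// stored as consecutive channel values (for example R, G, B, R, G, B, ...).
inline std::size_t pixelIndex(int x, int y, int c, int width, int channels) {
    return (static_cast<std::size_t>(y) * width + x) * channels + c;
}
```

For a 4-pixel-wide RGB image, the blue value of pixel (1, 0) sits at offset 5, and the first value of the second row sits at offset 12.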
Potential optimizations for this algorithm:
- Use shared memory to cache frequently accessed pixels
- Separate kernels for different color channels to improve memory coalescing
- Use texture memory for automatic caching and filtering
- Implement separable convolution for larger kernels (two 1D passes instead of one 2D pass)
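The separable variant from the last item can be sketched on the CPU as two 1D passes. This is an illustrative sketch, not part of the example: the function names are hypothetical, and it assumes a single-channel float image, an odd mask size, and the same zero-padding as the 2D version.

```cpp
#include <cstddef>
#include <vector>

// One 1D box pass with zero-padding. If `horizontal` is true the window runs
// along x, otherwise along y. The divisor is always maskSize.
inline std::vector<float> boxPass1D(const std::vector<float>& in, int width,
                                    int height, int maskSize, bool horizontal) {
    const int radius = maskSize / 2;  // assumption: maskSize is odd
    std::vector<float> out(in.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int d = -radius; d <= radius; ++d) {
                const int nx = horizontal ? x + d : x;
                const int ny = horizontal ? y : y + d;
                if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                sum += in[static_cast<std::size_t>(ny) * width + nx];
            }
            out[static_cast<std::size_t>(y) * width + x] =
                sum / static_cast<float>(maskSize);
        }
    }
    return out;
}

// Horizontal pass followed by a vertical pass over the intermediate result.
inline std::vector<float> separableBoxBlur(const std::vector<float>& in,
                                           int width, int height, int maskSize) {
    const std::vector<float> rows = boxPass1D(in, width, height, maskSize, true);
    return boxPass1D(rows, width, height, maskSize, false);
}
```

For a box filter with zero-padding, the two passes give the same result as the single 2D pass up to floating-point rounding, while reading 2 x 33 = 66 samples per output instead of 33 x 33 = 1089 at the default mask size. The intermediate image should stay in floating point so that rounding happens only once, at the end.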
- blockDim, blockIdx, threadIdx
- hipDeviceSynchronize, hipFree, hipGetLastError, hipMalloc, hipMemcpy, hipMemcpyHostToDevice, hipMemcpyDeviceToHost
- stb_image.h: Image loading (supports JPEG, PNG, BMP, TGA, etc.)
- stb_image_write.h: Image saving (JPEG, PNG, BMP, TGA)