Similar to image processing, low- and mid-level computer vision applications also allow data parallel processing. However, instead of operating in the same manner on the entire image usually only certain areas are of interest to the application. For an efficient computation it is important to detect them and skip the processing outside of the region of interest. But optimized parallel processing units are poor decision makers. Luckily, the regions of interest are in most cases not randomly distributed but rather spatially coherent. Graphics Processor Units (GPUs) are optimized for the processing of contiguous pixel regions and thus with some modifications we can use their processing power for many computer vision applications.
We apply different strategies concerning the focusing of the computation on certain image areas. One strategy is to include more pixels to the region of interest and thus increase the operation count in favor of spatially coherent regions and a regular memory access pattern. This turned out especially useful on older GPUs which were very poor at branching. The Generalized Hough Transform convolving an object's contour with a set of edges extracted from a scene has an irregular memory access pattern. By treating the detected edges as an entire image the operation count greatly increases but memory latency of random accesses is avoided. The additional space between the edges can be used to smoothly extend them and thus reduce the required number of contours to be tested (Fig. 1).
Often the region of interest is not global, i.e. different points require the consideration of different regions for processing. In case of the Distance Transform computation the computer has to check the distance to all points of a contour to determine the closest one. Most of these distance computations are unnecessary, since for each point a rough estimation can quickly deliver these parts of the contour which contain the closest point. But this produces data dependent regions of interest, as different points will require the examination of different parts of the contour. A parallel handling of these different regions of interest can be obtained by computing an upper bound on the minimal distance to the contour on a coarse version of the object boundary. These rough bounds can then be used to skip the processing of entire regions during the actual computation (Fig. 2).
Another strategy for the restriction of computation to certain areas is to process the entire image in tiles and skip processing for irrelevant tiles under some condition. Unlike the built-in mechanisms of GPUs the explicit tiling offers better control over the criteria when the processing should stop. For the purpose of motion estimation of humans, for example, it does not make sense to enforce the processing of a tile if only very few pixels contain relevant data. Thus one avoids to compute the motion of irregularities due to noise from the acquisition device or air convection (Fig. 3).
With the increasing support of GPUs for dynamic branching also more diverse parallel work loads will be suitable for hardware acceleration.
1. Generalized Hough Transform for object recognition computed in DirectX8 graphics hardware
2. Generalized Distance Transforms, skeletons and Voronoi diagrams computed in DirectX9 graphics hardware.
3. Motion estimation and feature visualization computed in DirectX9 graphics hardware.