Introduction
The efficiency of a block-based motion estimation algorithm is determined by its block size, the number of blocks to search, and the search window size. The objective of this project is to detect foreground objects and thereby minimize the number of blocks to be searched between frames, presenting a computationally inexpensive motion estimation model that still computes an accurate set of motion vectors. A high-level representation of the algorithm is as follows:
1. Foreground object detection
2. Block Matching
1. Foreground object detection
Given a background frame, the first task is to select the sets of blocks that could possibly represent spatial movement. These sets correspond to the foreground objects in the frame. A simple yet effective method is employed. First we perform background subtraction: the easiest and least computationally expensive approach is to take the absolute difference between the current frame and the background frame. A better approach would be a mixture-of-Gaussians background model, but at the cost of additional computational complexity. Since the algorithm uses a simple difference image, it is especially suited for video surveillance applications, where the background is relatively stationary.
We now create a mask that represents the pixels that vary across frames: every pixel whose difference exceeds a threshold is marked white, as displayed in Figure 2.
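The subtraction and masking steps can be sketched in NumPy as follows; the threshold value here is an assumption for illustration, not taken from the report:

```python
import numpy as np

def foreground_mask(current, background, threshold=30):
    """Background subtraction followed by thresholding: pixels that
    differ from the background by more than `threshold` are marked
    white (255); all others stay black (0)."""
    diff = np.abs(current.astype(np.int16) - background.astype(np.int16))
    return np.where(diff > threshold, 255, 0).astype(np.uint8)
```

With OpenCV available, cv2.absdiff() followed by cv2.threshold() performs the same two steps.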
Due to noise, isolated pixels in the mask are marked white even though they are actually part of the background. This noise must be filtered out. A median filter is preferable to a mean filter because an arithmetic mean blurs the mask, which is undesirable here. Hence a median filter is applied to the mask, yielding Figure 3.
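A minimal 3×3 median filter can be sketched in plain NumPy; the window size is an assumption, and OpenCV's cv2.medianBlur() does the same job more efficiently:

```python
import numpy as np

def median_filter_3x3(mask):
    """Replace each pixel by the median of its 3x3 neighbourhood.
    The mask is edge-padded so border pixels keep a full window."""
    padded = np.pad(mask, 1, mode="edge")
    h, w = mask.shape
    out = np.empty_like(mask)
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + 3, x:x + 3])
    return out
```

An isolated white pixel becomes the minority of its 3×3 window and is removed, while solid white regions are preserved, which is exactly the behaviour a blur would not give.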
The last operation for object detection is to find contours and draw a bounding rectangle. However, if that is done immediately after this step, the black patches within the foreground object are also detected, which is not the desired outcome. To prevent this, we apply two iterations of dilation followed by one iteration of erosion, which fills the black patches within the foreground. The output of this step is shown in Figure 4.
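The dilation and erosion passes can be sketched as 3×3 maximum and minimum filters; the 3×3 structuring element is an assumption, and in practice cv2.dilate() and cv2.erode() with an `iterations` argument are the usual choice:

```python
import numpy as np

def _morph(mask, op, iterations):
    """Apply a 3x3 max (dilation) or min (erosion) filter repeatedly."""
    out = mask
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="edge")
        h, w = out.shape
        result = np.empty_like(out)
        for y in range(h):
            for x in range(w):
                result[y, x] = op(padded[y:y + 3, x:x + 3])
        out = result
    return out

def close_holes(mask):
    """Two dilations followed by one erosion, as in the report."""
    return _morph(_morph(mask, np.max, 2), np.min, 1)
```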
We now find the contours of the object. OpenCV's findContours() function tracks transitions from black to white and vice versa, and marks the contour locations accordingly. However, the contours are not guaranteed to form a closed curve, so we call approxPolyDP(), which links the various points to generate a closed polygonal contour. The function boundingRect() then generates a rectangle surrounding the extremes of the contour.
We now normalize this rectangle to fit a grid of 8×8 blocks. Only the blocks contained in this grid are searched in the next frame.
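Snapping the bounding rectangle outward to the 8×8 block grid can be sketched as follows; the function name is hypothetical:

```python
BLOCK = 8

def snap_to_grid(x, y, w, h):
    """Expand (x, y, w, h) outward so every edge lies on a multiple
    of BLOCK; the result covers the original rectangle with whole
    8x8 blocks."""
    x0 = (x // BLOCK) * BLOCK
    y0 = (y // BLOCK) * BLOCK
    x1 = -(-(x + w) // BLOCK) * BLOCK  # ceiling to the next multiple
    y1 = -(-(y + h) // BLOCK) * BLOCK
    return x0, y0, x1 - x0, y1 - y0
```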
2. Block Matching
Block matching is performed by the calcMinSAD() function. The function creates a 56×56 search window around the block's original position and searches (with a raster scan) for the block in the next frame with the minimum sum of absolute differences (SAD). If multiple blocks share the same minimum SAD, the block with the least distance from the original position is selected. Unfortunately, the endpoint error of the motion field cannot be computed because no ground truth is available for the given image dataset. The resultant motion field is as follows.
3. Conclusion and Future Work
Thus, object-based selection of the blocks to be searched speeds up motion estimation. The next key step is to adapt the search window size based on prior information about the speed of object movement across frames. This will be implemented during my directed research at Santa Clara University, under Dr. Nam Ling's guidance.