In 1998 (I can’t believe it was so long ago!) I was working at MATE (Media Access Technologies), an Israeli start-up at the time, specialising in face recognition and video searching. It was then that I came to know one of the coolest tricks in image recognition - the Hough Transform (pronounced “huf transform“).
In this article, I will demonstrate the algorithm with the following example:
Say you want to write a program that watches soccer. The input to your program is a sequence of frames from the game. The output should be the locations of all the players at all times. Given a frame of soccer, you first need to locate the players in the frame (i.e. you should know that a certain player occupies a certain set of pixels). Next, you need to know what location in the world is represented by those pixels. In order to do that, you need to figure out where the camera is looking. In this article I will focus on this part of the problem (understanding the viewpoint and direction). I will not deal with the (easier) part of determining which pixels are occupied by players.
Consider the following image of Ronaldinho:
I manually transformed it to this:
(the portion of the image covered by grass). I think it is clear how this step can be automated.
Next I applied an automatic algorithm that got me this:
(the same image converted to black and white, for all the pixels that were bright enough). My algorithm simply turned all the pixels that are brighter than the color (80, 80, 20), on a per-channel basis, to white, and all the other pixels to black. Note that my threshold color requires a high value of the red component, which is not present in the green grass, so the grass itself turns black.
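The original thresholding script isn't shown in the article, but the step can be sketched in a few lines. This is only a sketch, assuming the frame is held as an (H, W, 3) NumPy RGB array; the function name `threshold_image` is mine, not the original script's:

```python
import numpy as np

def threshold_image(rgb, cutoff=(80, 80, 20)):
    """White where a pixel exceeds `cutoff` on every channel, black elsewhere.

    rgb : (H, W, 3) uint8 array; returns an (H, W) boolean mask.
    """
    return np.all(rgb > np.asarray(cutoff), axis=-1)

# A tiny synthetic example: one grass-green pixel and one white-line pixel.
frame = np.array([[[40, 120, 30], [230, 230, 210]]], dtype=np.uint8)
mask = threshold_image(frame)
# the grass pixel fails the red cutoff; the line pixel passes all three
```

The per-channel comparison is what makes the grass go black: green grass has a low red value, so it can never clear the red cutoff of 80.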
Let's return to our original problem – determining the point and direction of view (POV). We have big clues in the picture for determining the POV, namely the lines on the ground. Note that from the lines on the ground in the black-and-white picture, it is easy to see that Ronaldinho is standing in the center of the field. Can we somehow detect them?
Let's formulate the following abstract problem:
You are given a matrix M over Z2 (i.e. a matrix of bits). The matrix represents an image. Pixel P_ij of the image is white if M_ij is 0 and black if M_ij is 1. The image contains straight lines. The lines are not perfect (i.e. they may overlap, they may not be continuous and there is a lot of noise). Your goal is to develop an efficient algorithm that receives the matrix and outputs the line equations.
Please take some time to think about this before continuing to read. If you spend enough time thinking about it, you will probably reach the conclusion that this is a really tough problem.
A solution to this problem is the Hough Transform. The Hough Transform is built on a duality of the Euclidean plane. Consider the straight-line equation:
y = m * x + b
In this equation m and b are constants and x and y are variables. But what happens if we make x and y constants and m and b variables?
Well, the set of m’s and b’s that satisfy the equation represents all the lines that pass through our point (x, y)!
It is easy to see that for each point in the x-y plane we get a line in the b-m plane, namely, the line:
b = -x * m + y
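A quick numerical check of this duality (this toy snippet is mine, not from the original script): take a few points that are collinear in the x-y plane and evaluate their dual lines.

```python
# Three points on the line y = 2*x + 1 in the x-y plane.
points = [(0, 1), (1, 3), (2, 5)]

# Each point (x, y) maps to the dual line b = -x*m + y.
# Sampling every dual line at m = 2 gives the same b = 1,
# i.e. the three dual lines all meet at (m, b) = (2, 1) --
# exactly the slope and intercept of the original line.
m = 2
bs = [-x * m + y for (x, y) in points]
```

Collinear points in the x-y plane become concurrent lines in the b-m plane, and the point where they meet encodes the original line.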
Now each line in the x-y plane is matched with a point in the b-m plane! The duality depicted:
So the algorithm basically says:
- Run through the original matrix.
- For each pixel (x,y) that is white draw the line that matches it in the dual plane.
- Then count the number of line intersections in the dual plane.
- All points at which more than T lines intersect (these correspond to lines in the x-y plane that pass through more than T points of our original image) are considered valid lines.
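The steps above amount to filling an accumulator ("voting") array. Here is a minimal sketch in the naive b-m plane, assuming the image is a boolean NumPy mask; the function name and the discretisation choices are mine:

```python
import numpy as np

def hough_bm(mask, m_values, b_range):
    """Accumulate votes in a discretised b-m plane.

    mask     : boolean image, True where a pixel is white
    m_values : candidate slopes to test
    b_range  : inclusive (low, high) integer range of intercepts
    """
    b_lo, b_hi = b_range
    acc = np.zeros((len(m_values), b_hi - b_lo + 1), dtype=int)
    ys, xs = np.nonzero(mask)
    for i, m in enumerate(m_values):
        for x, y in zip(xs, ys):
            b = int(round(y - m * x))  # the dual line b = -x*m + y, sampled at m
            if b_lo <= b <= b_hi:
                acc[i, b - b_lo] += 1
    return acc

# A toy 4x4 image with the line y = x drawn on it.
toy = np.zeros((4, 4), dtype=bool)
for i in range(4):
    toy[i, i] = True
votes = hough_bm(toy, m_values=[0.0, 1.0], b_range=(0, 3))
```

Cells whose count exceeds the threshold T are the detected lines; in the toy image, the cell for slope 1 and intercept 0 collects one vote per pixel.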
To better illustrate the algorithm, I jotted down a few lines of Python. The results follow (I must say that I was really surprised that it worked perfectly the first time around! All the threshold values I picked were somehow correct on the first go).
Note that there is an implementation problem with using the b-m plane: precision. For example, a near-vertical line has a huge value of m. In order to solve this, I used another duality (which is completely equivalent): the duality of the x-y plane and the a-c plane, where a is the angle of the line with the x axis (i.e. m = tan(a)), and c is b*cos(a).
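Under the same assumptions as the earlier sketch, the a-c variant only changes the voting formula: since b = y - tan(a)*x, we get c = b*cos(a) = y*cos(a) - x*sin(a), which stays bounded by the image diagonal even for near-vertical lines. A sketch (again, the function name and discretisation are mine):

```python
import numpy as np

def hough_ac(mask, n_angles=180):
    """Vote in the a-c plane, where a is the line's angle with the x axis
    and c = y*cos(a) - x*sin(a); |c| never exceeds the image diagonal."""
    h, w = mask.shape
    diag = int(np.ceil(np.hypot(h, w)))
    angles = np.linspace(-np.pi / 2, np.pi / 2, n_angles, endpoint=False)
    acc = np.zeros((n_angles, 2 * diag + 1), dtype=int)
    ys, xs = np.nonzero(mask)
    for i, a in enumerate(angles):
        # every white pixel casts one vote at (a, c); shift c to a valid index
        cs = np.round(ys * np.cos(a) - xs * np.sin(a)).astype(int) + diag
        np.add.at(acc, (i, cs), 1)  # add.at accumulates repeated indices
    return acc, angles

# A vertical line x = 2 on a 5x5 image: the worst case for the b-m plane.
vert = np.zeros((5, 5), dtype=bool)
vert[:, 2] = True
acc, angles = hough_ac(vert)
```

All five pixels of the vertical line land in a single accumulator cell, with no large-m blow-up anywhere.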
The Hough map (i.e. the result of the Hough Transform) my script produced is:
Since it is kind of hard to distinguish the different color values, I generated a more human-friendly version (all the values are multiplied by 15):
And finally, the result (after converting back to the x-y plane):
And the same result overlayed on the original image:
A perfect match!
This is a big subject, and I only covered a small part of it. I will update this article in the near future with a more detailed analysis. If you have specific questions please post them as comments and I promise I will answer!
Meanwhile, let me just point out that the same method can be used to detect other shapes (e.g. circles). The generalization is trivial.
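For instance, to detect circles of a known radius, each white pixel votes for every centre that would place it on such a circle, and the accumulator lives in the plane of candidate centres. A sketch under the same NumPy-mask assumption (the function name and parameters are mine):

```python
import numpy as np

def hough_circle(mask, radius, n_samples=64):
    """Fixed-radius circle detection: every white pixel votes for each
    candidate centre that would put it on a circle of the given radius."""
    h, w = mask.shape
    acc = np.zeros((h, w), dtype=int)
    thetas = np.linspace(0, 2 * np.pi, n_samples, endpoint=False)
    for y, x in zip(*np.nonzero(mask)):
        # candidate centres lie on a circle of the same radius around (y, x)
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        keep = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[keep], cx[keep]), 1)
    return acc

# Draw a rough circle of radius 3 centred at (10, 10), then vote.
img = np.zeros((21, 21), dtype=bool)
ts = np.linspace(0, 2 * np.pi, 64, endpoint=False)
img[np.round(10 + 3 * np.sin(ts)).astype(int),
    np.round(10 + 3 * np.cos(ts)).astype(int)] = True
votes = hough_circle(img, radius=3)
cy, cx = np.unravel_index(votes.argmax(), votes.shape)
```

Only the true centre sits on every pixel's voting circle, so it collects by far the most votes; an unknown radius just adds a third accumulator dimension.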
I will finish by showing the results of running the script on another image. Note that in this case, the threshold values are a little off. The results are nevertheless impressive:
The black and white image generated was:
The Hough map:
And, finally, the overlayed result:
To be updated…