I recently watched this interesting video by Shai Avidan and Ariel Shamir from MERL. They developed an extremely cool image resizing technique called seam carving. They explain all about it in this article.
One of the coolest things about their idea is that is it easy to implement. I implemented a semi-optimized version of their algorithm in a couple of hours of coding. All the images in this post were generated by my code.
The following image of a pagoda will assist me in demonstrating the ideas to follow:
First lets discuss the two image resizing methods currently used, namely scaling and cropping. Scaling consists of uniformly resizing the image. If for example we were to reduce an image 100 pixels wide to an image 50 pixels wide, we could simply average every two neighbouring pixels (a generalization of this technique can be applied even when the image’s new width does not divide the image’s original width, and of course to resizing both the width and the height of the image).
Scaling is efficient and, in some sense, you do not lose a lot of information from the original image. There are two big problems with scaling however:
- If the new dimensions of the image are not proportionate to the original dimensions, the scaled image is distorted.
- The method does not take into account the contents of the image at all (I will clarify this point shortly).
An example of the pagoda image scaled:
A second widely used form of image resizing is cropping. Cropping consists of simply selecting a sub image of the desired proportions from the original image – i.e. removing the borders of the image. An important question now arises: how do we select which box of the image to keep (or equivalently, how do we select which borders to remove)?
Lets say we assign a weight to each of the pixels of the image, representing its importance. Then selecting the cropping box amounts to simply selecting the box containing the pixels whose sum of weights is highest. We will refer to the weights of the pixels as the energy of the pixels. So optimal cropping would select the box with the highest energy.
Now, all we are left to do is assign the weights to the pixels. Following Avidan and Shamir, we can use simple gradient magnitude as our energy function. Gradient magnitude simply means calculating the gradient at each pixel (regarding the image as a function from R2 to R) and taking its magnitude at each pixel. Here is the result of applying gradient magnitude on the pagoda image:
The perceptive reader would have noticed that a color image is usually a function from R2 to R3 and not to R (as there are usually three independent color channels). There are several ways to calculate the gradient magnitude in this case. The one I used in my code is to calculate the magnitudes of the three gradients separately and average the results. Note that averaging before calculating the gradients is a bad idea as a lot of information is lost in the process!
Note that the gradient magnitude of a pixel constitutes some measure of how likely that pixel is to be a border pixel (and thus more important).
Using all of this, an optimal cropping of the pagoda image is shown here:
The cropping is optimal in the sense that this is the sub-image of the given dimensions containing the most energy.
Note that all the water from the bottom of the image, and the flowers from the top are missing, as well as the left side of the white house on the left of the pagoda.
Now, unlike the scaling method, the cropping technique just presented takes into account the contents of the image. But what if the image contained two pagodas, one at each side of the image? Cropping would only be able to select one of them.
So what we are looking for is some way to automatically remove the uninteresting parts in the middle of the image, and bring the interesting bits closer together.
A naive solution, similar to regular cropping, would be to remove the columns of the image with the least amount of energy not necessarily from the border. This technique causes severe artifacts.
Avidan’s and Shamir’s seam carving technique is an improvement of this. They define a vertical seam as an 8-connected path from top-to-bottom of the image containing one pixel in each row. 8-connectedness means that if pixel (x,y) is in the vertical seam, then exactly one of the pixels (x+1, y+1), (x, y+1) or (x-1, y) is in the vertical seam as well (unless of course y = the height of the image). A horizontal seam is defined similarly.
Now instead of deleting the column (row) with the least energy, they suggest deleting the vertical (horizontal) seam with the least energy. The seam with the least energy can be easily found using dynamic programming.
The first seam to be removed in the pagoda image is depicted below:
The process of reducing an image’s size via repeated deletions of the least important seams is called seam carving. The result of applying some seam carving to the pagoda image is shown here:
Note that although some artifacts are present in the image, it is the best among all the reduced images (at least – a mon avis).
A problem with seam carving, compared to scaling, is efficiency. I have some algorithmic ideas that I think might substantially reduce computation costs. I will let you know when I will have time to test them.
Another example is shown. The original image:
And finally, seam carving:
And just to convince you that the method doesn’t only work on pagodas:
And the carved version: