Archive for August, 2007

Seam Carving

Monday, August 27th, 2007

carved_pagoda.jpg I recently watched this interesting video by Shai Avidan and Ariel Shamir from MERL. They developed an extremely cool image resizing technique called seam carving. They explain all about it in this article.

One of the coolest things about their idea is that is it easy to implement. I implemented a semi-optimized version of their algorithm in a couple of hours of coding. All the images in this post were generated by my code.

The following image of a pagoda will assist me in demonstrating the ideas to follow:

pagoda.jpg

First lets discuss the two image resizing methods currently used, namely scaling and cropping. Scaling consists of uniformly resizing the image. If for example we were to reduce an image 100 pixels wide to an image 50 pixels wide, we could simply average every two neighbouring pixels (a generalization of this technique can be applied even when the image’s new width does not divide the image’s original width, and of course to resizing both the width and the height of the image).

Scaling is efficient and, in some sense, you do not lose a lot of information from the original image. There are two big problems with scaling however:

  • If the new dimensions of the image are not proportionate to the original dimensions, the scaled image is distorted.
  • The method does not take into account the contents of the image at all (I will clarify this point shortly).

An example of the pagoda image scaled:

scaled_pagoda.jpg

A second widely used form of image resizing is cropping. Cropping consists of simply selecting a sub image of the desired proportions from the original image – i.e. removing the borders of the image. An important question now arises: how do we select which box of the image to keep (or equivalently, how do we select which borders to remove)?

Lets say we assign a weight to each of the pixels of the image, representing its importance. Then selecting the cropping box amounts to simply selecting the box containing the pixels whose sum of weights is highest. We will refer to the weights of the pixels as the energy of the pixels. So optimal cropping would select the box with the highest energy.

Now, all we are left to do is assign the weights to the pixels. Following Avidan and Shamir, we can use simple gradient magnitude as our energy function. Gradient magnitude simply means calculating the gradient at each pixel (regarding the image as a function from R2 to R) and taking its magnitude at each pixel. Here is the result of applying gradient magnitude on the pagoda image:

grad_mag.JPG

The perceptive reader would have noticed that a color image is usually a function from R2 to R3 and not to R (as there are usually three independent color channels). There are several ways to calculate the gradient magnitude in this case. The one I used in my code is to calculate the magnitudes of the three gradients separately and average the results. Note that averaging before calculating the gradients is a bad idea as a lot of information is lost in the process!

Note that the gradient magnitude of a pixel constitutes some measure of how likely that pixel is to be a border pixel (and thus more important).

Using all of this, an optimal cropping of the pagoda image is shown here:

croped_pagoda.jpg

The cropping is optimal in the sense that this is the sub-image of the given dimensions containing the most energy.

Note that all the water from the bottom of the image, and the flowers from the top are missing, as well as the left side of the white house on the left of the pagoda.

Now, unlike the scaling method, the cropping technique just presented takes into account the contents of the image. But what if the image contained two pagodas, one at each side of the image? Cropping would only be able to select one of them.

So what we are looking for is some way to automatically remove the uninteresting parts in the middle of the image, and bring the interesting bits closer together.

A naive solution, similar to regular cropping, would be to remove the columns of the image with the least amount of energy not necessarily from the border. This technique causes severe artifacts.

Avidan’s and Shamir’s seam carving technique is an improvement of this. They define a vertical seam as an 8-connected path from top-to-bottom of the image containing one pixel in each row. 8-connectedness means that if pixel (x,y) is in the vertical seam, then exactly one of the pixels (x+1, y+1), (x, y+1) or (x-1, y) is in the vertical seam as well (unless of course y = the height of the image). A horizontal seam is defined similarly.

Now instead of deleting the column (row) with the least energy, they suggest deleting the vertical (horizontal) seam with the least energy. The seam with the least energy can be easily found using dynamic programming.

The first seam to be removed in the pagoda image is depicted below:

seam.JPG

The process of reducing an image’s size via repeated deletions of the least important seams is called seam carving. The result of applying some seam carving to the pagoda image is shown here:

carved_pagoda.jpg

Note that although some artifacts are present in the image, it is the best among all the reduced images (at least – a mon avis).

A problem with seam carving, compared to scaling, is efficiency. I have some algorithmic ideas that I think might substantially reduce computation costs. I will let you know when I will have time to test them.

Another example is shown. The original image:

pagodas_small.jpg

Optimal cropping:

pagodas_small_cropped.jpg

Scaling:

pagodas_small_scaled.jpg

And finally, seam carving:

pagodas_small_carved.jpg

And just to convince you that the method doesn’t only work on pagodas:

hopper_landscape.jpg

And the carved version:

hopper_landscape_carved.jpg

Piece of Cake

Thursday, August 23rd, 2007

In this article you will find a collection of riddles. They are all either very well known or extremely easy. Enjoy!

Cutting the Cake

cake1.jpgHow can you cut a circular cake to eight even pieces with 3 cuts? What is the maximum number of pieces possible with 4 cuts?

Colorblind Pills

red_or_blue_pill.jpgYou have a rare and fatal disease. The only cure for your disease is taking one Biaxin pill (which is blue) and one Robaxin pill (which is red) exactly at midnight, for two days. I.e. you have to take one of each today at midnight and one of each tomorrow at midnight, for a total of four pills. You have exactly four pills (two of each kind) at your disposal. The pills, apart from being in different colors, are identical in appearance.

Now, in addition to your fatal illness, you are also colorblind. How can you save yourself?

Note that in order to be saved you have to take the pills exactly as described above – e.g. taking all four pills at the same time will get you killed!

And no - there aren’t other people that can help you, etc…

Cup of Coffee, Fuck of Tea 

glass-of-milk.jpgYou have a cup of coffee and a cup of milk, both containing the same (unspecified) amount of liquid. You take one spoon full of milk and put it in the coffee. You mix well. You then return a spoon full of the mixture to the cup of milk. Which is more: milk in coffee or coffee in milk?

The title is taken from the famous Israeli movie Operation Grandma. Thanks to Misha Seltzer for this riddle.

The Virgin Islands

british_virgin_islands_property.jpg You are shipwrecked on the Virgin Islands (since Chuck Norris visited them, they are known simply as The Islands). You know that there are three tribes in the Islands: The Liatu tribe, that always lies, the Honestu tribe, that always tells the truth, and the Randomu tribe that answers randomly to every question asked. As you wander around the Islands, you come across 3 tribesmen. By the way they are dressed you can tell that they belong to different tribes (i.e. there is one member of each tribe) but you do not know to which tribe they belong. The native language on the Islands is French, which you speak fluently. It is well known that in French you can only ask “yes-no” questions. You are also familiar with an ancient tradition on the Islands, that says that it is impolite to ask more than 2 questions (each addressed to one specific person) per conversation, no más. You are standing at a fork and need to decide whether to go left or right. How can you get your information?

Maximal Partitions

Friday, August 17th, 2007

When considering the extra-credit of the 23 and 2000 riddle, I thought of an additional interesting problem.

Let N be a positive integer. A partition of N into m parts (an m-partition of N) is a multiset of m positive integers, such that their sum equals N. A multiset is a set which may contain repeated elements.

So if N is 5, then the multiset {1, 1, 1, 2} is a 4-partition of N.

Now, for a multiset A of numbers, define the mul of A, denoted mul(A), as the product of all the numbers. So mul({1, 1, 1, 2}) = 2.

Now, let N be a positive integer. An m-partition P of N is called a maximal partition of N,  if for all positive integers k, and all k-partitions of N, Pk, mul(P) >= mul(Pk).

E.g. the 2-partition {2, 3} is a maximal partition of 5 (its the only one). Note that as the number of partitions is finite there always is at least one maximal partition.

Finally, define maxmul(N) to be the mul of a maximal partition of N.

Now, we finally got to the point: given a positive integer N, what is maxmul(N)?

If you want to think about this one yourself, stop reading now, because the rest of this post discusses the solution. Anyway, try to have at least an initial intuition on the answer before reading on.

I must admit that my initial intuition about this problem was that if N is even, then a maximal partition of N will be a N/2-partition of the form {2, 2, …, 2}. This is wrong.

As it turns out, for all N > 1, the maximal partition of N depends on the value of N modulo 3.

If N = 0 (3) then the maximal partition is of the form {3, 3, …, 3}.

If N = 1 (3) then the maximal partition is of the form {3, 3, …, 3, 2, 2}.

If N = 2 (3) then the maximal partition is of the form {3, 3, …, 3, 2}.

I leave the reader the details of the proof. It is not too complicated using induction. I will now try to explain the reason behind this fact.

Notice a weird fact about the number 3. It is, in some way, the densest integer. Why is that so?

As you might have guessed, this is because 3 is the integer closest to e (=~2.718).

Lets consider the continuous equivalent of this problem. Say that N is a real-number. For each integer m, define an m-partition of Nas a multiset of m positive real numbers, whose sum is N. Note that now the number of partitions is infinite (for each m>1, even the number of m-partitionsis infinite) and so we are no longer guaranteed of the existence of a maximal partition (although it is easy to show that it indeed exists).

Using some basic analytic tools (such as Lagrange multipliers) it is easy to show that given m, the partition whose mul is maximal is the partition {N/m, …, N/m}.

So now we are left with finding an integer m such that the expression (N/m)^mis maximal. Again, lets consider the continuous equivalent of this. For each 1<x<∞, define f(x) = (N/x)^x. Using elementary calculus, it is clear that this function has a single maximum for x = N/e. A graph of the function f(x) is shown here (the blue line indicates e):

maximal_partitions.GIF

Using x = N/e gives a partition with a constant chunk size of e.

This explains (at least intuitively) the reason for 3 being the densest integer.

Extra credit

  1. How many partitions does a positive integer N have?
  2. How big is the set {mul(P) | P is a partition of N}?

Note that the definitions used in this post were invented by me and are not universally known (i.e. do not be alarmed if someone does not understand the meaning of the term maximal partition!).

23 and 2000

Monday, August 13th, 2007

easyriddle.gifYou are given 23 whole numbers, not necessarily distinct, in a row.

You cannot change the order of the numbers.

Prove that there exists an arrangement of the symbols ’+’, ‘×’, ‘(‘ and ‘)’ in-between the 23 numbers, such that the final result is a valid formula, whose evaluated value equals 0 mod 2000.

Extra Credit 

  1. Is 23 a tight bound? Can you find a sequence of 22 numbers such that all arrangements of the symbols ‘+’, ‘×’, ‘(‘ and ‘)’ in-between them will result in numbers that are different from 0 mod 2000? I haven’t thought about this one yet, so please post your ideas!
  2. Consider a more general case. Replace in the riddle above the number 23 by K and the number 2000 by N. Describe all the pairs, K, N, for which a solution to the riddle exists.

Thanks to Misha Seltzer, for sending me this cool riddle!

SIMD (Fire)Works!

Sunday, August 12th, 2007

When I started yaniv.leviathanonline.com, I declared it to be a site about “Mathematics and Hacking”. As of now, most of its content is Mathematical. I thought it was time for a little change…

Those of you with no interest in assembly or low-level programming are welcome to skip this post entirely, although expanding your horizons is never a bad idea!

A few years ago I gave a week-long x86 assembly course for some fellow programmers. As I wanted them to learn how to use the Pentium’s SIMD extensions, I asked them to program a simple fireworks demo and try to apply some filters to it using MMX. In this article I will detail my solution to the SIMD fireworks exercise.

The following paragraph is intended for those of you without assembly knowledge. All of you assembly masters are welcome to skip to the fun stuff below!

Some References 

Well, for starters I recommend reading the ”The Art of Assembly Language Programming” (AoA) which covers a lot of the basics. And yes – Assembly Programming isan Art (maybe more so than programming in other languages). Next, you should also read Intel’s manuals: Volume 1: Basic Architecture, Volume 2A: Instruction Set Reference, A-M, Volume 2B: Instruction Set Reference, N-Z, Volume 3A: System Programming Guide, Volume 3B: System Programming Guide and Architectures Optimization Reference Manual. Yes, I know its a lot of reading but they are really interesting and you don’t have to read the whole thing in order (although I recommend it – so many interesting facts lye in those pages). Finally, if you are going to develop in assembly (e.g. assemble the source included in this article) I recommend using NASM, as it is the simplest and cleanest assembler (although some people prefer MASM, which is kind of similar, but a bit more clumsy). If you are going to do some 16-bit DOS programming, you will find Ralf Brown’s Interrupt List (RBIL) a useful tool.

Throughout this article I will assume basic assembly knowledge (the first chapters of AoA will more than cover everything).

The Buzzwords

SIMD stands for Single Instruction Multiple Data.  It is clear that most algorithms require repeating the same operation over and over again (each time on different sets of data). From this observation stemmed the idea of a vector computer or SIMD. We will shortly see that indeed our fireworks program will repeat the same operations many times and so we will be using the Pentium’s SIMD capabilities for that purpose.

MMX was Intel’s first attempt at bringing SIMD capabilities to the Pentium. MMX does not stand for anything, but it has been offered several meanings since its appearance (such as Multimedia Extensions). MMX adds 8 new registers (mm0,…,mm7) to the CPU. These are 64-bit registers. The main advantage of these registers (compared to the x86 general purpose registers) is that the MMX registers can function as vector registers. This means that each of these registers can represent one 64-bit integer value, two 32-bit integer values, four 16-bit integer values or eight 8-bit integer values. For example, the command:

psubusb mm0, mm1

is essentially equivalent to 8 byte subtractions (it stands for Packed SUBtract Unsigned Saturated Byte).

Note that in order to be backward compatible with existing OS’s (mainly Windows) Intel aliased the MMX registers to the FPU registers. This means that when accessing one of them the contents of the other one are altered as well. This allowed the OS’s to keep using their old context switching mechanisms.

SSE stands for Streaming SIMD Extensions. This was Intel’s second attempt at bringing SIMD to the PC (in response to AMD’s 3DNow!). SSE added many instructions to the instruction set as well as 16 128-bit vector registers (xmm0,…,xmm15). Unlike the 8 MMX registers, the SSE registers can contain four 32-bit floating point values. As these registers are not aliased to any existing registers (and so require special care by the OS) they are disabled by default and are only turned on when the OS explicitly commands it.

Mode 13h

The easiest to program DOS graphics mode is probably mode 13h, which we shall therefore use. In mode 13h, the screen is represented as a 320*200 byte buffer (located in address 0xA000). Each byte in the buffer is an index to a 256 bytes long palette. When mode 13h is enabled, simply writing bytes memory in the range 0xA000-0x19A00 (0x19A0 = 0xA000 + 0xFA00, 0xFA00 = 320*200) will draw pixels on the screen. You should not draw directly to this buffer though, as when the video card is drawing the screen, it scans through this buffer, thereby stalling your write commands. What you should do, is allocate an off-screen buffer and render to this buffer and once in a frame spill the contents of this buffer as quickly as possible to the memory block starting at 0xA000. We will get to this later.

Entering mode 13h is very easy, and is done by calling BIOS function 0×10, as such:

mov ax, 0x13
int 0x10

Calling BIOS functions is done through raising an interrupt. The parameters are passed in the registers (for more information see RBIL).

The default video mode for DOS is mode 3h. It is the mode of choice for displaying text. So upon program exit we should restore the video mode to mode 3h, as such:

mov ax, 3
int 0x10

The Palette 

Now, in order to render properly colored fireworks, we need to have a decent color palette. In mode 13h the palette is set by outputting the palette index number to port 0x3c8 and then outputting 3 color values (for the red, green and blue components) to port 0x3c9. Note that if the entire palette is to be set in one go, only index 0 needs to be outputted to port 0x3c8, as it is incremented automatically after three writes to port 0x3c9. Each of the 3 color components is represented by a 7 bit value, it is a number in the range 0-63, where a value of 0 means that the component is not present at all in the color, and a value of 63 means that the component is present with full opacity in the color. So the color pure green will be represented by the tuple (0,63,0). The palette we shall use will consist of a simple linear interpolation from black to red to white. Index 0 will be total black, index 128 will be total red and index 255 will be total white.

A simple program that sets such a palette and then draws it to the screen can be found here (its source is here) Our fireworks program will use most of this code. Its output is:

palette.JPG

The Basic Fireworks Program

Enough with the introductions – lets get to business!

The design for the fireworks program is very simple. We will manage a buffer of particles, each having the following qualities: x, y, x_speed, y_speed and color (although this version uses a constant white color for all particles, which looks better). Each particle’s position will be updated according to its speed, which will in turn be updated according to gravitation.

The full source of the program can be found here. The program itself can be found here. It begins with some defines (you can play with the number of particles by changing the NUM_PARTS define). I then allocate a buffer big enough to contain a copy of the entire screen (we will draw to this buffer, so as to avoid the stalling inherent in writing to the video memory itself). We access the buffer though the es register.

We then enter mode 13h, set the palette and zero the buffer.

We then use a little very useful trick. We call the RDTSC instruction, which reads the CPU’s 64-bit time-stamp register (this time-stamp register is zeroed upon CPU initialization – i.e. when you turn on your computer – and is incremented after every instruction! What do you say, is there a chance of overflow?) into registers EDX:EAX. As NASM does not support the RDTSC instruction I coded it as:

db 0x0f, 0x31

This trick is very useful, as when writing C++ programs for the PC, it can be used to generate very accurate time-stamps very efficiently. The following code will do the trick in Visual Studio:


int RDTSC()
{
     __asm _emit 0x0f;
     __asm _emit 0x31;
}

Lets get back to the fireworks program. We use this trick to generate a random seed for the random number generator (RNG). This RNG will later be used to initialize the particles.

Next, is the init_particle function, which as its name implies, initializes a particle’s properties. In order to make the particles look like fireworks I initialize the position of bunches of particles to the same value (using the global_x and global_y variables). The init_particle function accepts a single argument (a pointer to a particle) passed through the bp register.

The MAIN function updates the particles (and calls init_particle when needed) and then draws the particles to our off-screen buffer.

After all of the particles are drawn, the MAIN function calls two functions which apply filters on our buffer, before displaying it on the screen (by copying our video buffer to address 0xA000).

The Filters 

These filters are very important (and are implemented using MMX!). There are three: mmx_shade, mmx_blur and mmx_blur_right.

The particles are drawn by the MAIN function with a solid white color (i.e. an index of 255). Instead of wiping the screen after each frame, we call the mmx_shade function which decreases the value of all the pixels on the screen by 1. Its code is very simple:


mmx_shade:
mov cx, 320*200/8
xor di, di
movq mm1, [sub_mask]
.lop:
 movq mm0, [es:di]
 psubusb mm0, mm1
 movq [es:di], mm0
 add di, 8
 loop .lop
ret

Where,

sub_mask: dd 0x01010101, 0x01010101

Note how simple this is! First of all we gain a big speed factor as we are processing 8 pixels in each iteration. Even more importantly we use the extremely convenient saturated subtraction instruction. Saturated subtraction means that if the result of the subtraction is negative, then instead of getting the value mod 256, we get 0!

Running the fireworks program with only the mmx_shade filter on, results in this:

shade_only.JPG

Not very impressive yet, but it does look like fireworks! (at least when animated…)

In order to make our program much nicer, we use two more blurring filters. The mmx_blur function averages the value of all pixels with that of their 4 immediate neighbours (giving twice more weight to the value already in the current pixel). Its code is:

mmx_blur:
mov cx, (320*200-330*2)/8
mov di, 328
.lop:
 movq mm0, [es:di]
 movq mm1, [es:di+1]
 movq mm2, [es:di-1]
 movq mm3, mm0
 movq mm4, [es:di-320]
 movq mm5, [es:di+320]
 pavgb mm0, mm1
 pavgb mm3, mm2
 pavgb mm4, mm5
 pavgb mm3, mm4
 pavgb mm0, mm3
 movq [es:di], mm0
 add di, 8
 loop .lop
ret

Notice that we skip the first and last lines (plus several extra bytes) in order to keep the function as simple as possible. Those of you with sharper eyes will observe a small issue with this function. In order to properly perform these averages, we need to use a third buffer. As we are scanning through our buffer, we use the results of previous blurs as inputs to the following ones, instead of using the original contents of the buffer. This is not very noticeable, and as it saves us the need to allocate a third buffer it is worth it.

Running the program with the mmx_blur filter applied, as well as the mmx_shade filter, results in this:

mmx_blur.JPG

Much nicer! Finally we have the mmx_blur_right function, which is a slight variation of mmx_blur, that gives the fireworks comet like appearance. Its code is self-explanatory:


mmx_blur_right:
mov cx, (320*200-330*2)/8
mov di, 328
.lop:
 movq mm0, [es:di]
 movq mm1, [es:di+1]
 movq mm2, [es:di+320]
 movq mm3, [es:di+321]
 pavgb mm0, mm1
 pavgb mm3, mm2
 pavgb mm0, mm3
 movq [es:di], mm0
 add di, 8
 loop .lop
ret

The MAIN program switches between the two blur functions once in a while. The result of applying the mmx_blur_right function is:

mmx_blur_right.JPG

Exercises to the Reader

  1. Change the three filter functions so that they use the 16-byte SSE registers, thus gaining a factor of 2 on these functions’ running time. This is very easy!
  2. Add another offscreen buffer and correctly implement the blurring.
  3. Make the program shorter/cleaner/nicer. Play with the parameters. Change the palette. Enjoy!

I hope that you now have a decent understanding of SIMD and how to use it when programming the Pentium!

PS – in case you want to assemble the source use the following command:

nasmw.exe fireworks.asm -o fireworks.com