I just learned this cool bit manipulation staff yesterday and I am dying to write it down. Here it is –
Let’s write a program to decide if a number is a power of 2.
The above code executes at least 31 times on a 32 bit or 64 bit Linux machine.
Can it be made better? Can a number be checked whether it’s power of 2 in less number of instructions?
Let’s park that question for a bit and let’s talk about number representations. The computers represent the numbers in 2’s complement representation. What that means is that the negative numbers are represented in the following form –
(2^b - N) N = the number b = number of bits
You can get 2’s complement representation of the number -X by doing the following in binary –
~X + 1 X = binary representation of the positive integer
Let’s take an example.
The 8 bit binary representation of 4 is = 0000 0100
The 8 bit binary representation of -4 is = ~(0000 0100) + 1 = 1111 1011 + 1 = 1111 1100
Let’s take another example.
The 8 bit binary representation of 6 is = 0000 0110
The 8 bit binary representation of -6 is = ~(0000 0110) + 1 = 1111 1001 + 1 = 1111 1010
Now if we AND a number with its 2’s complement representation we see something useful –
In case of 4, AND = 0000 0100
In case of 6, AND = 0000 0010
What does the above pattern say. It says that the first set bit in the actual number is also set in the result of the AND. In case of 4, the third bit was the first set bit. And it’s set in the result of the AND too. In case of 6, the second bit was the first set bit. And it’s set in the result of the AND too.
Now if you think of all the powers of two they have exactly one bit set. If we represent integers in 8 bits, then the possible powers of 2 in this range are –
0000 0001 0000 0010 0000 0100 0000 1000 0001 0000 0010 0000 0100 0000 1000 0000
Which means for the powers of two, the result of the AND operation will always give us the same number. Hence we can just check the equality between the input number and the result of the AND to detect if the number is power of 2 and it’s faster than the previous code too.
Here’s the code
I profiled both the functions by calling them on 1 billion numbers. The above code using 2’s complement representation takes the following time
real 0m5.846s user 0m5.830s sys 0m0.010s
The old iterative version on the same set of numbers takes the following time
real 2m9.093s user 2m7.756s sys 0m0.660s
And as you can see the faster version is almost 25 times faster than the iterative version.