Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bitwise operation in C to compare two integers

I was recently given a quiz in one of my classes. The question is below:

Write a function (called cmp) in C that accepts two integers (x and y) and returns: -1 if x < y, 0 if x = y, 1 if x > y. Write cmp as concise as possible.

The most concise function that I could think of was:

int cmp(int x, int y) {
    return ((x < y) ? (-1) : ((x == y) ? (0) : (1)));
}

But I have a feeling there could be a bit manipulation that I could use to do this more concisely. Perhaps a combination of & and ^ ? This has been bothering me for the last few days and I wanted to know if there in fact IS a better way to do this?

like image 597
an4s Avatar asked Sep 09 '17 23:09

an4s


2 Answers

“as concise as possible” is an extremely vague requirement for a quiz. Are you expected to do code golf? Does removing whitespace and parentheses make it more concise? Anyway, here’s one solution using arithmetic on the results of comparisons:

int cmp(int x, int y) {
    return (x > y) - (x < y);
}
like image 74
2 revs Avatar answered Oct 20 '22 12:10

2 revs


  • If x == y then x - y == 0.
  • If x < y then x - y < 0
  • If x > y then x - y > 0

So we want to see if we can convert the 3 conditions described in the 3 bullet-points above into 3 single output values needed for your cmp function:

int cmp( int x, int y ) {
    return -1 if x < y
    return  0 if x == y
    return  1 if x > y
}

This can be redefined as:

int cmp( int x, int y ) return singleValue( x - y );

int singleValue( int diff ) {
    return -1 if diff < 0
    return  0 if diff == 0
    return  1 if diff > 0
}

Now consider (and assuming) the computer uses two's complement for 32-bit signed integers, aka int) then all negative values will have the most-significant bit (MSB, the 0th bit) set to 1.

For 32-bit integers, this means that the following expression is true for all negative numbers:

( anyNegativeNumber & 0x8000000 ) == 0x8000000

The inverse is true: all positive non-zero integers will have an MSB of 0. Finally, all zero values (int zero == 0) have all their bits set to 0.

( anyPositiveNumber & 0x8000000 ) == 0

If we look at the MSB (first-bit) in addition to checking if any other bits are 1, with the desired output from our singleValue function described above:

value | first bit | any other bits | desired output
    0 |         0 |              0 |           0b ( 0)
  122 |         0 |              1 |           1b ( 1)
 -128 |         1 |              0 | 11111...111b (-1)
 -123 |         1 |              1 | 11111...111b (-1)

We can create 0 and 1 directly from the input value by masking the bits, but -1 is a special case but we can handle that, like so:

int diff = x - y; // so only the 1st and last bits are set

If the 1st bit of the diff is set, then return -1. If diff value is 0 then return 0 Else return 1

return ( diff & 0x80000000 == 0x80000000 ) ? 0xFFFFFFFF : ( diff != 0 );

This can be compacted:

int diff;
return ( ( ( diff = x - y ) & 0x80000000 ) == 0x80000000 ) ? 0xFFFFFFFF : ( diff != 0 );

This is still using the == and != operators though... which can be eliminated by taking advantage of the fact a single-bit (nth place) value can be shifted n-bits right to convert it to a boolean value:

( diff = x - y ) >> 31 // evaluates to 1 if x < y, 0 if x == y or x > y

The diff != 0 bit can be eliminated by taking advantage of the fact that !!a is 1 for all non-zero values and 0 for zero values:

!diff  // evaluates to 1 if diff == 0, else 0
!!diff // evaluates to 0 if diff == 0, else 1

We can combine the two:

int diff;
return ( ( diff = x - y ) >> 31 ) ? -1 : !!diff;

This operation has a branch in it (?:) and a temporary variable (diff) but there is a branch-less version of the same function.

It can be seen that the three possible outputs are:

0xFFFFFFFF == 1111_1111 1111_1111 1111_1111 1111_1111 b
0x00000000 == 0000_0000 0000_0000 0000_0000 0000_0000 b
0x00000001 == 0000_0000 0000_0000 0000_0000 0000_0001 b

The >> operator has "sign extension" for signed values, which means:

1000 b >> 2 == 1110 b
0100 b >> 2 == 0001 b

So diff >> 31 will be 1111..1111 if the 0th bit is 1, otherwise is 0000..0000.

Each individual bit's value can be expressed as a function of diff:

a = ( diff >> 31 ) // due to sign-extension, `a` will only ever be either 1111..1111 or 0000..0000
b = !!diff         // `b` will only ever 1 or 0
c = a | b          // bitwise OR means that `1111..1111 | 0` is still `1111..1111` but `0000..0000 | 1` will be `0000..0001`.

Or just:

c = ( diff >> 31 ) | ( !!diff );

Substituting this into the expression above:

int diff = x - y;
return ( diff >> 31 ) | !!diff;

Or

int diff;
return diff = x - y, ( diff >> 31 ) | !!diff;

The comma operator must be used because C does not specify nor guarantee the evaluation order of binary operator operand expressions, but the evaluation order of the comma operator is.

As this is an inline function, and assuming we're okay with mutable arguments, then we can eliminate diff because we only use x or y once:

return x = x - y, ( x >> 31 ) | !!x;

Here's my test program and the output I get, using GCC:

#include <stdio.h>

int cmp(int x, int y) {

    return x = x - y, ( x >> 31 ) | !!x;
}

int main() {

    printf( "cmp( 1, 2 ) == %d\n", cmp( 1,2 ) );
    printf( "cmp( 2, 2 ) == %d\n", cmp( 2,2 ) );
    printf( "cmp( 2, 1 ) == %d\n", cmp( 2,1 ) );
}

Output:

cmp( 1, 2 ) == -1
cmp( 2, 2 ) == 0
cmp( 2, 1 ) == 1

Now this is not perfect because of problems of integer overflow if x and y are both large numbers and x is negative, e.g. (-4000000000) - (4000000000). Checking for this condition is possible but defeats the point of making the code as succinct as possible - you would also need to add code that handles the error condition too. In this situation, a better way would be to simply check the user-provided inputs instead of checking the function argument values.

TL;DR:

int cmp(int x, int y) {

    return x = x - y, ( x >> 31 ) | !!x;
}
like image 32
Dai Avatar answered Oct 20 '22 13:10

Dai