How do I optimize this Piece of code in C

Question

Yesterday,in an Interview I have been asked to test the 5th bit in a number(to test whether its on and off)and below is how I done it.

int number = 16;
int mask   = 1<<5;

if ((number & mask) == 0)
    printf("Bit is off");
else
    printf("its on");

but he then asked me to optimize this code and do it without using this particular mask.

So my questions what else mask I could have used in this code?

gbulmer · Accepted Answer

Maybe the interviewer wanted to see how you reacted to a simple challenge. Or simply wanted to know if you really understood C, and would stand your ground? Maybe the interviewer wanted to know if you knew non-zero is true, and hence test your depth of understanding of C? Or maybe whether you could do binary to hex in your head?

IMHO interviews are about a lot more than code. They are expensive to do. I always tried to get a clear impression of the person, something hard to do by written communication, or even on the phone. After all, some of those folks are going to work with the recruit!

The most compact, and possibly the quickest is probably:

int number = 16;  // is this really what they gave?

printf((number & 0x20)?"its on":"Bit is off"); // did they mean 5th or bit 5?

Edit:

I've coded up the original approach, and my alternative, and compiled it for ARM Coretx-M3/4 (this is the processor I am writing for at the moment). It was compiled with -O3. Then I have disassembled each compiled file (using objdump) to get the assembler. I did it this way because the output of gcc -S was huge; that includes a lot of extra information for the assembler and linker, which made it harder to find the code.

I replaced printf with a dummy_printf to avoid #including stdio.h which added more noise. The dummy_printf isn't exactly the same as printf, it just takes one parameter, but it keeps the output short :-)

The source (all 7 files appended to make it easier to read) are at: http://pastebin.com/PTeApu9n

The resulting concatenated output of objdump (for each .o) is at: http://pastebin.com/kHAmakE3

As you can see, the original is compiled to:

void original_bit5(int number) {
    int mask = 1<<5;

    if ((number & mask) == 0)
   0:   f010 0f20   tst.w   r0, #32
   4:   d005        beq.n   1a <original_bit5+0x1a>
        dummy_printf("Bit is off");
    else
        dummy_printf("its on"); 
   6:   f240 0000   movw    r0, #0
   a:   f2c0 0000   movt    r0, #0
   e:   f7ff bffe   b.w 0 <dummy_printf>

void original_bit5(int number) {
    int mask = 1<<5;

    if ((number & mask) == 0)
        dummy_printf("Bit is off");
  12:   f240 0000   movw    r0, #0
  16:   f2c0 0000   movt    r0, #0
  1a:   f7ff bffe   b.w 0 <dummy_printf>
  1e:   bf00        nop

I think the call to dummy_printf is using tail-call chaining, i.e. dummy_printf will not return to this function. Very efficient!

There is no function entry code because the first four function parameters are passed in registers r0-r3.

You can't see the addresses of the two strings being loaded in r0. That is because this hasn't been linked.

You can see that:

int mask = 1<<5;    
if ((number & mask) == 0)

is compiled to:

   0:   f010 0f20   tst.w   r0, #32
   4:   d005        beq.n   1a <original_bit5+0x1a>

So 1<<5 and (... == 0), are compiler to a more direct and efficient sequence of instructions. There is a branch to the appropriate call of dummy_printf.

My code compiles to:

void my_bit5(int number) {
    dummy_printf((number & 0x20)?"its on":"Bit is off");    
   0:   f240 0200   movw    r2, #0
   4:   f240 0300   movw    r3, #0
   8:   f010 0f20   tst.w   r0, #32
   c:   f2c0 0200   movt    r2, #0
  10:   f2c0 0300   movt    r3, #0
  14:   bf14        ite ne
  16:   4610        movne   r0, r2
  18:   4618        moveq   r0, r3
  1a:   f7ff bffe   b.w 0 <dummy_printf>
  1e:   bf00        nop

This also seems to get tail-call optimised, i.e. there is no return from this function because there is no need of one, the return by dummy_printf will return directly to main()

What you can't see is the two registers, r2 and r2 will contain the addresses of the two strings. That is because this hasn't been linked.

As you can see there is a conditional execution instruction 'ite' which loads the parameter register r0 with either register r2 or register r3. So there is no branch in this code.

For a simple processor with pipelining, this can be quite efficient. On a simple pipelined processor, a branch can cause a 'pipeline 'stall' while parts of the pipeline are cleared out. This varies from processor to processor. So I assume gcc has got it right, and generated a better code sequence than executing a branch. I haven't checked.

Spurred on by Lundin, I have come up with this:

void union_bit5(int number) {
    union { int n; struct { unsigned :5; unsigned bit :1; }; } tester;
    tester.n = number;

    if (tester.bit)
        dummy_printf("Bit is on");
    else
        dummy_printf("its off");    
}

It does not explicitly include a mask, or bit shifting. It is almost certainly compiler dependent, you'd have to test it to ensure it works (glerk !-(

gcc for ARM generates the same code (bne vs beq, but that can be adjusted) as the OP's solution, so no optimisation, but it removes the mask:

void union_bit5(int number) {
    union { int n; struct { unsigned :5; unsigned bit :1; }; } tester;
    tester.n = number;

    if (tester.bit)
   0:   f010 0f20   tst.w   r0, #32
   4:   d105        bne.n   1a <union_bit5+0x1a>
        dummy_printf("Bit is on");
    else
        dummy_printf("its off");    
   6:   f240 0000   movw    r0, #0
   a:   f2c0 0000   movt    r0, #0
   e:   f7ff bffe   b.w 0 <dummy_printf>
void union_bit5(int number) {
    union { int n; struct { unsigned :5; unsigned bit :1; }; } tester;
    tester.n = number;

    if (tester.bit)
        dummy_printf("Bit is on");
  12:   f240 0000   movw    r0, #0
  16:   f2c0 0000   movt    r0, #0
  1a:   f7ff bffe   b.w 0 <dummy_printf>
  1e:   bf00        nop

For what it's worth:

(number & 0x20)? dummy_printf("its on") : dummy_printf("Bit is off");

gcc for ARM generates exactly the same code as the OP's. It generates branch, and not conditional instructions.

Summary:

The original code is compiled to a very efficient sequence of instructions
The ternary ...?...:... operator can compile to code which does not involve branches on the ARM Cortex-M3/4, but may also generate normal branch instructions.
It is difficult to write more efficient code than the original in this case :-)

I'll add, IMHO the cost of doing a printf is so enormous compared to a bit test, that worrying about optimising a bit test is too small an issue; it fails Amdahl's Law. An appropriate tactic for the bit test is to ensure -O3 or -Os is used.

If you wanted to to do something somewhat perverse (especially for such a trivial problem), but different, which might make the interviewer think, create a lookup table for every byte value. (I don't expect it to be faster ...)

#define BIT5(val) (((val)&0x20)?1:0)
const unsigned char bit5[256] = {
BIT5(0x00),BIT5(0x01), BIT5(0x02),BIT5(0x03), 
BIT5(0x04),BIT5(0x05), BIT5(0x06),BIT5(0x07),
// ... you get the idea ...
BIT5(0xF8),BIT5(0xF9), BIT5(0xFA),BIT5(0xFB), 
BIT5(0xFC),BIT5(0xFD), BIT5(0xFE),BIT5(0xFF)
};

//...
if (bit5[(unsigned char)number]) {
    printf("its on");
} else {
    printf("Bit is off");
}

This is a handy technique if there are some complex bit patterns in, for example, a peripheral register, which need converting to a decision, or switch. It is O(1) too

You could combine the two!-)

k06a · Answer

There are two ways to check bit:

if (number & (1 << bit)) { ... }
if ((number >> bit) & 1) { ... }

I think it will be interesting for you: http://graphics.stanford.edu/~seander/bithacks.html

anurag-jain · Answer

One more way is

1: Right shift the number 5 times so that 5th bit becomes 0th from the right(i.e LSB).
2: Now the logic is the numbers with LSB as 1 are odd, and the ones with 0 are even, Check that using %2

If you think the mod operations are much costlier than bit operation, I think it all depends on the compiler. Have a look at this thread

AND faster than integer modulo operation?.

I'm not sure why the interviewer would have asked you to optimize, may be he was expecting the modulus method as answer.

How do I optimize this Piece of code in C

Tags:

c

Amit Singh Tomar

Video Answer

3 Answers

gbulmer

k06a

anurag-jain

Recent Activity

Donate For Us

How do I optimize this Piece of code in C

Tags:

c

Amit Singh Tomar

Video Answer

3 Answers

gbulmer

k06a

anurag-jain

Related questions

Recent Activity

Donate For Us