Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hidden features of C

More of a trick of the GCC compiler, but you can give branch indication hints to the compiler (common in the Linux kernel)

#define likely(x)       __builtin_expect((x),1)
#define unlikely(x)     __builtin_expect((x),0)

see: http://kerneltrap.org/node/4705

What I like about this is that it also adds some expressiveness to some functions.

void foo(int arg)
{
     if (unlikely(arg == 0)) {
           do_this();
           return;
     }
     do_that();
     ...
}

int8_t
int16_t
int32_t
uint8_t
uint16_t
uint32_t

These are an optional item in the standard, but it must be a hidden feature, because people are constantly redefining them. One code base I've worked on (and still do, for now) has multiple redefinitions, all with different identifiers. Most of the time it's with preprocessor macros:

#define INT16 short
#define INT32  long

And so on. It makes me want to pull my hair out. Just use the freaking standard integer typedefs!


The comma operator isn't widely used. It can certainly be abused, but it can also be very useful. This use is the most common one:

for (int i=0; i<10; i++, doSomethingElse())
{
  /* whatever */
}

But you can use this operator anywhere. Observe:

int j = (printf("Assigning variable j\n"), getValueFromSomewhere());

Each statement is evaluated, but the value of the expression will be that of the last statement evaluated.


initializing structure to zero

struct mystruct a = {0};

this will zero all stucture elements.


Function pointers. You can use a table of function pointers to implement, e.g., fast indirect-threaded code interpreters (FORTH) or byte-code dispatchers, or to simulate OO-like virtual methods.

Then there are hidden gems in the standard library, such as qsort(),bsearch(), strpbrk(), strcspn() [the latter two being useful for implementing a strtok() replacement].

A misfeature of C is that signed arithmetic overflow is undefined behavior (UB). So whenever you see an expression such as x+y, both being signed ints, it might potentially overflow and cause UB.


Multi-character constants:

int x = 'ABCD';

This sets x to 0x41424344 (or 0x44434241, depending on architecture).

EDIT: This technique is not portable, especially if you serialize the int. However, it can be extremely useful to create self-documenting enums. e.g.

enum state {
    stopped = 'STOP',
    running = 'RUN!',
    waiting = 'WAIT',
};

This makes it much simpler if you're looking at a raw memory dump and need to determine the value of an enum without having to look it up.