Safely punning char* to double in C

In an Open Source program I wrote, I'm reading binary data (written by another program) from a file and outputting ints, doubles, and other assorted data types. One of the challenges is that it needs to run on 32-bit and 64-bit machines of both endiannesses, which means that I end up having to do quite a bit of low-level bit-twiddling. I know a (very) little bit about type punning and strict aliasing and want to make sure I'm doing things the right way.

Basically, it's easy to convert from a char* to an int of various sizes:

int64_t snativeint64_t(const char *buf) 
{
    /* Interpret the first 8 bytes of buf as a 64-bit int */
    return *(int64_t *) buf;
}

and I have a cast of support functions to swap byte orders as needed, such as:

int64_t swappedint64_t(const int64_t wrongend)
{
    /* Change the endianness of a 64-bit integer */
    return (((wrongend & 0xff00000000000000LL) >> 56) |
            ((wrongend & 0x00ff000000000000LL) >> 40) |
            ((wrongend & 0x0000ff0000000000LL) >> 24) |
            ((wrongend & 0x000000ff00000000LL) >> 8)  |
            ((wrongend & 0x00000000ff000000LL) << 8)  |
            ((wrongend & 0x0000000000ff0000LL) << 24) |
            ((wrongend & 0x000000000000ff00LL) << 40) |
            ((wrongend & 0x00000000000000ffLL) << 56));
}

At runtime, the program detects the endianness of the machine and assigns one of the above to a function pointer:

int64_t (*slittleint64_t)(const char *);
if(littleendian) {
    slittleint64_t = snativeint64_t;
} else {
    slittleint64_t = sswappedint64_t;
}
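The `littleendian` flag used above has to be computed somewhere. One common way to do that at runtime is sketched below. The helper name `host_is_little_endian` is hypothetical and does not come from the original program; it inspects the lowest-addressed byte of a known integer through memcpy, so it raises no aliasing concerns of its own.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original program):
   returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;

    /* Copy out the lowest-addressed byte of the probe value. */
    memcpy(&first_byte, &probe, 1);

    /* On a little-endian host the least significant byte comes first. */
    return first_byte == 1;
}
```

The result could be stored once at startup, for example `int littleendian = host_is_little_endian();`, before the function pointers are assigned.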

Now, the tricky part comes when I'm trying to cast a char* to a double. I'd like to re-use the endian-swapping code like so:

union 
{
    double  d;
    int64_t i;
} int64todouble;

int64todouble.i = slittleint64_t(bufoffset);
printf("%lf", int64todouble.d);

However, some compilers could optimize away the "int64todouble.i" assignment and break the program. Is there a safer way to do this, while considering that this program must stay optimized for performance, and also that I'd prefer not to write a parallel set of transformations to cast char* to double directly? If the union method of punning is safe, should I be re-writing my functions like snativeint64_t to use it?


I ended up using Steve Jessop's answer because the conversion functions, rewritten to use memcpy like so:

int64_t snativeint64_t(const char *buf) 
{
    /* Interpret the first 8 bytes of buf as a 64-bit int */
    int64_t output;
    memcpy(&output, buf, 8);
    return output;
}

compiled into the exact same assembler as my original code:

snativeint64_t:
        movq    (%rdi), %rax
        ret

Of the two, the memcpy version more explicitly expresses what I'm trying to do and should work on even the most naive compilers.

Adam, your answer was also wonderful and I learned a lot from it. Thanks for posting!

Kirk Strauser asked Oct 21 '08 15:10



2 Answers

I highly suggest you read Understanding Strict Aliasing. Specifically, see the section labeled "Casting through a union". It has a number of very good examples. While the article is on a website about the Cell processor and uses PPC assembly examples, almost all of it is equally applicable to other architectures, including x86.

Adam Rosenfield answered Oct 25 '22 22:10



Since you seem to know enough about your implementation to be sure that int64_t and double are the same size, and have suitable storage representations, you might hazard a memcpy. Then you don't even have to think about aliasing.

Since you're using a function pointer for a function that might easily be inlined if you were willing to release multiple binaries, performance must not be a huge issue anyway. Still, you might like to know that some compilers can be quite fiendish at optimising memcpy: for small integer sizes a set of loads and stores can be inlined, and you might even find that the variables are optimised away entirely and the compiler does the "copy" simply by reassigning the stack slots it's using for the variables, just like a union.

int64_t i = slittleint64_t(bufoffset);
double d;
memcpy(&d,&i,8); /* might emit no code if you're lucky */
printf("%lf", d);

Examine the resulting code, or just profile it. Chances are even in the worst case it will not be slow.
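For reference, the memcpy approach can be wrapped in a pair of small helpers. This is only a sketch under stated assumptions: the names `swap_u64` and `double_from_bits` are hypothetical, it uses `uint64_t` rather than the question's `int64_t` so that every shift works on an unsigned value, and it assumes a 64-bit double whose byte order matches that of 64-bit integers on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double occupies exactly 64 bits on this platform. */
static_assert(sizeof(double) == sizeof(uint64_t),
              "double must be 64 bits wide");

/* Hypothetical helper: reverse the byte order of a 64-bit value. */
static uint64_t swap_u64(uint64_t v)
{
    return ((v & 0xff00000000000000ULL) >> 56) |
           ((v & 0x00ff000000000000ULL) >> 40) |
           ((v & 0x0000ff0000000000ULL) >> 24) |
           ((v & 0x000000ff00000000ULL) >> 8)  |
           ((v & 0x00000000ff000000ULL) << 8)  |
           ((v & 0x0000000000ff0000ULL) << 24) |
           ((v & 0x000000000000ff00ULL) << 40) |
           ((v & 0x00000000000000ffULL) << 56);
}

/* Hypothetical helper: reinterpret 64 bits as a double by copying
   the object representation, which avoids any aliasing question. */
static double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

With these, the integer swapping code is reused for doubles by swapping first (only when the file's byte order differs from the host's) and then calling `double_from_bits` on the result.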

In general, though, doing anything too clever with byteswapping results in portability issues. There exist ABIs with middle-endian doubles, where each word is little-endian, but the big word comes first.
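One way to reduce the dependence on host byte order for the integer part is to assemble the value from individual bytes with shifts. This is a different technique from the function-pointer dispatch in the question, the name `load_le_u64` is hypothetical, and it does not by itself handle the middle-endian double layouts mentioned above. A minimal sketch:

```c
#include <stdint.h>

/* Hypothetical helper: decode 8 bytes stored in little-endian order
   into a uint64_t, regardless of the host's own byte order. */
static uint64_t load_le_u64(const unsigned char *buf)
{
    uint64_t v = 0;

    /* Start from the most significant byte (last in the buffer)
       and shift each earlier byte in below it. */
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | buf[i];
    }
    return v;
}
```

Because the shifts operate on values rather than on memory layout, the same source gives the same result on little-endian and big-endian hosts. Whether a compiler reduces the loop to a single load is something to check in the generated code.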

Normally you could consider storing your doubles using sprintf and sscanf, but for your project the file formats aren't under your control. But if your application is just shovelling IEEE doubles from an input file in one format to an output file in another format (not sure if it is, since I don't know the database formats in question, but if so), then perhaps you can forget about the fact that it's a double, since you aren't using it for arithmetic anyway. Just treat it as an opaque char[8], requiring byteswapping only if the file formats differ.
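Treating the field as opaque bytes might look like the sketch below. The helper name `reverse_bytes` is hypothetical, and it assumes the two file formats differ only by a full reversal of the byte order within the field.

```c
#include <stddef.h>

/* Hypothetical helper: reverse the byte order of an n-byte field
   in place, without ever interpreting it as a number. */
static void reverse_bytes(unsigned char *field, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        unsigned char tmp = field[i];
        field[i] = field[n - 1 - i];
        field[n - 1 - i] = tmp;
    }
}
```

A caller would apply `reverse_bytes(field, 8)` to each 8-byte double field only when the input and output formats disagree on byte order, and copy the bytes through unchanged otherwise.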

Steve Jessop answered Oct 25 '22 22:10
