Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I reinterpret data through a different type? (type punning confusion)

#include <iostream>

int main(int argc, char * argv[])
{
    int a = 0x3f800000;

    std::cout << a << std::endl;

    static_assert(sizeof(float) == sizeof(int), "Oops");

    float f2 = *reinterpret_cast<float *>(&a);

    std::cout << f2 << std::endl;

    void * p = &a;
    float * pf = static_cast<float *>(p);
    float f3 = *pf;

    std::cout << f3 << std::endl;

    float f4 = *static_cast<float *>(static_cast<void *>(&a));

    std::cout << f4 << std::endl;
}

I get the following info out of my trusty compiler:

me@Mint-VM ~/projects $ g++-5.3.0 -std=c++11 -o pun pun.cpp -fstrict-aliasing -Wall
pun.cpp: In function ‘int main(int, char**)’:
pun.cpp:11:45: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     float f2 = *reinterpret_cast<float *>(&a);
                                             ^
pun.cpp:21:61: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     float f4 = *static_cast<float *>(static_cast<void *>(&a));
                                                             ^
me@Mint-VM ~/projects $ ./pun
1065353216
1
1
1
me@Mint-VM ~/projects $ g++-5.3.0 --version
g++-5.3.0 (GCC) 5.3.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I don't really understand when and why I get type-punned errors in some places and not in others.

So, strict aliasing:

Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)

Line 11 claims I'm breaking strict-aliasing. I don't see a case where this can possibly hurt anything - the pointer "comes into existence", is immediately dereferenced, then thrown away. In all likelyhood, this will compile down to zero instructions. This seems like absolutely no risk - I'm telling the compiler EXACTLY what I want.

Lines 15-16 proceed to NOT elicit a warning, even though the pointers to the same memory location are now here to stay. This appears to be a bug in gcc.

Line 21 elicits the warning, showing that this is NOT limited to just reinterpret_cast.

Unions are no better (emphasis mine):

...it's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.

This link talks about using memcpy, but that seems to just hide what you're really trying to accomplish.

For some systems, it is a required operation to write a pointer to an int register, or receive an incoming byte stream and assemble those bytes into a float, or other non-integral type.

What is the correct, standard-conforming way of doing this?

like image 991
bizaff Avatar asked Apr 04 '16 21:04

bizaff


3 Answers

Credit to Anton here please. His answer was first and his is correct.

I am posting this exposition because I know you won't believe him until you see the assembler:

Given:

#include <cstring>
#include <iostream>

// prevent the optimiser from eliding this function altogether
__attribute__((noinline))
float convert(int in)
{
    static_assert(sizeof(float) == sizeof(int), "Oops");
    float result;
    memcpy(&result, &in, sizeof(result));
    return result;
}

int main(int argc, char * argv[])
{
    int a = 0x3f800000;
    float f = convert(a);


    std::cout << a << std::endl;
    std::cout << f << std::endl;
}

result:

1065353216
1

compiled with -O2, here's the assembler output for the function convert, with some added comments for clarity:

#
# I'll give you £10 for every call to `memcpy` you can find...
#
__Z7converti:                           ## @_Z7converti
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp0:
    .cfi_def_cfa_offset 16
Ltmp1:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp2:
    .cfi_def_cfa_register %rbp
#
# here's the conversion - simply move the integer argument (edi)
# into the first float return register (xmm0)
#
    movd    %edi, %xmm0
    popq    %rbp
    retq
    .cfi_endproc
#
# did you see any memcpy's? 
# nope, didn't think so.
#

Just to drive the point home, here's the same function compiled with -O2 and -fomit-frame-pointer :

__Z7converti:                           ## @_Z7converti
    .cfi_startproc
## BB#0:
    movd    %edi, %xmm0
    retq
    .cfi_endproc

Remember, this function only exists because I added the attribute to prevent the compiler from inlining it. In reality, with optimisations enabled, the entire function will be optimised away. Those 3 lines of code in the function and the call at the call site will vanish.

Modern optimising compilers are awesome.

but what I really wanted was this std::cout << *reinterpret_cast<float *>(&a) << std::endl; and I think it expresses my intent perfectly well.

Well, yes it does. But c++ is designed with both correctness and performance in mind. Very often, the compiler would like to assume that two pointers or two references don't point to the same piece of memory. If it can do that, it can make all kinds of clever optimisations (usually involving not bothering make reads or writes which aren't necessary to produce the required effect). However, because a write to one pointer could affect the read from the other (if they really point at the same object), then in the interests of correctness, the compiler may not assume that the two objects are distinct, and it must perform every read and write you indicated in your code - just in case one write affects a subsequent read... unless the pointers point to different types. If they point to different types, the compiler is allowed to assume that they will never point to the same memory - this is the strict aliasing rule.

When you do this: *reinterpret_cast<float *>(&a),

you're trying to read the same memory via an int pointer and a float pointer. Because the pointers are of different types, the compiler will assume that they point to different memory addresses - even though in your mind they do not.

This is the struct aliasing rule. It's there to help programs perform quickly and correctly. A reinterpret cast like this prevents either.

like image 126
Richard Hodges Avatar answered Sep 25 '22 21:09

Richard Hodges


Use memcpy:

memcpy(&f2, &a, sizeof(float));

If you are worried about type safety and semantics, you can easily write a wrapper:

void convert(float& x, int a) {
    memcpy(&x, &a, sizeof(float));
}

And if you want, you can make this wrapper template to satisfy your needs.

like image 35
Anton Savin Avatar answered Sep 21 '22 21:09

Anton Savin


As you've found out, reinterpret_cast can't be used for type punning

  • Is reinterpret_cast type punning actually undefined behavior?
  • Is reinterpret_cast mostly useless?

Since C++20 you'll have a safe way for type punning via std::bit_cast

uint32_t bit_pattern = 0x3f800000U;
constexpr auto f = std::bit_cast<float>(bit_pattern);

At the moment std::bit_cast is only supported by MSVC

While waiting for others to implement that, if you're using Clang you can try __builtin_bit_cast. Just cast it like this

float f = __builtin_bit_cast(float, bit_pattern);

See demo on Godbolt


In other compilers or older C++ standards the only way you can do is via a memcpy

However many compilers have implementation-specific way to do type punning, or implementation-specific behavior regarding type punning. For example in GCC you can use __attribute__((__may_alias__))

union Float
{
    float __attribute__((__may_alias__)) f;
    uint32_t __attribute__((__may_alias__)) u;
};

uint32_t getFloatBits(float v)
{
    Float F;
    F.f = v;
    return F.u;
}

ICC and Clang also support that attribute. See demo

like image 41
phuclv Avatar answered Sep 23 '22 21:09

phuclv