I'm curious about conventions for type-punning pointers/arrays in C++. Here's the use case I have at the moment:
Compute a simple 32-bit checksum over a binary blob of data by treating it as an array of 32-bit integers (we know its total length is a multiple of 4), and then summing up all values and ignoring overflow.
I would expect such a function to look like this:
uint32_t compute_checksum(const char *data, size_t size)
{
    const uint32_t *udata = /* ??? */;
    uint32_t checksum = 0;
    for (size_t i = 0; i != size / 4; ++i)
        checksum += udata[i];
    return checksum;
}
Now the question I have is: what do you consider the "best" way to convert data to udata?
C-style cast?
udata = (const uint32_t *)data
C++ cast that assumes all pointers are convertible?
udata = reinterpret_cast<const uint32_t *>(data)
C++ cast that converts between arbitrary pointer types using an intermediate void*?
udata = static_cast<const uint32_t *>(static_cast<const void *>(data))
Cast through a union?
union {
    const uint32_t *udata;
    const char *cdata;
};
cdata = data;
// now use udata
I fully realize that this will not be a 100% portable solution, but I only expect to use it on a small set of platforms where I know it works (with respect to unaligned memory accesses and compiler assumptions about pointer aliasing). What would you recommend?
Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type.
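To make that concrete, here is a minimal sketch of reading four bytes as a 32-bit value through a union. The helper name load_u32_via_union is made up for illustration, and the code leans on the compiler extension quoted above rather than on anything the C++ standard guarantees. Note that it puns the bytes themselves; a union of two pointer types, as in the question, only converts the pointer, and the later read through udata is still an ordinary aliasing access.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads four bytes at p as one uint32_t in native byte order.
// Writing one union member and reading another is relied on here as a
// compiler extension (documented by GCC); ISO C++ does not guarantee it.
std::uint32_t load_u32_via_union(const char *p) {
    assert(p != nullptr);
    static_assert(sizeof(std::uint32_t) == 4, "sketch assumes 8-bit bytes");
    union {
        char bytes[4];
        std::uint32_t word;
    } pun;
    // Copy byte by byte, so the source pointer needs no particular alignment.
    for (int i = 0; i < 4; ++i)
        pun.bytes[i] = p[i];
    return pun.word;
}
```

Calling this once per 4-byte group and summing the results would give the checksum in the platform's native byte order.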
"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"
Type punning: a form of pointer aliasing where two pointers refer to the same location in memory but represent that location as different types. The compiler will treat both "puns" as unrelated pointers. Type punning has the potential to cause dependency problems for any data accessed through both pointers.
As far as the C++ standard is concerned, litb's answer is completely correct and the most portable. Casting const char *data to a const uint32_t *, whether via a C-style cast, static_cast, or reinterpret_cast, and then reading through the result breaks the strict aliasing rules (see Understanding Strict Aliasing). If you compile with full optimization, there's a good chance that the code will not do the right thing.
Casting through a union (such as litb's my_reint) is probably the best solution, although writing to a union through one member and reading it through another is technically undefined behavior. However, practically all compilers support this, and it produces the expected result. If you absolutely want to conform to the standard 100%, go with the bit-shifting method. Otherwise, I'd recommend casting through a union, which is likely to give you better performance.
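For reference, the bit-shifting method might look roughly like the sketch below. The name compute_checksum_shift is made up here, and the code assumes each 4-byte group should be read as a little-endian word, so it matches the cast-based version only on little-endian machines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical name. Sums the blob as 32-bit words assembled byte by byte,
// so no uint32_t pointer ever aliases the char data and alignment is irrelevant.
// Assumption: words are little-endian. Unsigned overflow wraps around.
std::uint32_t compute_checksum_shift(const char *data, std::size_t size) {
    assert(size % 4 == 0);  // precondition stated in the question
    // unsigned char avoids sign extension when the bytes are widened.
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(data);
    std::uint32_t checksum = 0;
    for (std::size_t i = 0; i + 4 <= size; i += 4) {
        const std::uint32_t word =
              static_cast<std::uint32_t>(bytes[i])
            | static_cast<std::uint32_t>(bytes[i + 1]) << 8
            | static_cast<std::uint32_t>(bytes[i + 2]) << 16
            | static_cast<std::uint32_t>(bytes[i + 3]) << 24;
        checksum += word;
    }
    return checksum;
}
```

Because the byte order is fixed in the code, the result is the same on every platform, which may or may not be what you want if the checksum has to match one computed with a native cast.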