I had a couple of bugs caused by strict aliasing, so I thought I would try to fix all of the violations. Having looked in some detail at what the rule is, it seems that GCC sometimes doesn't issue a warning, and also that some things are impossible to implement. At least by my understanding, everything below is broken. So is my understanding wrong, is there a correct way to do all these things, or does some code just have to technically break the rule and be well covered by system tests?
The bugs were from some code where char and unsigned char buffers were mixed, e.g. like below:
size_t Process(char *buf, char *end)
{
    char *p = buf;
    ProcessSome((unsigned char**)&p, (unsigned char*)end);
    // GCC decided p could not be changed by ProcessSome and so always returned 0
    return (size_t)(p - buf);
}
Changing this to the below seemed to fix the problem, although it still involves a cast so I am not sure why this now works and is warning free:
size_t Process(char *buf, char *end)
{
    unsigned char *buf2 = (unsigned char *)buf;
    unsigned char *p = buf2;
    unsigned char *end2 = (unsigned char*)end;
    ProcessSome(&p, end2);
    return (size_t)(p - buf2);
}
There are also a bunch of other places that seem to work without warnings:
//contains an unsigned char* of data. Possibly from the network, disk, etc.
//the buffer contents themselves are 8 byte aligned.
const Buffer *buffer = foo();
const uint16_t *utf16Text = (const uint16_t*)buffer->GetData();//const unsigned char*
//... read utf16Text. Does not even seem to ever be a warning
//also seems to work fine
size_t len = CalculateWorstCaseLength(...);
Buffer *buffer = new Buffer(len * 2);
uint16_t *utf16 = (uint16_t*)buffer->GetData();//unsigned char*
len = DoSomeProcessing(utf16, len, ...);
buffer->Truncate(len * 2);
send(buffer);
And some with...
struct Hash128
{
    unsigned char data[16];
};
...
size_t operator ()(const Hash128 &hash)
{
    return *(size_t*)hash.data;//warning
}
A non char case. This doesn't have a warning, and even if it is bad, how do I avoid it (Both ways seem to work)?
int *x = fromsomewhere();//aligned to 16 bytes, array of 4
__m128i xmm = _mm_load_si128((__m128i*)x);
__m128i xmm2 = *(__m128i*)x;
Looking at other APIs, there seem to be various cases that, by my understanding, also violate the rule (I have not come across a Linux/GCC-specific one, but I am sure there is one somewhere).
CoCreateInstance has a void** output param requiring an explicit pointer cast. Direct3D has some like this as well.
LARGE_INTEGER is a union that will likely have read/writes to different members (e.g. some code might use high/low, then some other might read the int64).
I recall the CPython implementation quite happily casts a PyObject* to a bunch of other stuff that happens to have the same memory layout at the start.
A lot of hash implementations I have seen will cast the input buffer to a uint32_t*, then perhaps use uint8_t to handle the 1-3 bytes at the end.
Pretty much every memory allocator implementation I have seen uses char* or unsigned char*, which must then be cast to the desired type (possibly via a returned void*, but internally to the allocator at least it was a char*).
First, pointers to char and to unsigned char are pretty much exempted from the rules regarding strict aliasing; you are allowed to convert any type of pointer to a char* or an unsigned char*, and look at the pointed-to object as an array of char or unsigned char. Now, with regards to your code:
size_t Process(char *buf, char *end)
{
    char *p = buf;
    ProcessSome((unsigned char**)&p, (unsigned char*)end);
    // GCC decided p could not be changed by ProcessSome and so always returned 0
    return (size_t)(p - buf);
}
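The issue here is that you're trying to look at a char* as if it were an unsigned char*. That's not guaranteed: the exemption covers the characters being pointed to, not the pointer object p itself, which ProcessSome modifies through an lvalue of a different pointer type. Given that the cast is clearly visible, g++ is being a bit obtuse about not turning the strict aliasing analysis off automatically, but technically, it is covered by the standard.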
In
size_t Process(char *buf, char *end)
{
    unsigned char *buf2 = (unsigned char *)buf;
    unsigned char *p = buf2;
    unsigned char *end2 = (unsigned char*)end;
    ProcessSome(&p, end2);
    return (size_t)(p - buf2);
}
on the other hand, all of the conversions involve char* and unsigned char*, both of which may alias anything, so the compiler is required to make this work.
With regards to the rest, you don't say what the return type of buffer->GetData() is, so it's hard to say. But if it is char*, unsigned char* or void*, the code is fully legal (except for a missing cast in the second use of buffer->GetData()). As long as all of the casts involve a char*, an unsigned char* or a void* (ignoring const qualifiers), then the compiler is required to assume that there is a possible aliasing: when the original pointer has one of these types, it could have been created by means of a cast from a pointer to the target type, and the language guarantees that you can convert any pointer into one of these types, and back to the original type, and recover the same value. (Of course, if the char* wasn't originally a uint16_t*, you may end up with alignment problems, but the compiler generally can't know this.)
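If the buffer may not actually hold suitably aligned uint16_t data, one way to sidestep both the aliasing and the alignment question is to copy each unit out with memcpy. A minimal sketch, where LoadU16 and its parameters are hypothetical names, not part of the code in the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read the i-th 16-bit unit from a raw byte buffer
// without ever dereferencing a uint16_t*. memcpy has no alignment or
// aliasing requirements on its arguments.
inline std::uint16_t LoadU16(const unsigned char *bytes,
                             std::size_t byteCount,
                             std::size_t i)
{
    // The requested unit must lie completely inside the buffer.
    assert((i + 1) * sizeof(std::uint16_t) <= byteCount);

    std::uint16_t value;
    std::memcpy(&value, bytes + i * sizeof value, sizeof value);
    return value; // host byte order, same as the cast-based read
}
```

Optimizing compilers will usually reduce such a fixed-size memcpy to a plain load, so this need not cost anything over the cast.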
With regards to the last example, you don't indicate the type of hash.data, so it's hard to say; if it is char*, void* or unsigned char*, the language guarantees your code (technically, provided that the char pointer was created by converting a size_t*; in practice, provided that the pointer is sufficiently aligned and the pointed-to bytes do not form a trapping value for a size_t).
In general: the only really guaranteed way of "type punning" is by memcpy. Otherwise, the pointer casts, such as you are doing, are guaranteed as long as it is to or from a void*, char* or unsigned char*, at least as far as aliasing is concerned. (Converting from one of these could result in alignment problems, or in accessing a trapping value if you dereference the result.)
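As an illustration of the memcpy approach, the Hash128 functor from the question could be written roughly as below. Hash128Hasher is a made-up name, and the sketch assumes that taking the first sizeof(size_t) bytes of the data is an acceptable hash:

```cpp
#include <cstddef>
#include <cstring>

struct Hash128
{
    unsigned char data[16];
};

// Hypothetical functor name; the question only shows operator().
struct Hash128Hasher
{
    std::size_t operator()(const Hash128 &hash) const
    {
        static_assert(sizeof(std::size_t) <= sizeof(Hash128::data),
                      "Hash128 must hold at least one size_t worth of bytes");

        // Copy the leading bytes into a real size_t object instead of
        // reading them through a casted size_t*.
        std::size_t result;
        std::memcpy(&result, hash.data, sizeof result);
        return result;
    }
};
```

This yields the same value as the cast-based version on the same machine, without relying on the alignment of data or on the aliasing rules.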
Note that you may get additional guarantees from other standards. Posix requires something like:
void (*pf)();
*((void**)&pf) = ...
to work, for example. (Generally, casting and dereferencing immediately will work, even with g++, if you don't do anything else in the function where aliasing might be relevant.)
And all of the compilers I know will allow using a union for type punning, some of the time. (And at least some, including g++, will fail with legal uses of union in other cases. Correctly handling a union is tricky for the compiler writer if the union isn't visible.)
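For illustration only, this is the kind of union punning meant here, using a hypothetical Split64 type in the spirit of LARGE_INTEGER. ISO C++ does not define the read of the inactive member; GCC documents it as working when the access goes through the union object itself, so treat this as a compiler-specific sketch:

```cpp
#include <cstdint>

// Hypothetical stand-in for a LARGE_INTEGER-style union.
union Split64
{
    std::uint64_t whole;
    struct
    {
        std::uint32_t first;  // which half this is depends on endianness
        std::uint32_t second;
    } parts;
};

inline std::uint32_t FirstHalf(std::uint64_t v)
{
    Split64 u;
    u.whole = v;          // write one member...
    return u.parts.first; // ...then read another, directly via the union
}
```

The important detail is that both accesses name the union object u. Taking the address of u.parts.first and passing it to another function hides the union from the compiler, which is the situation where this tends to break.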