
Strict aliasing seems inconsistent

Had a couple of bugs from strict aliasing and so thought I would try to fix all of them. Having looked in some detail at what it is, it seems GCC sometimes doesn't issue a warning, and also that some things are impossible to implement. At least by my understanding, everything below is broken. So is my understanding wrong, is there a correct way to do all these things, or does some code just have to technically break the rule and be well covered by system tests?

The bugs came from some code where char and unsigned char buffers were mixed, like below:

size_t Process(char *buf, char *end)
{
    char *p = buf;
    ProcessSome((unsigned char**)&p, (unsigned char*)end);
    //GCC decided p could not be changed by ProcessSome and so always returned 0
    return (size_t)(p - buf);
}

Changing this to the below seemed to fix the problem, although it still involves casts, so I am not sure why it now works and is warning-free:

size_t Process(char *buf, char *end)
{
    unsigned char *buf2 = (unsigned char *)buf;
    unsigned char *p = buf2;
    unsigned char *end2 = (unsigned char*)end;
    ProcessSome(&p, end2);
    return (size_t)(p - buf2);
}

Also there are a bunch of other places that seem to work without warnings:

//contains an unsigned char* of data. Possibly from the network, disk, etc.
//the buffer contents itself is 8 byte aligned.
const Buffer *buffer = foo();
const uint16_t *utf16Text = (const uint16_t*)buffer->GetData();//const unsigned char*
//... read utf16Text. There never even seems to be a warning here


//also seems to work fine
size_t len = CalculateWorstCaseLength(...);
Buffer *buffer = new Buffer(len * 2);
uint16_t *utf16 = (uint16_t*)buffer->GetData();//unsigned char*
len = DoSomeProcessing(utf16, len, ...);
buffer->Truncate(len * 2);
send(buffer);

And some that do warn:

struct Hash128
{
    unsigned char data[16];
};
...
size_t operator ()(const Hash128 &hash)
{
    return *(size_t*)hash.data;//warning
}

A non-char case. This doesn't produce a warning, but even if it is bad, how do I avoid it (both ways seem to work)?

int *x = fromsomewhere();//aligned to 16 bytes, array of 4
__m128i xmm = _mm_load_si128((__m128i*)x);
__m128i xmm2 = *(__m128i*)x;

Looking at other APIs, there seem to be various cases as well that, by my understanding, violate the rule (I have not come across a Linux/GCC-specific one, but I am sure there is one somewhere).

  1. CoCreateInstance has a void** output param requiring an explicit pointer cast. Direct3D has some like this as well.

  2. LARGE_INTEGER is a union that will likely have reads/writes to different members (e.g. some code might use the high/low parts, then some other code might read the int64).

  3. I recall the CPython implementation quite happily casts a PyObject* to a bunch of other stuff that happens to have the same memory layout at the start.

  4. A lot of hash implementations I have seen will cast the input buffer to a uint32_t*, then perhaps use uint8_t to handle the 1-3 bytes at the end.

  5. Pretty much every memory allocator implementation I have seen uses char* or unsigned char*, which must then be cast to the desired type (possibly via a returned void*, but internally to the allocator at least it was a char*).
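For item 4, a commonly suggested alternative to casting the input to uint32_t* is to copy each word out with memcpy. The sketch below is a hypothetical toy hash (ToyHash is not a real library function, and it is not a tested hash; the constants are borrowed from 32-bit FNV-1a, but mixing whole words like this is not FNV-1a). It only illustrates the loading pattern: whole 4-byte words via memcpy, then the trailing 1-3 bytes one at a time. Because the words are loaded in native byte order, the result differs between little- and big-endian machines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy hash, for illustration only (hypothetical, not a real hash function).
// Full 4-byte words are copied out with memcpy instead of being read through
// a uint32_t*, so the input needs no particular alignment and no uint32_t
// lvalue is ever used to read storage that holds unsigned char objects.
std::uint32_t ToyHash(const unsigned char *data, std::size_t len)
{
    std::uint32_t h = 2166136261u;
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4)
    {
        std::uint32_t word;
        std::memcpy(&word, data + i, sizeof word); // native byte order
        h = (h ^ word) * 16777619u;
    }
    // The remaining 0-3 bytes are read one at a time through unsigned char.
    for (; i < len; ++i)
        h = (h ^ data[i]) * 16777619u;
    return h;
}
```

Calling ToyHash on the same bytes at an odd address gives the same result as at an aligned one, which is the property the cast-based version cannot promise.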

asked Oct 03 '22 12:10 by Will


1 Answer

First, pointers to char and to unsigned char are pretty much exempted from the strict aliasing rules; you are allowed to convert any type of object pointer to a char* or an unsigned char*, and look at the pointed-to object as an array of char or unsigned char. Now, with regard to your code:
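As a minimal sketch of that exemption (ByteSum is a hypothetical helper, not something from the question), any trivially copyable object can be inspected byte by byte through an unsigned char pointer without breaking the aliasing rule:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sum the bytes of an object by viewing it through const unsigned char*.
// Reading an object's bytes through an unsigned char lvalue is one of the
// accesses the aliasing rule explicitly allows.
template <typename T>
unsigned ByteSum(const T &obj)
{
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&obj);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        sum += bytes[i];
    return sum;
}
```

For example, ByteSum(std::uint32_t{0x01020304}) is 10 on any platform: the byte order varies, but the set of bytes does not. The exemption only goes one way; it does not let you read an unsigned char array through some other type.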

size_t Process(char *buf, char *end)
{
    char *p = buf;
    ProcessSome((unsigned char**)&p, (unsigned char*)end);
    //GCC decided p could not be changed by ProcessSome and so always returned 0
    return (size_t)(p - buf);
}

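The issue here is that you're accessing p, an object of type char*, through an lvalue of type unsigned char*. Those are two distinct pointer types, and the char exemption covers access to the pointed-to bytes, not to the pointer object itself, so this isn't guaranteed to work. Given that the cast is clearly visible, g++ is being a bit obtuse in not turning the strict aliasing analysis off automatically, but technically, the standard allows what it does.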

In

size_t Process(char *buf, char *end)
{
    unsigned char *buf2 = (unsigned char *)buf;
    unsigned char *p = buf2;
    unsigned char *end2 = (unsigned char*)end;
    ProcessSome(&p, end2);
    return (size_t)(p - buf2);
}

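on the other hand, p is itself an unsigned char*, and ProcessSome modifies it through an unsigned char**, so no object is accessed through an lvalue of the wrong type. The remaining casts only convert pointer values between char* and unsigned char*, and both of those types may alias anything, so the compiler is required to make this work.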
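To make the difference concrete, here is a self-contained version of the working function. ProcessSome is not shown in the question, so the one below is a hypothetical stand-in that just advances the pointer up to the first zero byte; what matters is that the pointer object p is declared as unsigned char* and is only ever modified through an unsigned char**.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the question's ProcessSome: consumes bytes up to
// (but not including) the first zero byte and advances *pp past them.
void ProcessSome(unsigned char **pp, unsigned char *end)
{
    unsigned char *p = *pp;
    while (p != end && *p != 0)
        ++p;
    *pp = p;
}

std::size_t Process(char *buf, char *end)
{
    // The casts convert pointer values only; the bytes they point at may be
    // read through either char or unsigned char.
    unsigned char *start = reinterpret_cast<unsigned char *>(buf);
    unsigned char *p = start; // p really is an unsigned char* object
    ProcessSome(&p, reinterpret_cast<unsigned char *>(end));
    return static_cast<std::size_t>(p - start);
}
```

With char text[] = "abc\0def", Process(text, text + 7) returns 3. The broken version differs only in declaring p as char* and passing (unsigned char**)&p, which is exactly the access the optimizer is allowed to ignore.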

With regard to the rest, you don't say what the return type of buffer->GetData() is, so it's hard to say. But if it is char*, unsigned char* or void*, the code is legal as far as the compiler's aliasing analysis goes (except for a missing cast in the second use of buffer->GetData()). As long as all of the casts involve a char*, an unsigned char* or a void* (ignoring const qualifiers), the compiler is required to assume that aliasing is possible. When the original pointer has one of these types, it could have been created by a cast from a pointer to the target type, and the language guarantees that you can convert any object pointer to one of these types and back to the original type, and recover the same value. (Strictly speaking, that guarantee only covers bytes that really are uint16_t objects; and if the char* wasn't originally a uint16_t*, you may also end up with alignment problems, but the compiler generally can't know this.)
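If you want the UTF-16 code to be correct regardless of how the buffer was allocated or aligned, one option is to copy each code unit in and out with memcpy instead of casting the buffer. ReadUnit and WriteUnit below are hypothetical helpers, not part of the question's Buffer class. They use native byte order, so data from the network or disk may still need explicit byte swapping. GCC and Clang typically compile a fixed-size memcpy like this to a single load or store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read the i-th 16-bit code unit from a raw byte buffer.
// memcpy copies the bytes into a real uint16_t object, so there is no
// aliasing question and no alignment requirement on `bytes`.
std::uint16_t ReadUnit(const unsigned char *bytes, std::size_t i)
{
    std::uint16_t unit;
    std::memcpy(&unit, bytes + i * sizeof unit, sizeof unit);
    return unit;
}

// Write the i-th 16-bit code unit into a raw byte buffer (native byte order).
void WriteUnit(unsigned char *bytes, std::size_t i, std::uint16_t unit)
{
    std::memcpy(bytes + i * sizeof unit, &unit, sizeof unit);
}
```

In the question's second example, DoSomeProcessing could fill a local uint16_t array (or call WriteUnit) and the result would end up in the byte buffer without any uint16_t* pointing into it.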

With regard to the last example, you don't indicate the type of hash.data, so it's hard to say; if it is char*, void* or unsigned char*, the language guarantees your code (technically, provided that the char pointer was created by converting a size_t*; in practice, provided that the pointer is sufficiently aligned and the pointed-to bytes do not form a trapping value for a size_t).
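For the Hash128 case specifically, a minimal sketch of the memcpy approach follows. Hash128Hasher is a hypothetical functor name (the question only shows its operator()). Copying the first sizeof(size_t) bytes into a real size_t avoids reading an unsigned char array through a size_t lvalue, and it no longer depends on data being aligned for size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Hash128
{
    unsigned char data[16];
};

static_assert(sizeof(std::size_t) <= sizeof(Hash128),
              "size_t must fit in the hash bytes");

// Hypothetical hasher: uses the first sizeof(size_t) bytes of the hash.
struct Hash128Hasher
{
    std::size_t operator()(const Hash128 &hash) const
    {
        std::size_t result;
        std::memcpy(&result, hash.data, sizeof result);
        return result;
    }
};
```

The value is the same one the cast version produces whenever the cast version happens to work; only the way the bytes are read has changed.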

In general, the only really guaranteed way of "type punning" is memcpy. Otherwise, pointer casts like the ones you are doing are guaranteed as long as they are to or from a void*, char* or unsigned char*, at least as far as aliasing is concerned. (A cast from one of these types can still result in alignment problems, or in accessing a trapping value, when you dereference the result.)
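A minimal sketch of memcpy type punning, using the classic float-to-bits example (FloatBits and FloatFromBits are hypothetical names). It assumes float is a 32-bit IEEE 754 type, which holds on all mainstream platforms. In C++20, std::bit_cast from the bit header does the same thing in one call.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the object representation of a float into a uint32_t and back.
// memcpy sidesteps both the aliasing rule and any alignment concerns.
std::uint32_t FloatBits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "unexpected float size");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

float FloatFromBits(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

With IEEE 754 floats, FloatBits(1.0f) is 0x3F800000 and FloatFromBits(0x40000000) is 2.0f.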

Note that you may get additional guarantees from other standards. POSIX requires something like:

void (*pf)();
*((void**)&pf) = ...

to work, for example. (Generally, casting and dereferencing immediately will work, even with g++, if you don't do anything else in the function where aliasing might be relevant.)

And all of the compilers I know will allow using a union for type punning, some of the time. (And at least some, including g++, will fail with legal uses of union in other cases. Correctly handling a union is tricky for the compiler writer if the union isn't visible.)
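For the LARGE_INTEGER-style case from the question, one way to avoid depending on union punning at all is to compute the halves arithmetically. The Halves struct and the Split/Join functions below are hypothetical, loosely modeled on LARGE_INTEGER's LowPart/HighPart/QuadPart. They assume the usual two's complement wrap-around when converting out-of-range unsigned values to signed types; that conversion is implementation-defined before C++20 (mainstream compilers wrap) and required to wrap since C++20.

```cpp
#include <cassert>
#include <cstdint>

// Low/high halves of a 64-bit value, computed with shifts and masks
// instead of reading a different union member than the one last written.
struct Halves
{
    std::uint32_t low;
    std::int32_t high;
};

Halves Split(std::int64_t value)
{
    std::uint64_t u = static_cast<std::uint64_t>(value);
    Halves h;
    h.low = static_cast<std::uint32_t>(u & 0xFFFFFFFFu);
    h.high = static_cast<std::int32_t>(u >> 32);
    return h;
}

std::int64_t Join(Halves h)
{
    std::uint64_t u =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(h.high)) << 32) | h.low;
    return static_cast<std::int64_t>(u);
}
```

Split(-2) gives low 0xFFFFFFFE and high -1, and Join turns that back into -2, independent of the machine's byte order, which a union overlay does not give you.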

answered Oct 07 '22 19:10 by James Kanze