Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the semantics of overlapping objects in C?

Consider the following struct:

struct s {
  int a, b;
};

Typically1, this struct will have size 8 and alignment 4.

What if we create two struct s objects (more precisely, we write into allocated storage two such objects), with the second object overlapping the first?

char *storage = malloc(3 * sizeof(struct s));
struct s *o1 = (struct s *)storage; // offset 0
struct s *o2 = (struct s *)(storage + alignof(struct s)); // offset 4

// now, o2 points half way into o1
*o1 = (struct s){1, 2};
*o2 = (struct s){3, 4};

printf("o2.a=%d\n", o2->a);
printf("o2.b=%d\n", o2->b);
printf("o1.a=%d\n", o1->a);
printf("o1.b=%d\n", o1->b);

Is anything about this program undefined behavior? If so, where does it become undefined? If it is not UB, is it guaranteed to always print the following:

o2.a=3
o2.b=4
o1.a=1
o1.b=3

In particular, I want to know what happens to the object pointed to by o1 when o2, which overlaps it, is written. Is it still allowed to access the unclobbered part (o1->a)? Is accessing the clobbered part o1->b simply the same as accessing o2->a?

How does effective type apply here? The rules are clear enough when you are talking about non-overlapping objects and pointers that point to the same location as the last store, but when you start talking about the effective type of portions of objects or overlapping objects it is less clear.

Would anything change if the the second write was of a different type? If the members were say int and short rather than two ints?

Here's a godbolt if you want to play with it there.


1 This answer applies to platforms where this isn't the case too: e.g., some might have size 4 and alignment 2. On a platform where the size and alignment were the same, this question wouldn't apply since aligned, overlapping objects would be impossible, but I'm not sure if there is any platform like that.

like image 455
BeeOnRope Avatar asked Apr 07 '20 00:04

BeeOnRope


1 Answers

Basically this is all grey area in the standard; the strict aliasing rule specifies basic cases and leaves the reader (and compiler vendors) to fill in the details.

There have been efforts to write a better rule but so far they haven't resulted in any normative text and I'm not sure what the status of this is for C2x.

As mentioned in my answer to your previous question, the most common interpretation is that p->q means (*p).q and the effective type applies to all of *p, even though we then go on to apply .q .

Under this interpretation, printf("o1.a=%d\n", o1->a); would cause undefined behaviour as the effective type of the location *o1 is not s (since part of it has been overwritten).

The rationale for this interpretation can be seen in a function like:

void f(s* s1, s* s2)
{
    s2->a = 5;
    s1->b = 6;
    printf("%d\n", s2->a);
}

With this interpretation the last line could be optimised to puts("5"); , but without it, the compiler would have to consider that the function call may have been f(o1, o2); and therefore lose all benefits that are purportedly provided by the strict aliasing rule.

A similar argument applies to two unrelated struct types that both happen to have an int member at different offset.

like image 164
M.M Avatar answered Nov 14 '22 23:11

M.M