
Plain C polymorphism, type punning, and strict aliasing. How legal is this?

I've been trying to work out how legal the below is and I could really use some help.

#include <stdio.h>
#include <stdlib.h>

typedef struct foo {
    int foo;
    int bar;
} foo;

void make_foo(void * p)
{
    foo * this = (foo *)p;

    this->foo = 0;
    this->bar = 1;
}

typedef struct more_foo {
    int foo;
    int bar;
    int more;
} more_foo;

void make_more_foo(void * p)
{
    make_foo(p);

    more_foo * this = (more_foo *)p;
    this->more = 2;
}

int main(void)
{
    more_foo * mf = malloc(sizeof(more_foo));

    make_more_foo(mf);
    printf("%d %d %d\n", mf->foo, mf->bar, mf->more);

    return 0;
}

As far as I've gathered, doing this is type punning and is supposed to violate the strict aliasing rule. Does it, though? The pointers passed around are void *. You are allowed to interpret a void pointer any way you wish, correct?

Also, I read that there may be memory alignment issues. But struct alignment is deterministic. If the initial members are the same, then they'll get aligned the same way, and there should be no problems accessing all foo members from a more_foo pointer. Is that correct?
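A small sketch for checking the layout claim on a given compiler. It only reports what this implementation does; outside of a union, the Standard does not promise that two separately declared structs share a layout, so treat it as a sanity check rather than a guarantee. The helper name same_prefix_layout is made up for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

typedef struct foo { int foo; int bar; } foo;
typedef struct more_foo { int foo; int bar; int more; } more_foo;

/* Compile-time checks: the build fails if this compiler places the
   shared members at different offsets in the two structs. */
static_assert(offsetof(foo, foo) == offsetof(more_foo, foo), "foo offset differs");
static_assert(offsetof(foo, bar) == offsetof(more_foo, bar), "bar offset differs");

/* more_foo must be at least as strictly aligned as foo for a
   more_foo object to be usable where a foo is expected. */
static_assert(_Alignof(more_foo) >= _Alignof(foo), "more_foo is under-aligned for foo");

/* Hypothetical run-time version of the same check: returns 1 when the
   shared members sit at the same offsets, 0 otherwise. */
int same_prefix_layout(void)
{
    return offsetof(foo, foo) == offsetof(more_foo, foo)
        && offsetof(foo, bar) == offsetof(more_foo, bar);
}
```

Matching offsets settle only the alignment and layout concern. Whether the accesses are allowed under the aliasing rules is a separate question.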

GCC compiles this with -Wall without warnings, and the program runs as expected. However, I'm not sure whether it's UB, or why.

I also saw that this:

typedef union baz {
    struct foo f;
    struct more_foo mf;
} baz;

void some_func(void)
{
    baz b;
    more_foo * mf = &b.mf; // or more_foo * mf = (more_foo *)&b;

    make_more_foo(mf);
    printf("%d %d %d\n", mf->foo, mf->bar, mf->more);
}

seems to be allowed. Because of the polymorphic nature of unions, the compiler would be OK with it. Is that correct? Does that mean that by compiling with strict aliasing off you don't have to use a union and can use only structs instead?

Edit: union baz now compiles.

Vlad Dinev asked Feb 13 '18 17:02




1 Answer

The authors of the Standard didn't think it necessary to specify any means by which an lvalue of a struct or union's member type may be used to access the underlying struct or union. The way N1570 6.5p7 is written doesn't even allow for someStruct.member = 4; unless member is of character type. Being able to apply the & operator to struct and union members wouldn't make any sense, however, unless the authors of the Standard expected that the resulting pointers would be useful for something. Given footnote 88: "The intent of this list is to specify those circumstances in which an object may or may not be aliased", the most logical expectation is that it was only intended to apply in cases where lvalues' useful lifetimes would overlap in ways that would involve aliasing.

Consider the two functions within the code below:

struct s1 {int x;};
struct s2 {int x;};
union {struct s1 v1; struct s2 v2;} arr[10];

int test1(int i, int j)
{
  int result;
  /* Each pointer lives only inside its own block. */
  { struct s1 *p1 = &arr[i].v1; result = p1->x; }
  if (result)
    { struct s2 *p2 = &arr[j].v2; p2->x = 2; }
  { struct s1 *p3 = &arr[i].v1; result = p3->x; }
  return result;
}

int test2(int i, int j)
{
  int result;
  /* p1 stays live across the write through p2. */
  struct s1 *p1 = &arr[i].v1; result = p1->x;
  if (result)
    { struct s2 *p2 = &arr[j].v2; p2->x = 2; }
  result = p1->x;
  return result;
}

In test1, even if i==j, all storage that is ever accessed during p1's lifetime is accessed through p1, so p1 won't alias anything. The same holds for p2 and p3. Thus, since there is no aliasing, there should be no problem if i==j. In test2, however, if i==j, then the creation of p1 and its last use to access p1->x are separated by another action that accesses that storage with a pointer not derived from p1. Consequently, if i==j, the access via p2 would alias p1, and per N1570 6.5p7 a compiler would not be required to allow for that possibility.

If the rules of 6.5p7 are applicable even in cases that don't involve actual aliasing, then structures and unions would be pretty useless. If they only apply in cases that do involve actual aliasing, then a lot of needless complexity like the "Effective Type" rules could be done away with. Unfortunately, some compilers like gcc and clang use the rules to justify "optimizing" the first function above and then assuming that they don't have to worry about the resulting alias which is present in their "optimized" version but wasn't in the original.

Your code will work fine in any compiler whose authors make any effort to recognize derived lvalues. Both gcc and clang, however, will botch even the test1() function above unless they are invoked with the -fno-strict-aliasing flag. Given that the Standard doesn't even allow for someStruct.member = 4;, I'd suggest that you refrain from the kind of aliasing seen in test2() above and not bother targeting compilers that can't even handle test1().
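For comparison, one commonly used arrangement sidesteps the question of whether foo and more_foo may alias: embed a foo as the first member instead of repeating its fields. The sketch below is not taken from the question's code. The names more_foo2 and demo_sum are made up for illustration, and make_foo is changed to take a foo * rather than a void *. It relies on the rule (C11 6.7.2.1p15) that a pointer to a structure, suitably converted, points to its first member.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct foo { int foo; int bar; } foo;

/* Hypothetical variant of more_foo: the base struct is embedded as the
   first member rather than having its fields copied. */
typedef struct more_foo2 { foo base; int more; } more_foo2;

void make_foo(foo *self)
{
    self->foo = 0;
    self->bar = 1;
}

void make_more_foo2(more_foo2 *self)
{
    make_foo(&self->base);   /* the base really is a foo, so no punning */
    self->more = 2;
}

/* Builds one object on the heap, prints its fields, and returns
   foo + bar + more, or -1 if the allocation fails. */
int demo_sum(void)
{
    more_foo2 *mf = malloc(sizeof *mf);
    if (mf == NULL)
        return -1;

    make_more_foo2(mf);

    /* A pointer to the struct converts to a pointer to its first member. */
    foo *as_base = (foo *)mf;
    int sum = as_base->foo + as_base->bar + mf->more;
    printf("%d %d %d\n", as_base->foo, as_base->bar, mf->more);

    free(mf);
    return sum;
}
```

The cost is that base members are reached as mf->base.foo rather than mf->foo. In exchange, every access goes through an lvalue whose type matches the object or member it designates.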

supercat answered Nov 15 '22 19:11