Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Clarification on an example of unions in C11 standard

Tags:

c

unions

c11

The following example is given in the C11 standard, 6.5.2.3

The following is not a valid fragment (because the union type is not visible within function f):

struct t1 { int m; };
struct t2 { int m; };
int f(struct t1 *p1, struct t2 *p2)
{
   if (p1->m < 0)
   p2->m = -p2->m;
   return p1->m;
}
int g()
{
   union {
      struct t1 s1;
      struct t2 s2;
   } u;
   /* ... */
   return f(&u.s1, &u.s2);
}

Why does it matter that the union type is visible to the function f?

In reading through the relevant section a couple times, I could not see anything in the containing section disallowing this.

like image 350
Kyle Avatar asked Mar 24 '16 02:03

Kyle


People also ask

What is union in C++ with example?

A union is a user-defined type in which all members share the same memory location. This definition means that at any given time, a union can contain no more than one object from its list of members.

What is the point of unions in C?

A union is a special data type available in C that allows to store different data types in the same memory location. You can define a union with many members, but only one member can contain a value at any given time. Unions provide an efficient way of using the same memory location for multiple-purpose.

How do you declare a union in C++?

Purpose of Unions in C/ C++ Union is a user-defined datatype. All the members of union share same memory location. Size of union is decided by the size of largest member of union. If you want to use same memory location for two or more members, union is the best for that.

What is a union class?

A union is a special class type that can hold only one of its non-static data members at a time. The class specifier for a union declaration is similar to class or struct declaration: union attr class-head-name { member-specification }


3 Answers

It matters because of 6.5.2.3 paragraph 6 (emphasis added):

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

It's not an error that requires a diagnostic (a syntax error or constraint violation), but the behavior is undefined because the m members of the struct t1 and struct t2 objects occupy the same storage, but because struct t1 and struct t2 are different types the compiler is permitted to assume that they don't -- specifically that changes to p1->m won't affect the value of p2->m. The compiler could, for example, save the value of p1->m in a register on first access, and then not reload it from memory on the second access.

like image 52
Keith Thompson Avatar answered Sep 26 '22 14:09

Keith Thompson


Note: This answer doesn't directly answer your question but I think it is relevant and is too big to go in comments.


I think the example in the code is actually correct. It's true that the union common initial sequence rule doesn't apply; but nor is there any other rule which would make this code incorrect.

The purpose of the common initial sequence rule is to guarantee the same layout of the structs. However that is not even an issue here, as the structs only contain a single int, and structs are not permitted to have initial padding.

Note that , as discussed here, sections in ISO/IEC documents titled Note or Example are "non-normative" which means they do not actually form a part of the specification.


It has been suggested that this code violates the strict aliasing rule. Here is the rule, from C11 6.5/7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object, [...]

In the example, the object being accessed (denoted by p2->m or p1->m) have type int. The lvalue expressions p1->m and p2->m have type int. Since int is compatible with int, there is no violation.

It's true that p2->m means (*p2).m, however this expression does not access *p2. It only accesses the m.


Either of the following would be undefined:

*p1 = *(struct t1 *)p2;   // strict aliasing: struct t2 not compatible with struct t1
p2->m = p1->m++;          // object modified twice without sequence point
like image 33
M.M Avatar answered Sep 25 '22 14:09

M.M


Given the declarations:

union U { int x; } u,*up = &u;
struct S { int x; } s,*sp = &s;

the lvalues u.x, up->x, s.x, and sp->x are all of type int, but any access to any of those lvalues will (at least with the pointers initialized as shown) will also access the stored value of an object of type union U or struct S. Since N1570 6.5p7 only allows objects of those types to be accessed via lvalues whose types are either character types, or other structs or unions that contain objects of type union U and struct S, it would not impose any requirements about the behavior of code that attempts to use any of those lvalues.

I think it's clear that the authors of the Standard intended that compilers allow objects of struct or union types to be accessed using lvalues of member type in at least some circumstances, but not necessarily that they allow arbitrary lvalues of member type to access objects of struct or union types. There is nothing normative to differentiate the circumstances where such accesses should be allowed or disallowed, but there is a footnote to suggest that the purpose of the rule is to indicate when things may or may not alias.

If one interprets the rule as only applying in cases where lvalues are used in ways that alias seemingly-unrelated lvalues of other types, such an interpretation would define the behavior of code like:

struct s1 {int x; float y;};
struct s2 {int x; double y;};
union s1s2 { struct s1 v1; struct s2 v2; };

int get_x(void *p) { return ((struct s1*)p)->x; }

when the latter was passed a struct s1*, struct s2*, or union s1s2* that identifies an object of its type, or the freshly-derived address of either member of union s1s2. In any context where an implementation would see enough to have reason to care about whether operations on the original and derived lvalues would affect each other, it would be able to see the relationship between them.

Note, however, that that such an implementation would not be required to allow for the possibility of aliasing in code like the following:

struct position {double px,py,pz;};
struct velocity {double vx,vy,vz;};

void update_vectors(struct position *pos, struct velocity *vel, int n)
{
  for (int i=0; i<n; i++)
  {
    pos[i].px += vel[i].vx;
    pos[i].py += vel[i].vy;
    pos[i].pz += vel[i].vz;
  }
}

even though the Common Initial Sequence guarantee would seem to allow for that.

There are many differences between the two examples, and thus many indications that a compiler could use to allow for the realistic possibility of the first code is passed a struct s2*, it might accessing a struct s2, without having to allow for the more dubious possibility that operations upon pos[] in the second examine might affect elements of vel[].

Many implementations seeking to usefully support the Common Initial Sequence rule in useful fashion would be able to handle the first even if no union type were declared, and I don't know that the authors of the Standard intended that merely adding a union type declaration should force compilers to allow for the possibility of arbitrary aliasing among common initial sequences of members therein. The most natural intention I can see for mentioning union types would be that compilers which are unable to perceive any of the numerous clues present in the first example could use the presence or absence of any complete union type declaration featuring two types as an indication of whether lvalues of one such type might be used to access another.

Note neither N1570 P6.5p7 nor its predecessors make any effort to describe all cases where quality implementations should behave predictably when given code that uses aggregates. Most such cases are left as Quality of Implementation issues. Since low-quality-but-conforming implementations are allowed to behave nonsensically for almost any reason they see fit, there was no perceived need to complicate the Standard with cases that anyone making a bona fide effort to write a quality implementation would handle whether or not it was required for conformance.

like image 23
supercat Avatar answered Sep 22 '22 14:09

supercat