
Is it undefined behaviour to call a function with pointers to different elements of a union as arguments?

This code prints different values when compiled with -O1 and with -O2 (with both gcc and clang):

#include <stdio.h>

static void check (int *h, long *k)
{
  *h = 5;
  *k = 6;
  printf("%d\n", *h);
}

union MyU
{
    long l;
    int i;
};

int main (void)
{
  union MyU u;
  check(&u.i, &u.l);
  return 0;
}

I think it should be undefined behavior, because of the pointer aliasing, but I cannot pinpoint exactly what part of the code is forbidden.

It does write to one union element and then read from the other, but according to Defect Report #283 that is allowed. Is it UB when the union elements are accessed through pointers rather than directly?

This question is similar to "Accessing C union members via pointers", but I think that one was never fully answered.

Tor Klingberg asked Apr 06 '14 16:04





2 Answers

It took me a while to realize what the crux of the issue is here. DR236 discusses it. The issue is actually about passing a function two pointers that point to overlapping storage, and whether the compiler must assume that such pointers may alias each other.

If we were only discussing aliasing of union members, it would be simpler. In the following code:

u.i = 5;
u.l = 6;
printf("%d\n", u.i);

the behaviour is undefined because the effective type of u is long; i.e. the storage of u contains a value that was stored as a long, and accessing these bytes via an lvalue of type int violates the aliasing rules of 6.5p7. The text about inactive union members having unspecified values does not apply (IMO): the aliasing rules trump it, and that text only comes into play when the aliasing rules are not violated, for example when the bytes are accessed via an lvalue of character type.

If we exchanged the order of the first two lines above, the program would be well-defined.
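
To make the character-type exception mentioned above concrete, here is a minimal sketch. The helper name sum_bytes_after_long_store is hypothetical and not part of the question; it stores a long in the union and then reads the same storage through unsigned char, which the last bullet of 6.5p7 permits. It assumes long has no padding bits, which holds on mainstream platforms.

```c
#include <stddef.h>

union MyU
{
    long l;
    int i;
};

/* Hypothetical helper (not from the question): store a long in the union,
   then inspect the same storage one byte at a time. Reading through an
   lvalue of character type is allowed whatever the effective type is. */
static unsigned long sum_bytes_after_long_store(union MyU *u, long value)
{
    const unsigned char *p = (const unsigned char *)u;
    unsigned long sum = 0;

    u->l = value;
    for (size_t n = 0; n < sizeof u->l; ++n)
        sum += p[n];   /* byte-wise read, independent of byte order */
    return sum;
}
```

Summing the bytes keeps the result independent of endianness: for a stored value of 6, one byte holds 6 and the rest hold 0.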

However, things seem to change when the accesses are "hidden" behind pointers passed to a function.

DR236 addresses this via two examples. Both use a function like check() in this post. Example 1 mallocs a block of memory and passes h and k both pointing to the start of that block. Example 2 has a union similar to the one in this post.

Their conclusion is that Example 1 is "unresolved", and Example 2 is UB. However, this excellent blog post points out that the logic used by DR236 in reaching these conclusions is inconsistent. (Thanks to Tor Klingberg for finding this).

The last line of DR236 also says:

Both programs invoke undefined behavior, by calling function f with pointers qi and qd that have different types but designate the same region of storage. The translator has every right to rearrange accesses to *qi and *qd by the usual aliasing rules.

(apparently in contradiction of the earlier claim that Example 1 was unresolved).

This quote suggests that the compiler is allowed to assume that two pointers passed to a function are restrict if they have different types. However, I cannot find any wording in the Standard to this effect, or even any wording that addresses the compiler re-ordering accesses through pointers.

It has been suggested that the aliasing rules allow the compiler to conclude that an int * and a long * cannot access the same memory. However, Examples 1 and 2 flatly contradict this.

If the pointers had the same type, then I think we agree that the compiler cannot reorder the accesses, because they might both point to the same object. The compiler has to assume the pointers are not restrict unless specifically declared as such.
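
As a small illustration of the same-type case, consider the sketch below. The function check_same_type is a hypothetical variant of check() from the question with both parameters declared as int *; it is only meant to show why the final read cannot be folded to 5.

```c
/* Hypothetical variant of check() where both parameters are int *.
   Neither pointer is restrict-qualified, so h and k may designate
   the same int object. */
static int check_same_type(int *h, int *k)
{
    *h = 5;
    *k = 6;
    return *h;   /* must be reloaded: the store to *k may have changed *h */
}
```

Called as check_same_type(&x, &x) it returns 6, and with two distinct ints it returns 5, so the compiler has to keep the stores in order and re-read *h.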

Yet I fail to see the difference between this case and Examples 1 and 2.

DR236 also says:

Common understanding is that the union declaration must be visible in the translation unit.

which again contradicts the claim that Example 2 is UB, because in Example 2 all of the code is in the same translation unit.

My conclusion: it seems to me that the C99 wording indicates that the compiler should not be allowed to re-order *h = 5; and *k = 6; in case they alias overlapping storage, notwithstanding the fact that DR236 contradicts the C99 wording and does not clarify matters. But reading *h after that should cause undefined behaviour, so the compiler is allowed to generate output of 5 or 6, or anything else.

In my reading, if you modify check() to be *k = 6; *h = 5; then it should be well-defined to print 5. It'd be interesting to see whether a compiler still does something else in this case, and also the compiler's rationale if it does.
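
The modified function could be sketched as follows. The name check_swapped is hypothetical, and the function returns the value instead of printing it so that it is easy to inspect. Whether the call is well-defined is this answer's reading, not something the sketch settles.

```c
union MyU
{
    long l;
    int i;
};

/* Hypothetical variant of check() with the two stores swapped.
   The int store now comes last, so the read of *h follows a store
   made through an lvalue of the same type. */
static int check_swapped(int *h, long *k)
{
    *k = 6;
    *h = 5;
    return *h;
}
```

The case of interest is check_swapped(&u.i, &u.l) for a union MyU u, which under the reading above should yield 5.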

M.M answered Sep 26 '22 03:09



The relevant quote from the standard is the aliasing rule that is violated. Violating a normative "shall" always results in undefined behavior, so anything goes:

6.5 Expressions §7
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

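While main() does use a union, check() does not: inside check() the storage last written as a long through *k is read through the int lvalue *h, and no lvalue of union type is involved, so none of the cases listed above applies.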

Deduplicator answered Sep 25 '22 03:09
