Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Allowing struct field to overflow to the next field

Consider the following simple example:

struct __attribute__ ((__packed__)) {
 int code[1];
 int place_holder[100];
} s;

void test(int n)
{
 int i;

 for (i = 0; i < n; i++) {
  s.code[i] = 1;
 }
}

The for-loop is writing to the field code, which is of size 1. The next field after code is place_holder.
I would expect that in case of n > 1, the write to code array would overflow and 1 would be written to place_holder.

However, when compiling with -O2 (on gcc 4.9.4 but probably on other versions as well) something interesting happens.
The compiler identifies that the code might overflow array code, and limits loop unrolling to 1 iteration.

It's easy to see that when compiling with -fdump-tree-all and looking at the last tree pass ("t.optimized"):


;; Function test (test, funcdef_no=0, decl_uid=1366, symbol_order=1)

Removing basic block 5
test (int n)
{
  <bb 2>:
  # DEBUG i => 0
  # DEBUG i => 0
  if (n_4(D) > 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  s.code[0] = 1;
  # DEBUG i => 1
  # DEBUG i => 1

  <bb 4>:
  return;

}

So in this case the compiler completely unrolled the loop to a single iteration.

My questions are:

  1. From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?
    Let's assume I'm aware of the struct layout in memory and know what I'm doing when deliberately overflowing the code array.
  2. Is there a way to prevent gcc from unrolling the loop in such case? I know I can completely prevent loop unrolling, however I'm still interested in loop unrolling on other cases. I also suspect that the analysis the compiler is doing might affect passes other than loop unrolling.
    gcc is assuming I'm not going to overflow when accessing my array, so what I'm really looking for is way to tell the compiler not to take this assumption (by providing some compiler option).

I'm aware it's a bad practice to write such code that overflows from one field to another, and I'm not intending to write such code.
I'm also aware of the practice to put an array (possibly zero sized) as the last struct field to allow it to overflow, this is well supported by compilers, while in this case the array code is not the last field.
So this is not a question of "how to fix the code", but rather a question of understanding the compiler assumptions and affecting them.

These questions came up when I observed existing code that was already written in such way, and debugged it to find out why it's not behaving as the original developer expected it to behave.
The risk is that there are other places in the code where such problem exists. Static analysis tools can help to find out, but I would also like to know if there's a way to make the compiler tolerate such code and still generate the result we would expect.

Update

I got clear answer to question (1) above, but not for question (2).

  • Can gcc allow this as an extension, by some compile options?
  • Is there a way to at least get a warning when gcc identifies it? (and it clearly identifies it, by optimizing things out).
    That's important in order to identify such cases in a large existing code base.
like image 834
Amir Gonnen Avatar asked Jul 02 '20 08:07

Amir Gonnen


3 Answers

From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?

It is undefined behavior. The arr[i] operator is syntactic sugar around *(arr + i). So array access boils down to the binary + operator for pointer arithmetic, C17 6.5.6 additive operators, from §7 and §8:

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. /--/
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

As you noticed, optimizing compilers might exploit these rules to produce faster code.


Is there a way to prevent gcc from unrolling the loop in such case?

There is a a special exception rule that can be used, C17 6.3.2.3/7:

When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

Also, strict aliasing does not apply to character types, because of another special rule in C17 6.5 §7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ... a character type.

These two special rules co-exist in harmony. So assuming we don't mess up alignment etc during the pointer conversion, this means that we are allowed to do this:

unsigned char* i;
for(i = (unsigned char*)&mystruct; i < (unsigned char*)(&mystruct + 1); i++)
{
  do_something(*i);
}

This may however read padding bytes etc so it's "implementation-defined". But in theory you can access the struct byte per byte, and as long as the struct offsets are calculated on byte-per-byte basis, you can iterate across multiple members of the struct (or any other object) in this manner.


As far as I can tell, this very questionable-looking code should be well-defined:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct __attribute__ ((__packed__)) {
 int code[1];
 int place_holder[100];
} s;

void test(int val, int n)
{
  for (unsigned char* i = (unsigned char*)&s; 
       i < (unsigned char*)&s + n*sizeof(int); 
       i += _Alignof(int)) 
  {
    if((uintptr_t)i % _Alignof(int) == 0) // not really necessary, just defensive prog.
    {
      memcpy(i, &val, sizeof(int));
      printf("Writing %d to address %p\n", val, (void*)i);
    }
  }
}

int main (void)
{
  test(42, 3);
  printf("%d %d %d\n", s.code[0], s.place_holder[0], s.place_holder[1]);
}

This works fine on gcc and clang (x86). How efficient it is, well that's another story. Please don't write code like this, though.

like image 83
Lundin Avatar answered Oct 11 '22 01:10

Lundin


From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?

It's undefined behavior to access an array out-of-bounds. From C11 J.2:

The behavior is undefined in the following circumstances:

[...]

An array subscript is out of range [...]

Is there a way to prevent gcc from unrolling the loop in such case?

Alias code with a volatile pointer. But even using an intermediary pointer seems to work. godbolt link

like image 1
KamilCuk Avatar answered Oct 11 '22 00:10

KamilCuk


Just _Static_assert the layout and do the pointer arithmetic in (char*), then cast to (int*) and do the access. No further tricks such as memcpy/_Alignof are required because ints are unpadded and you are accessing ints where there really are ints.

This alone makes gcc unroll the loop.

Why character-pointer based (char*, signed char*, unsigned char*) pointer arithmetic is required is because http://port70.net/~nsz/c/c11/n1570.html#J.2 (non-normatively, as it is just an appendix, but gcc seems to follow it) makes out-of bounds accesses UB, but http://port70.net/~nsz/c/c99/n1256.html#6.2.6.1p4 and http://port70.net/~nsz/c/c99/n1256.html#6.5p6 still allow inspecting any object via character pointers (more discussion on this at Is accessing an element of a multidimensional array out of bounds undefined behavior?).

Alternatively you could do the pointer arithmetic via uintptr_t (then it will be implementation defined) but gcc optimizes those worse in certain cases (gcc doesn't fold (uintptr_t)p < (uintptr_t)(p+10) into true, but it does so for (char*)p < (char*)(p+10). This could be considered a missed optimization).

struct  __attribute__ ((__packed__)) s {
    int code[1];
    int place_holder[100];
} s;


void test_s(int n) //original
{
    int i;
    for (i = 0; i < n; i++) {
        s.code[i] = 1;
    }
}

#include <stddef.h> //offsetof
void test_s2(int n) //unrolls the loop
{
    _Static_assert(offsetof(struct s,code)+sizeof(int)==offsetof(struct s,place_holder),"");
    //^will practically hold even without __attribute__((__packed__))

    int i; for (i = 0; i < n; i++)
        *(int*)((char*)&s.code + (size_t)i*sizeof(s.code[0])) = 1;
}

/////////////


//same code as test_s2
struct r {
    int code101[101];
} r;
void test_r(int n)
{
    int i;

    for (i = 0; i < n; i++) {
        r.code101[i] = 1;
    }
}
like image 1
PSkocik Avatar answered Oct 11 '22 00:10

PSkocik