Consider the following simple example:
struct __attribute__ ((__packed__)) {
int code[1];
int place_holder[100];
} s;
void test(int n)
{
int i;
for (i = 0; i < n; i++) {
s.code[i] = 1;
}
}
The for-loop is writing to the field code
, which is of size 1. The next field after code
is place_holder
.
I would expect that in case of n > 1
, the write to code
array would overflow and 1
would be written to place_holder
.
However, when compiling with -O2
(on gcc 4.9.4 but probably on other versions as well) something interesting happens.
The compiler identifies that the code might overflow array code
, and limits loop unrolling to 1 iteration.
It's easy to see that when compiling with -fdump-tree-all
and looking at the last tree pass ("t.optimized"):
;; Function test (test, funcdef_no=0, decl_uid=1366, symbol_order=1)
Removing basic block 5
test (int n)
{
<bb 2>:
# DEBUG i => 0
# DEBUG i => 0
if (n_4(D) > 0)
goto <bb 3>;
else
goto <bb 4>;
<bb 3>:
s.code[0] = 1;
# DEBUG i => 1
# DEBUG i => 1
<bb 4>:
return;
}
So in this case the compiler completely unrolled the loop to a single iteration.
code
array.I'm aware it's a bad practice to write such code that overflows from one field to another, and I'm not intending to write such code.
I'm also aware of the practice to put an array (possibly zero sized) as the last struct field to allow it to overflow, this is well supported by compilers, while in this case the array code
is not the last field.
So this is not a question of "how to fix the code", but rather a question of understanding the compiler assumptions and affecting them.
These questions came up when I observed existing code that was already written in such way, and debugged it to find out why it's not behaving as the original developer expected it to behave.
The risk is that there are other places in the code where such problem exists. Static analysis tools can help to find out, but I would also like to know if there's a way to make the compiler tolerate such code and still generate the result we would expect.
I got clear answer to question (1) above, but not for question (2).
From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?
It is undefined behavior. The arr[i]
operator is syntactic sugar around *(arr + i)
. So array access boils down to the binary +
operator for pointer arithmetic, C17 6.5.6 additive operators, from §7 and §8:
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. /--/
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
As you noticed, optimizing compilers might exploit these rules to produce faster code.
Is there a way to prevent gcc from unrolling the loop in such case?
There is a a special exception rule that can be used, C17 6.3.2.3/7:
When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
Also, strict aliasing does not apply to character types, because of another special rule in C17 6.5 §7
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ... a character type.
These two special rules co-exist in harmony. So assuming we don't mess up alignment etc during the pointer conversion, this means that we are allowed to do this:
unsigned char* i;
for(i = (unsigned char*)&mystruct; i < (unsigned char*)(&mystruct + 1); i++)
{
do_something(*i);
}
This may however read padding bytes etc so it's "implementation-defined". But in theory you can access the struct byte per byte, and as long as the struct offsets are calculated on byte-per-byte basis, you can iterate across multiple members of the struct (or any other object) in this manner.
As far as I can tell, this very questionable-looking code should be well-defined:
#include <stdint.h>
#include <string.h>
#include <stdio.h>
struct __attribute__ ((__packed__)) {
int code[1];
int place_holder[100];
} s;
void test(int val, int n)
{
for (unsigned char* i = (unsigned char*)&s;
i < (unsigned char*)&s + n*sizeof(int);
i += _Alignof(int))
{
if((uintptr_t)i % _Alignof(int) == 0) // not really necessary, just defensive prog.
{
memcpy(i, &val, sizeof(int));
printf("Writing %d to address %p\n", val, (void*)i);
}
}
}
int main (void)
{
test(42, 3);
printf("%d %d %d\n", s.code[0], s.place_holder[0], s.place_holder[1]);
}
This works fine on gcc and clang (x86). How efficient it is, well that's another story. Please don't write code like this, though.
From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?
It's undefined behavior to access an array out-of-bounds. From C11 J.2:
The behavior is undefined in the following circumstances:
[...]
An array subscript is out of range [...]
Is there a way to prevent gcc from unrolling the loop in such case?
Alias code
with a volatile
pointer. But even using an intermediary pointer seems to work. godbolt link
Just _Static_assert
the layout and do the pointer arithmetic in (char*)
, then cast to (int*)
and
do the access. No further tricks such as memcpy
/_Alignof
are required because int
s are unpadded
and you are accessing int
s where there really are int
s.
This alone makes gcc unroll the loop.
Why character-pointer based (char*
, signed char*
, unsigned char*
) pointer arithmetic is required is because
http://port70.net/~nsz/c/c11/n1570.html#J.2 (non-normatively, as it is just an appendix, but gcc seems to follow it) makes out-of bounds accesses UB,
but http://port70.net/~nsz/c/c99/n1256.html#6.2.6.1p4 and http://port70.net/~nsz/c/c99/n1256.html#6.5p6 still allow inspecting any object via character pointers (more discussion on this at Is accessing an element of a multidimensional array out of bounds undefined behavior?).
Alternatively you could do the pointer arithmetic via uintptr_t
(then it will be implementation defined)
but gcc optimizes those worse in certain cases (gcc doesn't fold (uintptr_t)p < (uintptr_t)(p+10)
into true
, but it does so for (char*)p < (char*)(p+10)
. This could be considered a missed optimization).
struct __attribute__ ((__packed__)) s {
int code[1];
int place_holder[100];
} s;
void test_s(int n) //original
{
int i;
for (i = 0; i < n; i++) {
s.code[i] = 1;
}
}
#include <stddef.h> //offsetof
void test_s2(int n) //unrolls the loop
{
_Static_assert(offsetof(struct s,code)+sizeof(int)==offsetof(struct s,place_holder),"");
//^will practically hold even without __attribute__((__packed__))
int i; for (i = 0; i < n; i++)
*(int*)((char*)&s.code + (size_t)i*sizeof(s.code[0])) = 1;
}
/////////////
//same code as test_s2
struct r {
int code101[101];
} r;
void test_r(int n)
{
int i;
for (i = 0; i < n; i++) {
r.code101[i] = 1;
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With