I have the following code:
#include <stddef.h>

int main() {
    struct X {
        int a;
        int b;
    } x = {0, 0};
    void *ptr = (char*)&x + offsetof(struct X, b);
    *(int*)ptr = 42;
    return 0;
}
The assignment through ptr performs indirect access to x.b.
Is this code defined according to any of the C standards?
I know that:
*(char*)ptr = 42;
is defined, though the result is only implementation-defined, and that ptr == (void*)&x.b.
I guess that accessing the data pointed to by ptr via int* does not violate the strict aliasing rule, but I'm not fully sure that the standard guarantees that.
Yes, this is perfectly well defined, and is exactly how offsetof is intended to be used. You do the pointer arithmetic on a pointer to character type, so that it is done in bytes, and then cast back to the actual type of the member.
You can see for instance 6.3.2.3 p7 (all references are to C17 draft N2176):
When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
So (char *)&x is a pointer to x converted to a pointer to char, therefore it points to the lowest addressed byte of x. When we add offsetof(struct X, b) (say it's 4) then we have a pointer to byte 4 of x. Now offsetof(struct X, b) is defined to return
the offset in bytes, to the structure member, from the beginning of its structure [7.19p3]
so 4 is in fact the offset from the beginning of x to x.b. Hence byte 4 of x is the lowest byte of x.b, and that's what ptr points to; in other words, ptr is a pointer to x.b, but of type char *. When we cast it back to int *, we have a pointer to x.b which is of the type int *, exactly the same as we would get from the expression &x.b. So dereferencing this pointer accesses x.b.
A question arose in the comments about this last step: when ptr is cast back to int *, how do we know we indeed have a pointer to the int x.b? This is a bit less explicit in the standard but I think it is the obvious intent.
However, I think we can also derive it indirectly. Hopefully we agree that ptr above is a pointer to the lowest addressed byte of x.b. Now by the same passage of 6.3.2.3 p7 quoted above, taking a pointer to x.b and converting it to char *, as in (char *)&x.b, would also yield a pointer to the lowest addressed byte of x.b. As they are pointers of the same type which point to the same byte, they are the same pointer: ptr == (char *)&x.b.
Then we look at the preceding sentences of 6.3.2.3 p7:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.
There are no problems with alignment here, because char has the weakest alignment requirement (6.2.8 p6). So converting (char *)&x.b back to int * must recover a pointer to x.b, i.e. (int *)(char *)&x.b == &x.b.
But ptr is the same pointer as (char *)&x.b, so we may substitute them in this equality: (int *)ptr == &x.b.
Obviously *&x.b produces an lvalue designating x.b (6.5.3.2 p4), hence so does *(int *)ptr.
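As a minimal sketch of the argument above, the following helper forms the pointer the same way the question does and checks that it compares equal to &x->b before storing through it. The name store_b_via_offset is made up for this illustration and does not appear in the question.

```c
#include <assert.h>
#include <stddef.h>

struct X {
    int a;
    int b;
};

/* Hypothetical helper (name invented for this sketch): stores value into
   x->b through a pointer computed with byte arithmetic and offsetof.
   Returns 1 if that pointer compares equal to &x->b, 0 otherwise. */
static int store_b_via_offset(struct X *x, int value)
{
    /* Byte arithmetic on a char * view of the struct, as in the question. */
    void *ptr = (char *)x + offsetof(struct X, b);

    /* Both sides point to the lowest addressed byte of x->b. */
    assert((char *)ptr == (char *)&x->b);

    int same = ((int *)ptr == &x->b);
    *(int *)ptr = value;   /* access through an lvalue of type int */
    return same;
}
```

Calling store_b_via_offset(&x, 42) on a zero-initialized struct X should leave x.a untouched and set x.b to 42, which is the behavior the derivation predicts.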
There is no problem with strict aliasing (6.5p7). First, determine the effective type of x.b using 6.5p6:
The effective type of an object for an access to its stored value is the declared type of the object, if any. [Then explanations on what to do if it doesn't have a declared type.]
Well, x.b does have a declared type, which is int. So its effective type is int.
Now to see if the access is legal under strict aliasing, see 6.5p7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
[more options not relevant here]
We are accessing x.b through the lvalue expression *(int *)ptr, which has type int. And int is compatible with int per 6.2.7p1:
Two types have compatible type if their types are the same. [Then other conditions under which they may also be compatible].
An example of this same technique that may be more familiar is indexing into an array by bytes. If we have
int arr[100];
*(int *)((char *)arr + (17 * sizeof(int))) = 42;
then this is equivalent to arr[17] = 42;.
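The equivalence can be checked with a small sketch. The helper name store_int_at is an assumption made for this example; it does the address computation in bytes and asserts that the result matches ordinary indexing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name invented for this sketch): stores value into
   element i of an int array, computing the element address in bytes. */
static void store_int_at(int *arr, size_t i, int value)
{
    char *bytes = (char *)arr;                     /* byte view of the array */
    int *elem = (int *)(bytes + i * sizeof(int));  /* manually scaled offset */

    /* The byte-based pointer should be the same pointer as &arr[i]. */
    assert(elem == &arr[i]);

    *elem = value;
}
```

With int arr[100] zero-initialized, store_int_at(arr, 17, 42) should change arr[17] only, exactly as arr[17] = 42; would.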
This is how generic routines like qsort and bsearch are implemented. If we try to qsort an array of int, then within qsort all the pointer arithmetic is done in bytes, on pointers to character type with the offsets manually scaled by the object size passed as an argument (which here would be sizeof(int)). When qsort needs to compare two objects, it converts pointers to them to const void * and passes them as arguments to the comparator function, which casts them back to const int * to do the comparison.
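To illustrate the pattern without reproducing any real library's qsort, here is a rough sketch of a much simpler generic routine that finds the smallest element. The names generic_min and compare_ints are invented for this example; only the qsort-style parameter list (base, element count, element size, comparator) is borrowed from the standard library's interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic routine (not a standard function): returns a pointer
   to the smallest of nmemb elements of the given size, or NULL if the array
   is empty. All address arithmetic is done in bytes. */
static const void *generic_min(const void *base, size_t nmemb, size_t size,
                               int (*compar)(const void *, const void *))
{
    assert(base != NULL || nmemb == 0);
    if (nmemb == 0)
        return NULL;

    const char *bytes = base;        /* byte view of the caller's array */
    const void *best = bytes;        /* element 0 */
    for (size_t i = 1; i < nmemb; i++) {
        /* Offset scaled by hand using the element size argument. */
        const void *candidate = bytes + i * size;
        if (compar(candidate, best) < 0)
            best = candidate;
    }
    return best;
}

/* Comparator for int elements: converts back to const int * to compare. */
static int compare_ints(const void *pa, const void *pb)
{
    const int *a = pa;
    const int *b = pb;
    return (*a > *b) - (*a < *b);
}
```

For an int array, the pointer returned by generic_min(values, n, sizeof(int), compare_ints) can be converted back to const int * and should compare equal to the address of the smallest element.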
This all works fine and is clearly an intended feature of the language. So I think we needn't doubt that the use of offsetof in the current question is similarly an intended feature.
I believe that this is perfectly legal; in fact, I've just encountered a similar technique used in a book I'm reading (not that it matters).
Here's why I think this is legal:
void *ptr = (char*)&x + offsetof(struct X, b);
First, &x takes the address of x, giving a pointer to struct X. If we used that pointer type directly for pointer arithmetic, every time we increased &x by 1 the address would actually increase by sizeof(struct X). Since offsetof returns a value which is a distance in bytes from the beginning of the struct, we need to convert &x into a pointer to a byte-sized type, in this case char *. Since a char is always defined to be 1 byte, when we increase a char * by 1 we advance 1 byte. This is why character types are specifically called out in Section 6.5, Expressions:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
The result of this is now a pointer to the start of x.b, of type char *, and it is correctly aligned, so no undefined behavior is invoked here. Why? Because offsetof returns a distance in bytes from the beginning of the struct, and we have been doing byte-wise arithmetic on the pointer through the char * cast, so the result points at exactly the beginning of b.
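The difference between the two kinds of pointer arithmetic can be made concrete with a short sketch. The helper names struct_step and byte_step are assumptions introduced here; each measures, in bytes, how far a pointer moves when 1 is added to it.

```c
#include <assert.h>
#include <stddef.h>

struct X {
    int a;
    int b;
};

/* Hypothetical helper: bytes advanced by p + 1 when p is a struct X *.
   p + 1 is the one-past-the-end pointer, which may be formed but not
   dereferenced. */
static size_t struct_step(struct X *p)
{
    assert(p != NULL);
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Hypothetical helper: bytes advanced by adding 1 to a char * view of p. */
static size_t byte_step(struct X *p)
{
    assert(p != NULL);
    char *bytes = (char *)p;
    return (size_t)((bytes + 1) - bytes);
}
```

For any struct X object x, struct_step(&x) should equal sizeof(struct X) while byte_step(&x) should equal 1, which is why the offsetof value has to be added to the char * form of the pointer.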
Since we've reached the start of the object we want, we don't need the result to be of type char * anymore. The result is converted to a generic pointer, void *ptr, and is later cast to int * before being dereferenced to give us access to x.b.
Since b is an int, and in the end we have *(int*)ptr, which is an lvalue of type int, we are following the standard under the "a type compatible with the effective type of the object" clause above (or one of the other ones; please correct me if I'm wrong).