Using offsetof to access struct member

I have the following code:

#include <stddef.h>

int main() {
  struct X {
    int a;
    int b;
  } x = {0, 0};

  void *ptr = (char*)&x + offsetof(struct X, b);

  *(int*)ptr = 42;

  return 0;
}

The assignment through ptr performs an indirect access to x.b.

Is this code well defined according to any of the C standards?

I know that:

  • *(char*)ptr = 42; is defined, though its effect on the value of x.b is implementation-defined.
  • ptr == (void*)&x.b

I guess that accessing the data pointed to by ptr via int * does not violate the strict aliasing rule, but I'm not fully sure that the standard guarantees that.

tstanisl asked Oct 29 '21 16:10


2 Answers

Yes, this is perfectly well defined, and is exactly how offsetof is intended to be used. You do the pointer arithmetic on a pointer to character type, so that it is done in bytes, and then cast back to the actual type of the member.

You can see for instance 6.3.2.3 p7 (all references are to C17 draft N2176):

When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

So (char *)&x is a pointer to x converted to a pointer to char, therefore it points to the lowest addressed byte of x. When we add offsetof(struct X, b) (say it's 4) then we have a pointer to byte 4 of x. Now offsetof(struct X, b) is defined to return

the offset in bytes, to the structure member, from the beginning of its structure [7.19p3]

so 4 is in fact the offset from the beginning of x to x.b. Hence byte 4 of x is the lowest byte of x.b, and that's what ptr points to; in other words, ptr holds the address of x.b, just not yet with type int * (the sum has type char * and is then stored in a void *). When we cast it back to int *, we have a pointer to x.b which is of the type int *, exactly the same as we would get from the expression &x.b. So dereferencing this pointer accesses x.b.


A question arose in the comments about this last step: when ptr is cast back to int *, how do we know we indeed have a pointer to the int x.b? This is a bit less explicit in the standard but I think it is the obvious intent.

However, I think we can also derive it indirectly. Hopefully we agree that ptr above is a pointer to the lowest addressed byte of x.b. Now by the same passage of 6.3.2.3 p7 quoted above, taking a pointer to x.b and converting it to char *, as in (char *)&x.b, would also yield a pointer to the lowest addressed byte of x.b. As they are pointers of the same type which point to the same byte, they are the same pointer: ptr == (char *)&x.b.

Then we look at the preceding sentences of 6.3.2.3 p7:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

There are no problems with alignment here, because char has the weakest alignment requirement (6.2.8 p6). So converting (char *)&x.b back to int * must recover a pointer to x.b, i.e. (int *)(char *)&x.b == &x.b.

But ptr is the same pointer as (char *)&x.b, so we may substitute them in this equality: (int *)ptr == &x.b.

Obviously *&x.b produces an lvalue designating x.b (6.5.3.2 p4), hence so does *(int *)ptr.
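The chain of equalities above can be checked directly. This is only a minimal sketch: struct X is declared at file scope here rather than inside main, and the helper names member_b_via_offset and offset_pointer_matches are made up for illustration.

```c
#include <stddef.h>

/* Same layout as the struct in the question, declared at file scope
   (an assumption, so that the helpers below can name the type). */
struct X {
    int a;
    int b;
};

/* Hypothetical helper: computes a pointer to px->b from the byte offset. */
int *member_b_via_offset(struct X *px)
{
    /* (char *)px points at the lowest addressed byte of *px, so adding
       the offset of b moves to the lowest addressed byte of px->b. */
    char *bytes = (char *)px + offsetof(struct X, b);

    /* Convert back to the member's own type before dereferencing. */
    return (int *)bytes;
}

/* Hypothetical helper: returns 1 if the computed pointer compares equal
   to the pointer obtained directly with &px->b. */
int offset_pointer_matches(struct X *px)
{
    return member_b_via_offset(px) == &px->b;
}
```

For a struct X x = {0, 0}, offset_pointer_matches(&x) should yield 1, and *member_b_via_offset(&x) = 42; should leave x.b equal to 42 while x.a stays 0.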


There is no problem with strict aliasing (6.5p7). First, determine the effective type of x.b using 6.5p6:

The effective type of an object for an access to its stored value is the declared type of the object, if any. [Then explanations on what to do if it doesn't have a declared type.]

Well, x.b does have a declared type, which is int. So its effective type is int.

Now to see if the access is legal under strict aliasing, see 6.5p7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

— a type compatible with the effective type of the object,

[more options not relevant here]

We are accessing x.b through the lvalue expression *(int *)ptr, which has type int. And int is compatible with int per 6.2.7p1:

Two types have compatible type if their types are the same. [Then other conditions under which they may also be compatible].


A perhaps more familiar example of the same technique is indexing into an array by bytes. If we have

int arr[100];
*(int *)((char *)arr + (17 * sizeof(int))) = 42;

then this is equivalent to arr[17] = 42;.
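To make the equivalence concrete, here is a small sketch that wraps the same expression in a function. The name store_by_bytes is hypothetical and only serves the illustration.

```c
#include <stddef.h>

/* Hypothetical helper: stores value into arr[index], doing the pointer
   arithmetic in bytes instead of in units of int. */
void store_by_bytes(int *arr, size_t index, int value)
{
    /* Scale the index by the element size by hand, as arr[index]
       would do implicitly. */
    char *bytes = (char *)arr + index * sizeof(int);

    /* Convert back to int * to access the element itself. */
    *(int *)bytes = value;
}
```

Calling store_by_bytes(arr, 17, 42) on an int arr[100] should modify arr[17] and nothing else, just as arr[17] = 42; would.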

This is how generic routines like qsort and bsearch are implemented. If we try to qsort an array of int, then within qsort all the pointer arithmetic is done in bytes, on pointers to character type, with the offsets manually scaled by the object size passed as an argument (which here would be sizeof(int)). When qsort needs to compare two objects, it converts the pointers to them to const void * and passes them as arguments to the comparator function, which casts them back to const int * to do the comparison.
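As a rough sketch of that pattern (not the actual implementation of qsort in any library), here is a made-up generic routine, find_min, that walks an array in bytes and relies on a qsort-style comparator. Both function names are assumptions for the sake of the example.

```c
#include <stddef.h>

/* Comparator in the style that qsort and bsearch expect: it receives
   const void * and casts back to the real element type. */
int compare_ints(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);
}

/* Hypothetical generic routine: returns a pointer to the smallest of
   count elements, each size bytes wide, or NULL if count is 0. */
const void *find_min(const void *base, size_t count, size_t size,
                     int (*cmp)(const void *, const void *))
{
    const char *bytes = base;   /* all arithmetic is done in bytes */
    const char *best = NULL;

    for (size_t i = 0; i < count; i++) {
        /* The offset is scaled by the element size by hand. */
        const char *elem = bytes + i * size;
        if (best == NULL || cmp(elem, best) < 0)
            best = elem;
    }
    return best;
}
```

With int v[] = {5, 3, 9, 1, 7}, the call find_min(v, 5, sizeof v[0], compare_ints) should return a pointer that compares equal to &v[3]. The routine never needs to know that the elements are int; only the comparator does.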

This all works fine and is clearly an intended feature of the language. So I think we needn't doubt that the use of offsetof in the current question is similarly an intended feature.

Nate Eldredge answered Oct 07 '22 12:10


I believe that this is perfectly legal; in fact, I've just encountered a similar technique used in a book I'm reading (not that it matters).

Here's why I think this is legal:

void *ptr = (char*)&x + offsetof(struct X, b);

First, &x takes the address of x, giving a pointer to struct X. If we used that type for pointer arithmetic, every increment of &x by 1 would advance the address by sizeof(struct X). Since offsetof yields a distance in bytes from the beginning of the struct, we need to convert &x to a pointer to a byte-sized type, in this case char *. Since sizeof(char) is 1 by definition, adding 1 to a char * advances exactly 1 byte. This is why character types are specifically called out in Section 6.5 Expressions:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

The result is now a pointer to the start of x.b, of type char *, and it is correctly aligned for int, so no undefined behavior is invoked here. Why? Because offsetof yields a distance in bytes from the beginning of the struct, and we have been doing byte-wise arithmetic on the pointer through the char * cast, so the result points at exactly the beginning of b.
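The alignment claim can also be checked rather than taken on faith. The sketch below assumes a C11 or later compiler, since it uses _Alignof and _Static_assert; struct X is declared at file scope and the function name member_b_offset_is_aligned is invented for the example.

```c
#include <stddef.h>

/* Same layout as the struct in the question, declared at file scope
   (an assumption made for this sketch). */
struct X {
    int a;
    int b;
};

/* Compile-time check (C11): the byte offset of b is a multiple of the
   alignment required for int. */
_Static_assert(offsetof(struct X, b) % _Alignof(int) == 0,
               "member b is not aligned for int");

/* Hypothetical helper: the same check, evaluated at run time. */
int member_b_offset_is_aligned(void)
{
    return offsetof(struct X, b) % _Alignof(int) == 0;
}
```

As far as I understand, the compiler has to lay out the struct so that this holds, because every member must be suitably aligned for its own type; the check merely makes that visible.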

Since we've reached the start of the object we want, we don't need the result to have type char * anymore. It is converted to the generic pointer type void * when it is stored in ptr, and is cast to int * later, before dereferencing it to give us access to x.b.

Since b is an int, and the final expression *(int*)ptr has type int, we are following the standard under the "a type compatible with the effective type of the object" clause above (or one of the other ones; please correct me if I'm wrong).

stanle answered Oct 07 '22 10:10