Is accessing members through offsetof well defined?

When doing pointer arithmetic with offsetof, is it well-defined behavior to take the address of a struct, add the offset of a member to it, and then dereference that address to get to the underlying member?

Consider the following example:

#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char* a;
    const char* b;
} A;

int main() {
    A test[3] = {
        {.a = "Hello", .b = "there."},
        {.a = "How are", .b = "you?"},
        {.a = "I\'m", .b = "fine."}};

    for (size_t i = 0; i < 3; ++i) {
        char* ptr = (char*) &test[i];
        ptr += offsetof(A, b);
        printf("%s\n", *(char**)ptr);
    }
}

This should print "there.", "you?" and "fine." on three consecutive lines, which it currently does with both clang and gcc, as you can verify yourself on wandbox. However, I am unsure whether any of these pointer casts and arithmetic violate some rule which would cause the behavior to become undefined.

asked Oct 02 '17 10:10 by Ben Steffan



2 Answers

As far as I can tell, it is well-defined behavior, but only because you access the data through a character type. If you had used some other pointer type to access the struct, it would have been a "strict aliasing violation".

Strictly speaking, it is not well-defined to access an array out-of-bounds, but it is well-defined to use a character type pointer to grab any byte out of a struct. By using offsetof you guarantee that this byte is not a padding byte (which could have meant that you would get an indeterminate value).

Note, however, that casting away the const qualifier (the members are const char*, but they are read through a char**) does result in poorly-defined behavior.

EDIT

Similarly, the cast (char**)ptr is an invalid pointer conversion; this alone is undefined behavior, as it violates strict aliasing. The variable ptr itself was declared as a char*, so you can't lie to the compiler and say "hey, this is actually a char**", because it is not. This holds regardless of what ptr points at.

I believe that the correct code with no poorly-defined behavior would be this:

#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char* a;
    const char* b;
} A;

int main() {
    A test[3] = {
        {.a = "Hello", .b = "there."},
        {.a = "How are", .b = "you?"},
        {.a = "I\'m", .b = "fine."}};

    for (size_t i = 0; i < 3; ++i) {
        const char* ptr = (const char*) &test[i];
        ptr += offsetof(A, b);

        /* Extract the const char* from the address that ptr points at,
           and store it inside ptr itself: */
        memmove(&ptr, ptr, sizeof(const char*)); 
        printf("%s\n", ptr);
    }
}
answered Nov 19 '22 05:11 by Lundin



Given

struct foo {int x, y;} s;
void write_int(int *p, int value) { *p = value; }

nothing in the Standard would distinguish between:

write_int(&s.y, 12);

and

write_int((int*)(((char*)&s)+offsetof(struct foo,y)), 12);

The Standard could be read in such a way as to imply that both of the above violate the lvalue-type rules, since it does not specify that the stored value of a structure can be accessed using an lvalue of a member type. Under that reading, code wanting to access a structure member would have to be written as:

#include <string.h>
void write_int(int *p, int value) { memcpy(p, &value, sizeof value); }

I personally think that's preposterous; if &s.y can't be used to access an lvalue of type int, why does the & operator yield an int*?

On the other hand, I also don't think it matters what the Standard says. Neither clang nor gcc can be relied upon to correctly handle code that does anything "interesting" with pointers, even in cases that are unambiguously defined by the Standard, except when invoked with -fno-strict-aliasing. Compilers that make any bona fide effort to avoid any incorrect aliasing "optimizations" in cases which would be defined under at least some plausible readings of the Standard will have no trouble handling code that uses offsetof in cases where all accesses that will be done using the pointer (or other pointers derived from it) precede the next access to the object via other means.

answered Nov 19 '22 06:11 by supercat
