When doing pointer arithmetic with offsetof, is it well-defined behavior to take the address of a struct, add the offset of a member to it, and then dereference that address to get to the underlying member?
Consider the following example:
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char* a;
    const char* b;
} A;

int main() {
    A test[3] = {
        {.a = "Hello", .b = "there."},
        {.a = "How are", .b = "you?"},
        {.a = "I\'m", .b = "fine."}};

    for (size_t i = 0; i < 3; ++i) {
        char* ptr = (char*) &test[i];
        ptr += offsetof(A, b);
        printf("%s\n", *(char**)ptr);
    }
}
This should print "there.", "you?" and "fine." on three consecutive lines, which it currently does with both clang and gcc, as you can verify yourself on wandbox. However, I am unsure whether any of these pointer casts and arithmetic violate some rule which would cause the behavior to become undefined.
As far as I can tell, it is well-defined behavior, but only because you access the data through a char type. If you had used some other pointer type to access the struct, it would have been a "strict aliasing violation".
Strictly speaking, it is not well-defined to access an array out-of-bounds, but it is well-defined to use a character type pointer to grab any byte out of a struct. By using offsetof you guarantee that this byte is not a padding byte (which could have meant that you would get an indeterminate value).
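To make the point about padding concrete, here is a minimal sketch. The struct padded and both helper functions are hypothetical names invented for this illustration; the amount of padding is implementation-defined, so the code computes it rather than assuming a value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct: many ABIs insert padding between tag and value,
   but the exact amount is implementation-defined. */
struct padded {
    char tag;
    double value;
};

/* Number of bytes between the end of tag and the start of value. */
size_t padding_before_value(void) {
    return offsetof(struct padded, value)
         - (offsetof(struct padded, tag) + sizeof(char));
}

/* Pointer to the first byte of value, reached by character-pointer
   arithmetic from the start of the struct. Because the offset comes from
   offsetof, it skips any padding and lands on the member itself. */
const unsigned char *value_bytes(const struct padded *p) {
    assert(p != NULL);
    return (const unsigned char *)p + offsetof(struct padded, value);
}
```

For any struct padded object obj, value_bytes(&obj) should compare equal to (const unsigned char *)&obj.value, whatever padding the compiler chose.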
Note, however, that casting away the const qualifier does result in poorly-defined behavior.
EDIT
Similarly, the cast (char**)ptr is an invalid pointer conversion; this alone is undefined behavior as it violates strict aliasing. The variable ptr itself was declared as a char*, so you can't lie to the compiler and say "hey, this is actually a char**", because it is not. This is regardless of what ptr points at.
I believe that the correct code with no poorly-defined behavior would be this:
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char* a;
    const char* b;
} A;

int main() {
    A test[3] = {
        {.a = "Hello", .b = "there."},
        {.a = "How are", .b = "you?"},
        {.a = "I\'m", .b = "fine."}};

    for (size_t i = 0; i < 3; ++i) {
        const char* ptr = (const char*) &test[i];
        ptr += offsetof(A, b);

        /* Extract the const char* from the address that ptr points at,
           and store it inside ptr itself: */
        memmove(&ptr, ptr, sizeof(const char*));
        printf("%s\n", ptr);
    }
}
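A possible variation on the code above, offered only as a sketch: the same byte-wise extraction wrapped in a helper that copies into a separate variable, so that ptr is not reused for two different purposes. The name read_b is hypothetical and not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char* a;
    const char* b;
} A;

/* Hypothetical helper: returns the pointer stored in member b of *obj.
   The member is located with offsetof and copied out byte by byte with
   memcpy, so no lvalue of a pointer-to-pointer type is ever dereferenced. */
const char* read_b(const A* obj) {
    assert(obj != NULL);
    const unsigned char* bytes = (const unsigned char*) obj;
    const char* result;
    memcpy(&result, bytes + offsetof(A, b), sizeof result);
    return result;
}
```

With the test array from the question, printf("%s\n", read_b(&test[i])) would take the place of the loop body.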
Given
struct foo {int x, y;} s;
void write_int(int *p, int value) { *p = value; }
nothing in the Standard would distinguish between:
write_int(&s.y, 12); //Just to get 6 characters
and
write_int((int*)(((char*)&s)+offsetof(struct foo,y)), 12);
The Standard could be read in such a way as to imply that both of the above violate the lvalue-type rules, since it does not specify that the stored value of a structure can be accessed using an lvalue of a member type. Under that reading, code wanting to access a structure member would have to be written as:
void write_int(int *p, int value) { memcpy(p, &value, sizeof value); }
I personally think that's preposterous; if &s.y can't be used to access an lvalue of type int, why does the & operator yield an int*?
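For readers who want to experiment, the three forms discussed above can be put side by side. This is only a sketch: the set_y_* wrappers and write_int_bytes are hypothetical names added for illustration. Mainstream compilers are expected to store the same value through all three, but whether the Standard guarantees that for the offsetof form is exactly the point under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct foo { int x, y; };

/* Plain store through an int*, as in the answer. */
void write_int(int *p, int value) { *p = value; }

/* memcpy-based store, which never uses an int lvalue for the write. */
void write_int_bytes(int *p, int value) { memcpy(p, &value, sizeof value); }

/* Form 1: take the member's address directly. */
void set_y_direct(struct foo *s, int v) {
    assert(s != NULL);
    write_int(&s->y, v);
}

/* Form 2: compute the member's address from the struct address
   plus offsetof. */
void set_y_offset(struct foo *s, int v) {
    assert(s != NULL);
    write_int((int *)((char *)s + offsetof(struct foo, y)), v);
}

/* Form 3: member address as in form 1, but written with memcpy. */
void set_y_bytes(struct foo *s, int v) {
    assert(s != NULL);
    write_int_bytes(&s->y, v);
}
```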
On the other hand, I also don't think it matters what the Standard says. Neither clang nor gcc can be relied upon to correctly handle code that does anything "interesting" with pointers, even in cases that are unambiguously defined by the Standard, except when invoked with -fno-strict-aliasing. Compilers that make any bona fide effort to avoid any incorrect aliasing "optimizations" in cases which would be defined under at least some plausible readings of the Standard will have no trouble handling code that uses offsetof in cases where all accesses that will be done using the pointer (or other pointers derived from it) precede the next access to the object via other means.