
Pointer difference across members of a struct?

The C99 standard states that:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object

Consider the following code:

struct test {
    int x[5];
    char something;
    short y[5];
};

...

struct test s = { ... };
char *p = (char *) s.x;
char *q = (char *) s.y;
printf("%td\n", q - p);

This obviously breaks the above rule, since the p and q pointers are pointing to different "array objects", and, according to the rule, the q - p difference is undefined.

But in practice, why should such a thing ever result in undefined behaviour? After all, the struct members are laid out sequentially (just as array elements are), with any potential padding between the members. True, the amount of padding will vary across implementations and that would affect the outcome of the calculations, but why should that result be "undefined"?

My question is, can we suppose that the standard is just "ignorant" of this issue, or is there a good reason for not broadening this rule? Couldn't the above rule be rephrased to "both shall point to elements of the same array object or members of the same struct"?

My only suspicion is segmented memory architectures, where the members might end up in different segments. Is that the case?

I also suspect that this is the reason why GCC defines its own __builtin_offsetof, in order to have a "standards compliant" definition of the offsetof macro.

EDIT:

As already pointed out, arithmetic on void pointers is not allowed by the standard. It is a GNU extension that fires a warning only when GCC is passed -std=c99 -pedantic. I'm replacing the void * pointers with char * pointers.

Blagovest Buyukliev Avatar asked Nov 03 '14 09:11



4 Answers

Subtraction and relational operators (on type char*) between addresses of members of the same struct are well defined.

Any object can be treated as an array of unsigned char.

Quoting N1570 6.2.6.1 paragraph 4:

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [ n ] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
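As a minimal sketch of this reading (not code from the answer), the struct from the question can be viewed through unsigned char pointers, and the two member addresses can then be compared and subtracted as positions inside the same sizeof(struct test) bytes. The helper name byte_distance_x_to_y is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct test {
    int x[5];
    char something;
    short y[5];
};

/* Hypothetical helper: distance in bytes from s->x to s->y, treating the
   whole struct as an array of unsigned char, as the answer describes. */
static ptrdiff_t byte_distance_x_to_y(const struct test *s)
{
    const unsigned char *base = (const unsigned char *)s;
    const unsigned char *p = (const unsigned char *)s->x;
    const unsigned char *q = (const unsigned char *)s->y;

    /* Both member addresses lie inside the struct's own bytes, and x
       precedes y because members are laid out in declaration order. */
    assert(base <= p && p < q && q < base + sizeof *s);

    return q - p;
}
```

Called on any struct test object, the function should return the same value as offsetof(struct test, y) - offsetof(struct test, x); the exact number depends on the implementation's padding.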

...

My only suspicion is segmented memory architectures, where the members might end up in different segments. Is that the case?

No. For a system with a segmented memory architecture, normally the compiler will impose a restriction that each object must fit into a single segment. Or it can permit objects that occupy multiple segments, but it still has to ensure that pointer arithmetic and comparisons work correctly.

Keith Thompson Avatar answered Sep 21 '22 11:09



Pointer arithmetic requires that the two pointers being subtracted be part of the same object, because it doesn't make sense otherwise. The quoted section of the standard specifically refers to two unrelated objects such as int a[5]; and int b[5];. Pointer arithmetic also requires knowing the type of the object that the pointers point to (I am sure you are aware of this already).

i.e.

int a[5];
int *p = &a[1]+1; 

Here p is calculated by knowing that &a[1] refers to an int object, so the address is advanced by 4 bytes (assuming sizeof(int) is 4).
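The scaling can be checked directly. The sketch below measures the step in bytes instead of assuming it is 4; the function name step_in_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes "&a[1] + 1" moves past &a[1]. */
static size_t step_in_bytes(void)
{
    int a[5] = {0};
    int *p = &a[1] + 1;          /* advances by one int, not by one byte */

    assert(p == &a[2]);          /* same element as a[2] */

    /* Measure the step through character pointers into the same array. */
    return (size_t)((char *)p - (char *)&a[1]);
}
```

The returned value equals sizeof(int) on whatever implementation compiles it.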

Coming to the struct example, I don't think it can possibly be defined in a way to make pointer arithmetic between struct members legal.

Let's take the example,

struct test {
    int x[5];
    char something;
    short y[5];
};

Pointer arithmetic is not allowed on void pointers by the C standard (compiling with gcc -Wall -pedantic test.c would catch that). I think you are using gcc, which treats void * like char * and allows it. So,

printf("%td\n", q - p);

is equivalent to

printf("%td\n", (char*)q - (char*)p);

as pointer arithmetic is well defined if the pointers point to within the same object and are character pointers (char* or unsigned char*).

Using correct types, it would be:

struct test s = { ... };
int *p = s.x;
short *q = s.y;
printf("%td\n", q - p);

Now, how can q - p be performed? Based on sizeof(int) or sizeof(short)? How can the size of char something;, which sits in the middle of these two arrays, be accounted for?

That should explain why it is not possible to perform pointer arithmetic on pointers to objects of different types.

Even if all members are of the same type (so there is no type issue as described above), it is better to use the standard macro offsetof (from <stddef.h>) to get the distance between struct members, which has a similar effect to pointer arithmetic between members:

printf("%zu\n", offsetof(struct test, y) - offsetof(struct test, x));
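A self-contained version of that one-liner might look like the sketch below, which uses the struct from the question. The helper name offset_gap_x_to_y is hypothetical, and the result depends on the implementation's padding.

```c
#include <assert.h>
#include <stddef.h>

struct test {
    int x[5];
    char something;
    short y[5];
};

/* Hypothetical helper: byte distance between members x and y, computed
   from their offsets without touching any pointer into an object. */
static size_t offset_gap_x_to_y(void)
{
    size_t off_x = offsetof(struct test, x);
    size_t off_y = offsetof(struct test, y);

    /* Members are laid out in declaration order, so y cannot precede x. */
    assert(off_y >= off_x);

    return off_y - off_x;
}
```

Because offsetof yields a size_t, the result should be printed with %zu, as in the line above.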

So I see no need for the C standard to define pointer arithmetic between struct members.

P.P Avatar answered Sep 21 '22 11:09



Yes, you are allowed to perform pointer arithmetic on the bytes of a structure:

N1570 - 6.3.2.3 Pointers p7:

... When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
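To illustrate the quoted paragraph, the sketch below converts a pointer to the struct from the question into a character pointer and increments it one byte at a time until it reaches a chosen member. The helper name steps_to_something is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct test {
    int x[5];
    char something;
    short y[5];
};

/* Hypothetical helper: number of single-byte increments needed to get
   from the first byte of *s to the member "something". */
static size_t steps_to_something(const struct test *s)
{
    /* The converted pointer refers to the lowest addressed byte. */
    const unsigned char *byte = (const unsigned char *)s;
    const unsigned char *target = (const unsigned char *)&s->something;
    size_t steps = 0;

    /* Successive increments visit the remaining bytes of the object. */
    while (steps < sizeof *s && byte != target) {
        ++byte;
        ++steps;
    }

    assert(steps < sizeof *s);   /* the member lies inside the object */
    return steps;
}
```

The count this returns should match offsetof(struct test, something).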

This means that, for the programmer, the bytes of the structure shall be seen as one contiguous area, regardless of how it may have been implemented in the hardware.

Not with void * pointers though; that is a non-standard compiler extension. As the paragraph from the standard says, this applies only to character type pointers.

Edit:

As mafso pointed out in the comments, the above is only true as long as ptrdiff_t, the type of the subtraction result, has enough range for the result. Since the range of size_t can be larger than that of ptrdiff_t, the addresses can be too far apart if the structure is big enough.

Because of this, it's preferable to use the offsetof macro on the structure members and calculate the result from those.
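One way to make the range concern explicit is to do the subtraction on size_t offsets and convert to ptrdiff_t only when the value fits. This is a sketch under the assumption that the caller passes offsets obtained from offsetof; the helper name offset_diff_fits is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores hi - lo in *out and returns 1 when the
   distance is representable as ptrdiff_t; returns 0 otherwise and
   leaves *out untouched. */
static int offset_diff_fits(size_t lo, size_t hi, ptrdiff_t *out)
{
    assert(lo <= hi);

    size_t gap = hi - lo;        /* unsigned subtraction cannot overflow here */

    if ((uintmax_t)gap > (uintmax_t)PTRDIFF_MAX)
        return 0;                /* too far apart for ptrdiff_t */

    *out = (ptrdiff_t)gap;
    return 1;
}
```

For ordinary structures the check always succeeds; it only matters for objects larger than PTRDIFF_MAX bytes.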

user694733 Avatar answered Sep 25 '22 11:09



I believe the answer to this question is simpler than it appears. The OP asks:

but why should that result be "undefined"?

Well, let's see what the definition of undefined behavior is in the draft C99 standard, section 3.4.3:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

It is simply behavior for which the standard does not impose a requirement, which fits this situation perfectly: the results are going to vary depending on the architecture, and attempting to specify them in a portable manner would probably have been difficult, if not impossible. This leaves the question: why would they choose undefined behavior as opposed to, let's say, implementation-defined or unspecified behavior?

Most likely it was made undefined behavior to limit the number of ways an invalid pointer could be created. This is consistent with the fact that we are provided with offsetof, which removes the one potential need for pointer subtraction of unrelated objects.

Although the standard does not really define the term invalid pointer, we get a good description in Rationale for International Standard—Programming Languages—C which in section 6.3.2.3 Pointers says (emphasis mine):

Implicit in the Standard is the notion of invalid pointers. In discussing pointers, the Standard typically refers to “a pointer to an object” or “a pointer to a function” or “a null pointer.” A special case in address arithmetic allows for a pointer to just past the end of an array. Any other pointer is invalid.

The C99 rationale further adds:

Regardless how an invalid pointer is created, any use of it yields undefined behavior. Even assignment, comparison with a null pointer constant, or comparison with itself, might on some systems result in an exception.

This strongly suggests to us that a pointer to padding would be an invalid pointer, although it is difficult to prove that padding is not an object. The definition of object says:

region of data storage in the execution environment, the contents of which can represent values

and notes:

When referenced, an object may be interpreted as having a particular type; see 6.3.2.1.

I don't see how we can reason about the type or the value of padding between elements of a struct, and therefore it is not an object; at least, this strongly indicates that padding is not meant to be considered an object.

Shafik Yaghmour Avatar answered Sep 24 '22 11:09
