Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are there any guarantees about C struct order?

Tags:

c

pointers

struct

I've used structs extensively and I've seen some interesting things, especially *value instead of value->first_value where value is a pointer to struct, first_value is the very first member, is *value safe?

Also note that sizes aren't guaranteed because of alignment, whats the alginment value based on, the architecture/register size?

We align data/code for faster execution can we tell compiler not to do this? so maybe we can guarantee certain things about structs, like their size?

When doing pointer arithmetic on struct members in order to locate member offset, I take it you do - if little endian + for big endian, or does it just depend on the compiler?

what does malloc(0) really allocate?

The following code is for educational/discovery purposes, its not meant to be of production quality.

#include <stdlib.h>
#include <stdio.h>

int main()
{
    printf("sizeof(struct {}) == %lu;\n", sizeof(struct {}));
    printf("sizeof(struct {int a}) == %lu;\n", sizeof(struct {int a;}));
    printf("sizeof(struct {int a; double b;}) == %lu;\n", sizeof(struct {int a; double b;}));
    printf("sizeof(struct {char c; double a; double b;}) == %lu;\n", sizeof(struct {char c; double a; double b;}));

    printf("malloc(0)) returns %p\n", malloc(0));
    printf("malloc(sizeof(struct {})) returns %p\n", malloc(sizeof(struct {})));

    struct {int a; double b;} *test = malloc(sizeof(struct {int a; double b;}));
    test->a = 10;
    test->b = 12.2;
    printf("test->a == %i, *test == %i \n", test->a, *(int *)test);
    printf("test->b == %f, offset of b is %i, *(test - offset_of_b) == %f\n",
        test->b, (int)((void *)test - (void *)&test->b),
        *(double *)((void *)test - ((void *)test - (void *)&test->b))); // find the offset of b, add it to the base,$

    free(test);
    return 0;
}

calling gcc test.c followed by ./a.out I get this:

sizeof(struct {}) == 0;
sizeof(struct {int a}) == 4;
sizeof(struct {int a; double b;}) == 16;
sizeof(struct {char c; double a; double b;}) == 24;
malloc(0)) returns 0x100100080
malloc(sizeof(struct {})) returns 0x100100090
test->a == 10, *test == 10 
test->b == 12.200000, offset of b is -8, *(test - offset_of_b) == 12.200000

Update this is my machine:

gcc --version

i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

uname -a

Darwin MacBookPro 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
like image 563
Samy Vilar Avatar asked Dec 06 '22 13:12

Samy Vilar


2 Answers

From 6.2.5/20:

A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.

To answer:

especially *value instead of value->first_value where value is a pointer to struct, first_value is the very first member, is *value safe?

see 6.7.2.1/15:

15 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.1

There may however be padding bytes at the end of the structure as also in-between members.

In C, malloc( 0 ) is implementation defined. (As a side note, this is one of those little things where C and C++ differ.)

[1] Emphasis mine.

like image 198
dirkgently Avatar answered Dec 23 '22 23:12

dirkgently


I've used structs extensively and I've seen some interesting things, especially *value instead of value->first_value where value is a pointer to struct, first_value is the very first member, is *value safe?

Yes, *value is safe; it yields a copy of the structure that value points at. But it is almost guaranteed to have a different type from *value->first_value, so the result of *value will almost always be different from *value->first_value.


Counter-example:

struct something { struct something *first_value; ... };
struct something data = { ... };
struct something *value = &data;
value->first_value = value;

Under this rather limited set of circumstances, you would get the same result from *value and *value->first_value. Under that scheme, the types would be the same (even if the values are not). In the general case, the type of *value and *value->first_value are of different types.


Also note that sizes aren't guaranteed because of alignment, but is alignment always on register size?

Since 'register size' is not a defined C concept, it isn't clear what you're asking. In the absence of pragmas (#pragma pack or similar), the elements of a structure will be aligned for optimal performance when the value is read (or written).

We align data/code for faster execution; can we tell compiler not to do this? So maybe we can guarantee certain things about structs, like their size?

The compiler is in charge of the size and layout of struct types. You can influence by careful design and perhaps by #pragma pack or similar directives.

These questions normally arise when people are concerned about serializing data (or, rather, trying to avoid having to serialize data by processing structure elements one at a time). Generally, I think you're better off writing a function to do the serialization, building it up from component pieces.

When doing pointer arithmetic on struct members in order to locate member offset, I take it you do subtraction if little endian, addition for big endian, or does it just depend on the compiler?

You're probably best off not doing pointer arithmetic on struct members. If you must, use the offsetof() macro from <stddef.h> to handle the offsets correctly (and that means you're not doing pointer arithmetic directly). The first structure element is always at the lowest address, regardless of big-endianness or little-endianness. Indeed, endianness has no bearing on the layout of different members within a structure; it only has an affect on the byte order of values within a (basic data type) member of a structure.

The C standard requires that the elements of a structure are laid out in the order that they are defined; the first element is at the lowest address, and the next at a higher address, and so on for each element. The compiler is not allowed to change the order. There can be no padding before the first element of the structure. There can be padding after any element of the structure as the compiler sees fit to ensure what it considers appropriate alignment. The size of a structure is such that you can allocate (N × size) bytes that are appropriately aligned (e.g. via malloc()) and treat the result as an array of the structure.

like image 26
Jonathan Leffler Avatar answered Dec 23 '22 23:12

Jonathan Leffler