 

What exactly does the C Structure Dot Operator Do (Lower Level Perspective)?

Tags:

c

struct

I have a question regarding structs in C. When you declare a struct, you are essentially defining the layout of a block of memory. When you create an instance of that struct, you get a block of memory large enough to hold all of its members.

However, I'm somewhat confused about what the dot operator is doing. If I have a struct Car with a member called GasMileage (an int member), I can get the value of GasMileage with something like:

int x = CarInstance.GasMileage;

What is actually happening with this dot operator? Does it simply act as an offset from the base address? And how does it know that the member is an int?

I guess I'm curious about what is going on behind the scenes. Would it be possible to reference GasMileage some other way, such as:

int *GasMileagePointer = (&carInstance + offsetInBytes(GasMileage));
int x = *GasMileagePointer;

This is just something I quickly made up. I've searched hard for a good explanation, but nothing I've found goes beyond treating the dot operator as magic.

asked Feb 23 '16 at 21:02 by Izzo



3 Answers

Yes, the dot operator simply applies an offset from the base of the structure, and then accesses the value at that address.

int x = CarInstance.GasMileage;

is equivalent to (offsetof comes from <stddef.h>):

int x = *(int *)((char *)&CarInstance + offsetof(struct Car, GasMileage));

For a member with some other type T, the only difference is that the cast (int *) becomes (T *).

answered Oct 03 '22 at 11:10 by Barmar



When it works, the behavior of the "." operator is equivalent to taking the address of the structure, offsetting it by the member's offset, converting the result to a pointer to the member type, and dereferencing it. The Standard, however, provides that there are situations where that isn't guaranteed to work. For example, given:

struct s1 { int x, y; };
struct s2 { int x, y; };
void test1(struct s1 *p1, struct s2 *p2)
{
  p1->x++;
  p2->x ^= 1;
  p1->x--;
  p2->x ^= 1;
}

a compiler may decide that there's no legitimate way for p1->x and p2->x to identify the same object, so it may reorder the code so that the ++ and -- operations on p1->x cancel, and the ^=1 operations on p2->x cancel, leaving a function that does nothing.

Note that the behavior is different when using unions, since given:

union u { struct s1 v1; struct s2 v2; };

void test2(union u *uv)
{
  uv->v1.x ^= 1;
  uv->v2.x++;
  uv->v1.x ^= 1;
  uv->v2.x--;
}

the common initial sequence rule indicates that since uv->v1 and uv->v2 start with fields of the same types, an access to such a field in uv->v1 is equivalent to an access to the corresponding field in uv->v2. Thus, a compiler is not allowed to resequence things. On the other hand, given

void test1(struct s1 *p1, struct s2 *p2);
void test3(union u *uv)
{
  test1(&uv->v1, &uv->v2);
}

the fact that uv->v1 and uv->v2 start with matching fields doesn't guard against a compiler's assumption that the pointers won't alias.

Note that some compilers offer an option to force generation of code where member accesses always behave equivalently to the aforementioned pointer operations. For gcc, the option is -fno-strict-aliasing. If code will need to access common initial members of varying structure types, omitting that switch may cause it to fail in weird and unpredictable ways.

answered Oct 03 '22 at 11:10 by supercat



When you use the . operator, the compiler translates this to an offset inside the struct, based on the size of the fields (and padding) that precede it.

For example:

struct Car {
    char model[52];
    int doors;
    int GasMileage;
};

Assuming an int is 4 bytes and there is no padding, the offset of model is 0, the offset of doors is 52, and the offset of GasMileage is 56.

So if you know the offset of the member, you could get a pointer to it like this:

int *GasMileagePointer = (int *)((char *)&carInstance + offsetof(struct Car, GasMileage));

The cast to char * is necessary so that pointer arithmetic moves 1 byte at a time instead of sizeof(struct Car) bytes at a time. The result then has to be cast to the correct pointer type, in this case int *.

answered Oct 03 '22 at 12:10 by dbush
