
Struct Reordering by compiler [duplicate]

Suppose I have a struct like this:

struct MyStruct
{
  uint8_t var0;
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

This will probably waste some space (not a ton, but some), because the uint32_t member has to be properly aligned.

In practice, once the compiler has inserted padding so that the uint32_t member is properly aligned, it might look something like this:

struct MyStruct
{
  uint8_t var0;
  uint8_t unused[3];  //3 bytes of wasted space
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

A more efficient struct would be:

struct MyStruct
{
  uint8_t var0;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
  uint32_t var1;
};
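
The size difference between the two orderings can be checked directly with sizeof and offsetof. The following is only a sketch: the names Original and Reordered are made up here to tell the two layouts apart, and the exact numbers depend on the target ABI. On a typical platform where uint32_t is 4-byte aligned, Original occupies 12 bytes (3 padding bytes after var0 plus 1 byte of tail padding) and Reordered occupies 8.

```cpp
#include <cstddef>
#include <cstdint>

// Declaration order from the question (hypothetical name: Original).
struct Original {
    std::uint8_t  var0;
    std::uint32_t var1;
    std::uint8_t  var2;
    std::uint8_t  var3;
    std::uint8_t  var4;
};

// Hand-reordered version (hypothetical name: Reordered).
struct Reordered {
    std::uint8_t  var0;
    std::uint8_t  var2;
    std::uint8_t  var3;
    std::uint8_t  var4;
    std::uint32_t var1;
};

// Four uint8_t members plus one uint32_t member.
constexpr std::size_t kPayloadBytes = 4 * sizeof(std::uint8_t) + sizeof(std::uint32_t);

// Members keep their declaration order, so var1 always comes after var0.
static_assert(offsetof(Original, var0) < offsetof(Original, var1),
              "members are laid out in declaration order");

// Bytes in each struct that hold no member data.
constexpr std::size_t original_padding()  { return sizeof(Original)  - kPayloadBytes; }
constexpr std::size_t reordered_padding() { return sizeof(Reordered) - kPayloadBytes; }
```

Comparing original_padding() with reordered_padding() on your own toolchain shows how much the manual reordering saves there.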

Now, the question is:

Why is the compiler forbidden (by the standard) from reordering the struct?

I don't see any way you could shoot yourself in the foot if the struct was reordered.

DarthRubik asked Jul 07 '16 11:07


4 Answers

Why is the compiler forbidden (by the standard) from reordering the struct?

The basic reason is: for compatibility with C.

Remember that C is, originally, a high-level assembly language. It is quite common in C to view memory (network packets, ...) by reinterpreting the bytes as a specific struct.

This has led to multiple features relying on this property:

  • C guaranteed that the address of a struct and the address of its first data member are one and the same, so C++ does too (in the absence of virtual inheritance/methods).

  • C guaranteed that if you have two structs A and B that both start with a char data member followed by an int data member (and whatever after), then when you put them in a union you can write the B member and read the char and int through its A member, so C++ does too: Standard Layout.

The latter is extremely broad, and completely prevents any re-ordering of data members for most structs (or classes).
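
Both guarantees can be sketched in a few lines. The types A, B and Both and the two helper functions are invented for this illustration. The union read relies on the common-initial-sequence rule for standard-layout structs, which is why the static_asserts are there.

```cpp
#include <type_traits>

// Two hypothetical structs sharing a common initial sequence: char, then int.
struct A { char tag; int value; float extra; };
struct B { char tag; int value; double other; };

static_assert(std::is_standard_layout<A>::value, "A must be standard-layout");
static_assert(std::is_standard_layout<B>::value, "B must be standard-layout");

union Both {
    A a;
    B b;
};

// Guarantee 1: a standard-layout struct starts at its first data member.
bool starts_at_first_member(const A& a) {
    return static_cast<const void*>(&a) == static_cast<const void*>(&a.tag);
}

// Guarantee 2: write through b, then read the common initial sequence through a.
int read_value_through_a(char tag, int value) {
    Both u;
    u.b = B{tag, value, 0.0};  // b becomes the active member
    return u.a.value;          // allowed: 'value' is in the common initial sequence
}
```

If a compiler were free to move value after extra in A but not in B, the second function could no longer work, which is the point the answer is making.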


Note that, before C++23, the Standard did allow some re-ordering: since C did not have the concept of access control, C++ specified that the relative order of two data members with different access control specifiers was unspecified. (C++23 removed this latitude and mandates declaration order.)

As far as I know, no compiler attempts to take advantage of it; but they could in theory.

Outside of C++, languages such as Rust allow compilers to re-order fields and the main Rust compiler (rustc) does so by default. Only historical decisions and a strong desire for backward compatibility prevent C++ from doing so.

Matthieu M. answered Oct 14 '22 15:10


I don't see any way you could shoot yourself in the foot if the struct was reordered.

Really? If this were permitted, communication between libraries/modules even in the same process would be ludicrously dangerous by default.

"In universe" argument

We must be able to know that our structs are defined the way that we've asked them to be. It's bad enough that padding is unspecified! Fortunately, you can control this when you need to.
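
One common way to take that control is sketched below, with a big caveat: #pragma pack is a compiler extension (GCC, Clang and MSVC accept it), not standard C++, and the struct names are made up for the example. Accessing misaligned members of a packed struct can be slow or even faulting on some architectures, so this is something to use deliberately rather than by default.

```cpp
#include <cstddef>
#include <cstdint>

// Default layout: the compiler may pad between 'kind' and 'length'.
struct DefaultLayout {
    std::uint8_t  kind;
    std::uint32_t length;
};

// Packed layout: request 1-byte packing so that no padding is inserted.
// Non-standard extension; assumes GCC, Clang or MSVC.
#pragma pack(push, 1)
struct PackedLayout {
    std::uint8_t  kind;
    std::uint32_t length;
};
#pragma pack(pop)

// Pin the layout so that a surprising toolchain fails at compile time.
static_assert(sizeof(PackedLayout) == 5, "packed struct should have no padding");
static_assert(offsetof(PackedLayout, length) == 1, "length should follow kind directly");
```

Either way, the member order stays exactly as written. Only the padding between members changes.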

Okay, theoretically, a new language could be made such that, similarly, members were re-orderable unless some attribute were given. After all, we're not supposed to do memory-level magic on objects so if one were to use only C++ idioms, you'd be safe by default.

But that's not the practical reality in which we live.


"Out of universe" argument

You could make things safe if, in your words, "the same reorder was used every time". The language would have to state unambiguously how members would be ordered. That's complicated to write in the standard, complicated to understand, and complicated to implement.

It's much easier to just guarantee that the order will be as it is in code, and leave these decisions to the programmer. Remember, these rules originate in old C, and old C gives power to the programmer.

You've already shown in your question how easy it is to make the struct padding-efficient with a trivial code change. There's no need for any added complexity at the language level to do this for you.

Lightness Races in Orbit answered Oct 14 '22 15:10


The standard guarantees an allocation order simply because structs may represent a certain memory layout, such as a data protocol or a collection of hardware registers. For example, neither the programmer nor the compiler is free to re-arrange the order of the bytes in the TCP/IP protocol, or the hardware registers of a microcontroller.

If the order were not guaranteed, structs would be mere abstract data containers (similar to a C++ vector), of which we couldn't assume much, except that they somehow contain the data we put inside them. That would make them substantially less useful for any form of low-level programming.
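
As a rough illustration of a struct that mirrors an externally defined layout, here is a sketch with an invented 8-byte message header. It is not a real protocol, and the field names, the offsets and the read_header helper are all assumptions of the example. The static_asserts document the offsets the code depends on, and they hold only because members stay in declaration order. Multi-byte fields are copied in host byte order here; a real protocol would also need byte-order conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wire header, invented for illustration only.
struct WireHeader {
    std::uint8_t  version;
    std::uint8_t  flags;
    std::uint16_t length;    // host byte order in this sketch
    std::uint32_t sequence;  // host byte order in this sketch
};

// The layout the (imaginary) protocol document would prescribe.
static_assert(offsetof(WireHeader, version)  == 0, "version at byte 0");
static_assert(offsetof(WireHeader, flags)    == 1, "flags at byte 1");
static_assert(offsetof(WireHeader, length)   == 2, "length at byte 2");
static_assert(offsetof(WireHeader, sequence) == 4, "sequence at byte 4");
static_assert(sizeof(WireHeader) == 8, "no unexpected padding");

// Copy raw bytes into the struct; memcpy avoids alignment and aliasing problems.
WireHeader read_header(const unsigned char* bytes) {
    WireHeader header;
    std::memcpy(&header, bytes, sizeof header);
    return header;
}
```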

Lundin answered Oct 14 '22 15:10


The compiler has to keep the members in their declared order in case the structure is read by other low-level code produced by another compiler or written in another language. Say you were creating an operating system, and you decided to write part of it in C and part of it in assembly. You could define the following structure:

struct keyboard_input
{
    uint8_t modifiers;
    uint32_t scancode;
};

You pass this to an assembly routine, where you need to manually specify the memory layout of the structure. You would expect to be able to write the following code on a system with 4-byte alignment.

; In this example, ebx holds the address of the structure
mov al, [ebx]
mov edx, [ebx+4]

Now suppose the compiler were allowed to change the order of the members in an implementation-defined way. Depending on the compiler you use and the flags you pass to it, al could then end up holding either the modifiers member or the first byte of the scancode member.
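
On the C or C++ side, the offsets that the assembly routine hard-codes can be written down as compile-time checks. This is a sketch that assumes, like the assembly above, an ABI where uint32_t is 4-byte aligned. The helper scancode_offset is made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

struct keyboard_input {
    std::uint8_t  modifiers;
    std::uint32_t scancode;
};

// The assembly reads modifiers from [ebx] and scancode from [ebx+4].
// If the layout ever differed, these checks would stop the build.
static_assert(offsetof(keyboard_input, modifiers) == 0,
              "assembly expects modifiers at offset 0");
static_assert(offsetof(keyboard_input, scancode) == 4,
              "assembly expects scancode at offset 4");

// Hypothetical helper: the displacement to use in the assembly source.
constexpr std::size_t scancode_offset() {
    return offsetof(keyboard_input, scancode);
}
```

Checks like these are only meaningful because the member order is fixed. If reordering were allowed, the offsets could change from one compiler or flag set to the next.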

Of course the problem is not limited to low-level interfaces with assembly routines; it would also appear whenever libraries built with different compilers called each other (e.g. building a program with MinGW that uses the Windows API).

Because of this, the language just forces you to think about the structure layout.

Shadowwolf answered Oct 14 '22 15:10