Struct Reordering by compiler [duplicate]

Tags: c++, c, struct, memory-alignment

Suppose I have a struct like this:

struct MyStruct
{
  uint8_t var0;
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

This is probably going to waste some space (not a ton, but some), because the uint32_t member has to be properly aligned.

In actuality (after aligning the structure so that it can actually use the uint32_t variable) it might look something like this:

struct MyStruct
{
  uint8_t var0;
  uint8_t unused[3];  //3 bytes of wasted space
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

A more efficient struct would be:

struct MyStruct
{
  uint8_t var0;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
  uint32_t var1;
};
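
To make the cost concrete, here is a minimal sketch that compares the two layouts with sizeof and offsetof. The names AsDeclared, Reordered and wasted_bytes are made up for this illustration, and the byte counts mentioned afterwards assume a typical ABI where uint32_t is 4-byte aligned; the standard itself does not promise those numbers.

```cpp
#include <cstddef>
#include <cstdint>

// Declaration order from the question (hypothetical name).
struct AsDeclared {
    std::uint8_t  var0;
    std::uint32_t var1;
    std::uint8_t  var2;
    std::uint8_t  var3;
    std::uint8_t  var4;
};

// Hand-reordered variant: the one-byte members are grouped first.
struct Reordered {
    std::uint8_t  var0;
    std::uint8_t  var2;
    std::uint8_t  var3;
    std::uint8_t  var4;
    std::uint32_t var1;
};

// Guaranteed by the language: members sit at increasing offsets
// in declaration order, and the first one is at offset 0.
static_assert(offsetof(AsDeclared, var0) == 0, "first member starts the struct");
static_assert(offsetof(AsDeclared, var1) > offsetof(AsDeclared, var0), "var1 follows var0");
static_assert(offsetof(AsDeclared, var2) > offsetof(AsDeclared, var1), "var2 follows var1");

// ABI-dependent, not guaranteed: how many bytes the padding costs.
constexpr std::size_t wasted_bytes = sizeof(AsDeclared) - sizeof(Reordered);
```

On common x86-64 and ARM ABIs, sizeof(AsDeclared) is 12 (3 padding bytes after var0 plus 1 trailing byte) and sizeof(Reordered) is 8, so wasted_bytes comes out as 4.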

Now, the question is:

Why is the compiler forbidden (by the standard) from reordering the struct?

I don't see any way you could shoot yourself in the foot if the struct was reordered.

510

asked Jul 07 '16 11:07

DarthRubik

4 Answers

Why is the compiler forbidden (by the standard) from reordering the struct?

The basic reason is: for compatibility with C.

Remember that C was, originally, a kind of high-level assembly language. It is quite common in C to view memory (network packets, ...) by reinterpreting the bytes as a specific struct.

This has led to multiple features relying on this property:

C guaranteed that the address of a struct and the address of its first data member are one and the same, so C++ does too (in the absence of virtual inheritance/methods).
C guaranteed that if you have two structs A and B that both start with a char data member followed by an int data member (and whatever after), then when you put them in a union you can write the B member and read the char and int through its A member (the "common initial sequence" rule), so C++ does too: Standard Layout.

The latter is extremely broad, and completely prevents any re-ordering of data members for most structs (or classes).
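
Both guarantees can be sketched in a few lines. The types A, B and AorB and the two helper functions are hypothetical names chosen for this illustration; the sketch assumes both structs are standard-layout, which the static_asserts check.

```cpp
#include <cassert>
#include <type_traits>

// Two unrelated structs that share a common initial sequence: char, then int.
struct A { char tag; int value; float  extra_a; };
struct B { char tag; int value; double extra_b; };

static_assert(std::is_standard_layout<A>::value, "A must be standard-layout");
static_assert(std::is_standard_layout<B>::value, "B must be standard-layout");

union AorB { A a; B b; };

// Writes the B member, then reads the shared prefix through the A member.
// This is only permitted because tag and value line up in both structs.
inline int read_value_via_a() {
    AorB u;
    u.b = B{'x', 42, 0.0};
    assert(u.a.tag == 'x');
    return u.a.value;
}

// The address of a standard-layout struct is the address of its first member.
inline bool first_member_shares_address() {
    A a{'y', 1, 0.0f};
    return static_cast<const void*>(&a) == static_cast<const void*>(&a.tag);
}
```

If a compiler were free to move value in front of tag in only one of the two structs, read_value_via_a would silently read the wrong bytes.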

Note that the Standard has historically allowed some re-ordering: since C did not have the concept of access control, C++ (up to and including C++20) specified that the relative order of two data members with different access control specifiers is unspecified. C++23 removed that latitude and requires declaration order for all non-static data members.

As far as I know, no compiler ever took advantage of it; but they could in theory.

Outside of C++, languages such as Rust allow compilers to re-order fields and the main Rust compiler (rustc) does so by default. Only historical decisions and a strong desire for backward compatibility prevent C++ from doing so.

158

answered Oct 14 '22 15:10

Matthieu M.

I don't see any way you could shoot yourself in the foot, if the struct was reordered.

Really? If this were permitted, communication between libraries/modules even in the same process would be ludicrously dangerous by default.

"In universe" argument

We must be able to know that our structs are defined the way that we've asked them to be. It's bad enough that padding is unspecified! Fortunately, you can control this when you need to.
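
One common way to take control of padding is a packing pragma. The sketch below is an illustration only: `#pragma pack` is a compiler extension (supported by GCC, Clang and MSVC) rather than standard C++, and PackedHeader with its fields is a made-up type.

```cpp
#include <cstddef>
#include <cstdint>

// Ask the compiler to use 1-byte packing for the struct defined below.
#pragma pack(push, 1)
struct PackedHeader {
    std::uint8_t  kind;
    std::uint32_t length;   // placed right after kind, with no padding
};
#pragma pack(pop)           // restore the previous packing setting

// With packing in effect the layout is exactly what the declaration says.
static_assert(offsetof(PackedHeader, length) == 1, "no padding before length");
static_assert(sizeof(PackedHeader) == 5, "1 + 4 bytes, no trailing padding");
```

The trade-off is that length is now misaligned, which can be slower on some targets and can fault on others, so this is usually reserved for wire or file formats.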

Okay, theoretically, a new language could be made such that, similarly, members were re-orderable unless some attribute were given. After all, we're not supposed to do memory-level magic on objects so if one were to use only C++ idioms, you'd be safe by default.

But that's not the practical reality in which we live.

"Out of universe" argument

You could make things safe if, in your words, "the same reorder was used every time". The language would have to state unambiguously how members would be ordered. That's complicated to write in the standard, complicated to understand, and complicated to implement.

It's much easier to just guarantee that the order will be as it is in code, and leave these decisions to the programmer. Remember, these rules have their origin in old C, and old C gives power to the programmer.

You've already shown in your question how easy it is to make the struct padding-efficient with a trivial code change. There's no need for any added complexity at the language level to do this for you.

answered Oct 14 '22 15:10

Lightness Races in Orbit

The standard guarantees an allocation order simply because structs may represent a certain memory layout, such as a data protocol or a collection of hardware registers. For example, neither the programmer nor the compiler is free to re-arrange the order of the bytes in the TCP/IP protocol, or the hardware registers of a microcontroller.

If the order were not guaranteed, structs would be mere abstract data containers (similar to a C++ vector), of which we can't assume much, except that they somehow contain the data we put inside them. That would make them substantially less useful for any form of low-level programming.
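
As an illustration of a struct mirroring a protocol, here is a minimal sketch built on the UDP header, which RFC 768 defines as four 16-bit fields in a fixed order. The struct name, field names and the helpers load_header and from_big_endian are hypothetical, and the size and offset checks assume the usual case where uint16_t is 2 bytes with 2-byte alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Field order matches the wire format; values are big-endian on the wire.
struct UdpHeader {
    std::uint16_t source_port;
    std::uint16_t dest_port;
    std::uint16_t length;
    std::uint16_t checksum;
};

// These checks only make sense because declaration order is preserved.
static_assert(offsetof(UdpHeader, source_port) == 0, "bytes 0-1");
static_assert(offsetof(UdpHeader, dest_port) == 2, "bytes 2-3");
static_assert(offsetof(UdpHeader, length) == 4, "bytes 4-5");
static_assert(offsetof(UdpHeader, checksum) == 6, "bytes 6-7");
static_assert(sizeof(UdpHeader) == 8, "no padding expected");

// Copies 8 raw bytes into the struct; fields stay in network byte order.
inline UdpHeader load_header(const unsigned char* bytes) {
    assert(bytes != nullptr);
    UdpHeader h;
    std::memcpy(&h, bytes, sizeof h);
    return h;
}

// Converts a field still holding wire-order bytes to the host's byte order.
inline std::uint16_t from_big_endian(std::uint16_t raw) {
    unsigned char b[2];
    std::memcpy(b, &raw, sizeof b);
    return static_cast<std::uint16_t>((b[0] << 8) | b[1]);
}
```

A caller would pass the first 8 bytes of a received datagram to load_header and then run each field through from_big_endian before using it.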

answered Oct 14 '22 15:10

Lundin

The compiler has to keep the order of the members because the structure may be read by other low-level code produced by another compiler or written in another language. Say you were creating an operating system, and you decide to write part of it in C, and part of it in assembly. You could define the following structure:

struct keyboard_input
{
    uint8_t modifiers;
    uint32_t scancode;
};

You pass this to an assembly routine, where you need to manually specify the memory layout of the structure. You would expect to be able to write the following code on a system with 4-byte alignment.

; The memory location of the structure is located in ebx in this example
mov al, [ebx]
mov edx, [ebx+4]

Now suppose the compiler could change the order of the members in an implementation-defined way. Depending on the compiler you use and the flags you pass to it, al could then end up holding either the modifiers member or the first byte of the scancode member.

Of course the problem is not limited to low-level interfaces with assembly routines; it would also appear if libraries built with different compilers called each other (e.g. building a program with mingw using the Windows API).

Because of this, the language just forces you to think about the structure layout.
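
On the C or C++ side, the offsets that the assembly routine hard-codes can be pinned down at compile time. This is only a sketch under the same assumption as above (a target where uint32_t is 4-byte aligned); the constant names are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>

struct keyboard_input {
    std::uint8_t  modifiers;
    std::uint32_t scancode;
};

// Offsets the hand-written assembly relies on: [ebx] and [ebx+4].
constexpr std::size_t modifiers_offset = offsetof(keyboard_input, modifiers);
constexpr std::size_t scancode_offset  = offsetof(keyboard_input, scancode);

// Fail the build instead of misreading memory if the layout ever differs.
static_assert(modifiers_offset == 0, "assembly reads modifiers from [ebx]");
static_assert(scancode_offset == 4, "assembly reads scancode from [ebx+4]");
```

Checks like these are only meaningful because member order is fixed; with free reordering, the expected values could change from one compiler or flag set to the next.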

answered Oct 14 '22 15:10

Shadowwolf
