 

C: Data structures alignment

I'm working with structures and have several questions about them. As I understand it, structure members are placed in memory sequentially. The size of a block (word) depends on the machine architecture (32-bit: 4 bytes, 64-bit: 8 bytes).

Let's say we have two data structures:

struct ST1 {
    char c1;
    short s;
    char c2;
    double d;
    int i;
};

In memory it will be:

32 bit - 20 bytes    
 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
------------------------------------------------------------------------------------------
 c1| PB| s | s | c2| PB| PB| PB| d | d | d  | d  | d  | d  | d  | d  | i  | i  | i  | i  |

64 bit - 24 bytes    | 20 | 21 | 22 | 23 |
previous sequence +  ---------------------
                     | PB | PB | PB | PB |

But we can rearrange the members so that the data packs into machine words without padding, like this:

struct ST2 {
    double d;
    int i;
    short s;
    char c1;
    char c2;
};

In this case it is laid out the same way on both 32-bit and 64-bit (16 bytes):

 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
----------------------------------------------------------------------
 d | d | d | d | d | d | d | d | i | i | i  | i  | s  | s  | c1 | c2 |

I have a couple of questions:

  • This is something of a wild guess, but is the main rule for structs to declare the larger members first?
  • As I understand it, this doesn't apply to stand-alone variables, such as char str[] = "Hello";?
  • What value does a padding byte hold? Is it somewhere in the ASCII table? Sorry, I couldn't find it.
  • Are the members of two different structures all stored at different addresses, and can the two structures be placed non-sequentially in memory?
  • A structure such as struct ST3 { char c1; char c2; char c3; } st3; has size 3. I understand that if we add a member of another type, it will be aligned. But why isn't it aligned before that?
Viacheslav Kondratiuk asked Jun 07 '13 08:06





1 Answer

The basic rules are simple:

  • members are laid out in declaration order (unless, in C++, you use private: public: ... sections)
  • padding is allowed between members and after the last one

That's about it. The rest is left to the implementation: the storage taken by types, the amount of padding. Normally you can expect it to be properly documented in the ABI or directly by the compiler, and there may even be tools for manipulating it.

In practice padding is necessary on some architectures: SPARC, for example, requires 32-bit "ints" to be aligned on an address divisible by 4. On others it is not a requirement, but misaligned entities may take more time to process: an 80286 processor takes an extra cycle to read a 16-bit entity from an odd address. (Before I forget: the representation of the types themselves differs too!)

It is usual that the alignment requirement, or the alignment for best performance, matches the size exactly: you align on a boundary equal to the size. A good counter-example is the 80-bit floating point numbers (available as double or long double in some compilers), which like 8- or 16-byte alignment rather than 10.

To fiddle with padding, compilers usually give you a switch to set the default. That changes from version to version, so it is better taken into account on upgrade. Inside the code there are override facilities such as __attribute__((packed)) in gcc and #pragma pack in MS and many others. Those are all extensions to the standard, obviously.

The bottom line is: if you want to fiddle with layout, you start by reading the docs of all the compilers you target, now and in the future, to know what they do and how to control it. Possibly also read the docs of the target platforms, depending on why you're interested in layout in the first place.

One usual motivation is to have a stable layout when you write raw memory out to a file and expect to read it back, maybe on a different platform using a different compiler. That is the easier one, until a new platform type enters the scene.

The other motivation is performance. That one is way trickier, as the rules change fast and the effect is hard to predict right away. Say, on Intel the basic "misaligned" penalty has been gone for a long time; what counts instead is staying inside a cache line, and cache line size varies by processor. Also, using more padding may produce faster individual accesses, while fully packed structures are more economical in cache usage.

And some operations require proper alignment that is not directly enforced by the compiler; you may need to apply special alignment pragmas (as for certain SSE-related stuff).

Bottom line repeated: stop guessing, decide on your targets and read the proper docs. (Btw, for me reading the architecture manuals for SPARC, IA32 and others was tremendous fun and a gain in many respects.)

Balog Pal answered Oct 21 '22 22:10
