
Why is size_of::<MyStruct>() not equal to the sum of the sizes of its fields?

I tried to measure the size of a struct and its fields (Playground):

use std::mem;

struct MyStruct {
    foo: u8,
    bar: char,
}

fn main() {
    println!("MyStruct: {}", mem::size_of::<MyStruct>());

    let obj = MyStruct { foo: 0, bar: '0' };
    println!("obj:      {}", mem::size_of_val(&obj));
    println!("obj.foo:  {}", mem::size_of_val(&obj.foo));
    println!("obj.bar:  {}", mem::size_of_val(&obj.bar));
}

This program prints:

MyStruct: 8
obj:      8
obj.foo:  1
obj.bar:  4

So the size of the struct is bigger than the sum of its fields' sizes (which would be 5). Why is that?

asked Apr 28 '17 09:04 by Lukas Kalbertodt



1 Answer

The difference is due to padding, which is inserted to satisfy a type's alignment requirements. Values of a given type don't want to live at arbitrary addresses, but only at addresses divisible by the type's alignment. For example, take char: it has an alignment of 4, so it only wants to live at addresses divisible by 4, like 0x4, 0x8 or 0x7ffd463761bc, and not at addresses like 0x6 or 0x7ffd463761bd.

The alignment of a type is platform dependent, but it's usually true that types of size 1, 2 or 4 have an alignment of 1, 2 and 4 respectively, too. An alignment of 1 means that a value of that type feels comfortable at any address (since any address is divisible by 1).
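
As a quick check of that rule of thumb, the following sketch prints size and alignment for a few primitive types. The helper size_and_align is a hypothetical name introduced here, and the values in the comments are what one would typically see on common targets, not a guarantee for every platform.

```rust
use std::mem;

// Returns (size, alignment) of `T` in bytes.
// `size_and_align` is a helper made up for this sketch.
fn size_and_align<T>() -> (usize, usize) {
    (mem::size_of::<T>(), mem::align_of::<T>())
}

fn main() {
    // Typical results on common targets: (1, 1), (2, 2), (4, 4), (4, 4).
    println!("u8:   {:?}", size_and_align::<u8>());
    println!("u16:  {:?}", size_and_align::<u16>());
    println!("u32:  {:?}", size_and_align::<u32>());
    println!("char: {:?}", size_and_align::<char>());
}
```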

So what about your struct now? In Rust,

composite structures will have an alignment equal to the maximum of their fields' alignment.

This means that the alignment of your MyStruct type is also 4. We can check that with mem::align_of() and mem::align_of_val():

// prints "4"
println!("{}", mem::align_of::<MyStruct>());
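
A slightly fuller sketch compares the struct's alignment with the largest alignment among its fields and also shows mem::align_of_val(). The helper max_field_align is a hypothetical name used only here. Strictly speaking, the default representation only promises an alignment of at least that maximum; in practice the two values are expected to match.

```rust
use std::mem;

#[allow(dead_code)]
struct MyStruct {
    foo: u8,
    bar: char,
}

// Largest alignment among the field types of `MyStruct`.
// `max_field_align` is a helper made up for this sketch.
fn max_field_align() -> usize {
    mem::align_of::<u8>().max(mem::align_of::<char>())
}

fn main() {
    let obj = MyStruct { foo: 0, bar: '0' };
    println!("align_of::<MyStruct>(): {}", mem::align_of::<MyStruct>());
    println!("align_of_val(&obj):     {}", mem::align_of_val(&obj));
    println!("max field alignment:    {}", max_field_align());
}
```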

Now suppose a value of your struct lives at 0x4 (which satisfies the struct's direct alignment requirements):

0x4:   [obj.foo]
0x5:   [obj.bar's first byte]
0x6:   [obj.bar's second byte]
0x7:   [obj.bar's third byte]
0x8:   [obj.bar's fourth byte]

Oops, obj.bar now lives at 0x5, although its alignment is 4! That's bad!

To fix this, the Rust compiler inserts so-called padding (unused bytes) into the struct. In memory it now looks like this:

0x4:   [obj.foo]
0x5:   padding (unused)
0x6:   padding (unused)
0x7:   padding (unused)
0x8:   [obj.bar's first byte]
0x9:   [obj.bar's second byte]
0xA:   [obj.bar's third byte]
0xB:   [obj.bar's fourth byte]

For this reason, the size of MyStruct is 8, because the compiler added 3 padding bytes. Now everything is fine again!
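
The padding can be made visible by measuring where each field starts. The sketch below is an illustration under one assumption: it adds #[repr(C)] (not present in the question's struct) to pin the fields to declaration order so that the offsets match the diagram above. MyStructC and field_offsets are hypothetical names.

```rust
use std::mem;

// `#[repr(C)]` is an assumption added for this sketch: it keeps the fields
// in declaration order, like in the diagram above.
#[repr(C)]
struct MyStructC {
    foo: u8,
    bar: char,
}

// Byte offsets of `foo` and `bar` from the start of the struct,
// computed from the addresses of the fields.
fn field_offsets(obj: &MyStructC) -> (usize, usize) {
    let base = obj as *const MyStructC as usize;
    let foo = &obj.foo as *const u8 as usize;
    let bar = &obj.bar as *const char as usize;
    (foo - base, bar - base)
}

fn main() {
    let obj = MyStructC { foo: 0, bar: '0' };
    let (foo_off, bar_off) = field_offsets(&obj);
    println!("offset of foo: {}", foo_off);
    println!("offset of bar: {}", bar_off);
    println!("size:          {}", mem::size_of::<MyStructC>());
}
```

With a char alignment of 4, bar starts 4 bytes after foo, so the 3 bytes in between are padding.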

... except maybe the wasted space? Indeed, this is unfortunate. A solution would be to swap the struct's fields. Fortunately for this purpose, the default memory layout of a struct in Rust is unspecified, unlike in C or C++ (a C-compatible layout has to be requested explicitly with #[repr(C)]). In particular, the Rust compiler is allowed to change the order of fields. You cannot assume that obj.foo has a lower address than obj.bar!

And since Rust 1.18, this optimization is performed by the compiler.
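
A hypothetical three-field struct makes the effect of reordering visible. In this sketch, DeclOrder and CompilerOrder are names invented for illustration; the first uses #[repr(C)] to keep declaration order, the second uses the default representation. Since the default layout is unspecified, the second size is not guaranteed, though it should not be larger than the first on a compiler that performs this optimization.

```rust
use std::mem;

// Declaration order is kept:
// a (1 byte) + 3 padding, b (4 bytes), c (1 byte) + 3 padding.
#[allow(dead_code)]
#[repr(C)]
struct DeclOrder {
    a: u8,
    b: u32,
    c: u8,
}

// Default representation: the compiler is free to reorder the fields,
// for example so that `a` and `c` sit next to each other.
#[allow(dead_code)]
struct CompilerOrder {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("repr(C): {}", mem::size_of::<DeclOrder>());
    println!("default: {}", mem::size_of::<CompilerOrder>());
}
```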


But even with Rust 1.18 or newer, your struct is still 8 bytes in size. Why?

There is another rule for memory layout: a struct's size must always be a multiple of its alignment. This is what makes it possible to lay out those structs densely in an array. Suppose the compiler reorders our struct fields and the memory layout looks like this:

0x4:   [obj.bar's first byte]
0x5:   [obj.bar's second byte]
0x6:   [obj.bar's third byte]
0x7:   [obj.bar's fourth byte]
0x8:   [obj.foo]

Looks like 5 bytes, right? Nope! Imagine having an array [MyStruct]. In an array, all elements are next to each other in memory:

0x4:   [[0].bar's first byte]
0x5:   [[0].bar's second byte]
0x6:   [[0].bar's third byte]
0x7:   [[0].bar's fourth byte]
0x8:   [[0].foo]
0x9:   [[1].bar's first byte]
0xA:   [[1].bar's second byte]
0xB:   [[1].bar's third byte]
0xC:   [[1].bar's fourth byte]
0xD:   [[1].foo]
0xE:   ...

Oops, now the array's second element's bar starts at 0x9, which is not divisible by 4! So in fact, the struct's size needs to be a multiple of its alignment. Thus, our memory looks like this:

0x4:   [[0].bar's first byte]
0x5:   [[0].bar's second byte]
0x6:   [[0].bar's third byte]
0x7:   [[0].bar's fourth byte]
0x8:   [[0].foo]
0x9:   [[0]'s padding byte]
0xA:   [[0]'s padding byte]
0xB:   [[0]'s padding byte]
0xC:   [[1].bar's first byte]
0xD:   [[1].bar's second byte]
0xE:   [[1].bar's third byte]
0xF:   [[1].bar's fourth byte]
0x10:  [[1].foo]
0x11:  [[1]'s padding byte]
0x12:  [[1]'s padding byte]
0x13:  [[1]'s padding byte]
0x14:  ...
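
The array layout above can be checked with a short sketch that measures the distance between two neighbouring elements. The helper stride is a hypothetical name introduced here; it simply subtracts the addresses of the two elements.

```rust
use std::mem;

#[allow(dead_code)]
struct MyStruct {
    foo: u8,
    bar: char,
}

// Distance in bytes between the starts of two neighbouring array elements.
// `stride` is a helper made up for this sketch.
fn stride(arr: &[MyStruct; 2]) -> usize {
    let first = &arr[0] as *const MyStruct as usize;
    let second = &arr[1] as *const MyStruct as usize;
    second - first
}

fn main() {
    let arr = [
        MyStruct { foo: 0, bar: 'a' },
        MyStruct { foo: 1, bar: 'b' },
    ];
    let size = mem::size_of::<MyStruct>();
    let align = mem::align_of::<MyStruct>();

    println!("size of one element: {}", size);
    println!("stride in the array: {}", stride(&arr));
    println!("size of the array:   {}", mem::size_of::<[MyStruct; 2]>());
    println!("size % alignment:    {}", size % align);
}
```

The stride equals the size of one element, and the size is a multiple of the alignment, so every element's bar ends up at a properly aligned address.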

Related:

  • Chapter about memory layout in the Rustonomicon
  • Similar question on the C++ tag

answered Sep 28 '22 18:09 by Lukas Kalbertodt