
Rust seems to allocate the same space in memory for an array of booleans as an array of 8 bit integers

Tags: rust

Running this code in Rust:

fn main() {
    println!("{:?}", std::mem::size_of::<[u8; 1024]>());
    println!("{:?}", std::mem::size_of::<[bool; 1024]>());
}

prints:

1024
1024

This is not what I expected, so I compiled and ran it in release mode, but I got the same answer.

Why does the Rust compiler seemingly allocate a whole byte for each boolean? To me it seems like a simple optimization to allocate only 128 bytes (1024 bits) instead. This project implies I'm not the first to think this.

Is this a case of compilers being way harder than they seem? Or is this not optimized because it isn't a realistic scenario? Or am I misunderstanding something here?

andy boot asked Feb 19 '18 22:02



1 Answer

Pointers and references.

  1. There is an assumption that you can always take a reference to an item of a slice, a field of a struct, etc.
  2. There is an assumption in the language that any reference to an instance of a statically sized type can be transmuted to a type-erased pointer *mut ().

Those two assumptions together mean that:

  • due to (2), it is not possible to create a "bit-reference" which would allow sub-byte addressing, because a plain pointer can only address whole bytes,
  • due to (1), it is not possible to do without references to individual elements.

This essentially means that any type must have a minimum alignment of one byte, so each bool in an array needs its own byte-addressable location.
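As a rough illustration of the two assumptions, the sketch below borrows two adjacent elements of a bool array, erases their types to *const (), and measures the distance between the addresses. The function name element_stride is made up for this example.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: distance in bytes between two adjacent `bool`
/// elements of an array, measured through type-erased pointers.
fn element_stride() -> usize {
    let flags = [true, false, true, false];

    // Assumption (1): every element can be borrowed individually.
    let first: &bool = &flags[0];
    let second: &bool = &flags[1];

    // Assumption (2): a reference can be turned into a type-erased pointer.
    let p0 = first as *const bool as *const ();
    let p1 = second as *const bool as *const ();

    // A plain pointer holds a byte address, so this is a whole number of bytes.
    p1 as usize - p0 as usize
}

fn main() {
    println!("size_of::<bool>()  = {}", size_of::<bool>());
    println!("align_of::<bool>() = {}", align_of::<bool>());
    println!("element stride     = {}", element_stride());
}
```

Each value printed should be 1: a pointer has no room to say "bit 3 of this byte", so every element gets a full byte of its own.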


Note that this is not necessarily an issue. Opting in to a 128-byte representation should be done cautiously, as it means trading speed (and convenience) for memory. It's not a pure win.
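To make the trade-off concrete, here is a minimal sketch of what opting in by hand could look like. PackedFlags and its methods are hypothetical names invented for this example, not a standard library type; existing bit-set crates offer more complete versions of the same idea.

```rust
/// Hypothetical fixed-size bit set: 1024 flags packed into 128 bytes.
struct PackedFlags {
    bytes: [u8; 128],
}

impl PackedFlags {
    fn new() -> Self {
        PackedFlags { bytes: [0; 128] }
    }

    /// Reads one flag: a byte load plus a shift and a mask.
    fn get(&self, index: usize) -> bool {
        (self.bytes[index / 8] >> (index % 8)) & 1 == 1
    }

    /// Writes one flag: a read-modify-write of the containing byte.
    fn set(&mut self, index: usize, value: bool) {
        let mask = 1u8 << (index % 8);
        if value {
            self.bytes[index / 8] |= mask;
        } else {
            self.bytes[index / 8] &= !mask;
        }
    }
}

fn main() {
    let mut flags = PackedFlags::new();
    flags.set(10, true);
    println!("size = {} bytes", std::mem::size_of::<PackedFlags>());
    println!("flag 10 = {}, flag 11 = {}", flags.get(10), flags.get(11));
}
```

The memory drops to 128 bytes, but every access now costs extra bit arithmetic, and there is no way to hand out a &mut bool to a single flag, which is the loss of convenience mentioned above.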

Prior art (namely std::vector<bool> in C++, which packs its elements into bits) is widely considered a mistake in hindsight.

Matthieu M. answered Oct 17 '22 09:10