Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the size of a boolean In C#? Does it really take 4-bytes?

Tags:

c#

interop

I have two structs with arrays of bytes and booleans:

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct struct1
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public byte[] values;
}

[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct struct2
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public bool[] values;
}

And the following code:

class main
{
    public static void Main()
    {
        Console.WriteLine("sizeof array of bytes: "+Marshal.SizeOf(typeof(struct1)));
        Console.WriteLine("sizeof array of bools: " + Marshal.SizeOf(typeof(struct2)));
        Console.ReadKey();
    }
}

That gives me the following output:

sizeof array of bytes: 3
sizeof array of bools: 12

It seems to be that a boolean takes 4 bytes of storage. Ideally a boolean would only take one bit (false or true, 0 or 1, etc..).

What is happening here? Is the boolean type really so inefficient?

like image 744
biv Avatar asked Oct 15 '22 04:10

biv


People also ask

What is the size of boolean datatype?

Boolean variables are stored as 16-bit (2-byte) numbers, but they can only be True or False.

Is a boolean 1 bit or 1 byte?

It's a question many may ask when they discover that in C or C++ a Boolean variable or 'bool' for short, is simply an alias for a 4-byte unsigned integer (actually 1-byte oops). Shock horror! Any self-respecting novice in the field might ask “but we can fit eight booleans in just one byte!

Why is a boolean 8 bit?

A boolean type normally follows the smallest unit of addressable memory of the target machine (i.e. usually the 8bits byte). Access to memory is always in "chunks" (multiple of words, this is for efficiency at the hardware level, bus transactions): a boolean bit cannot be addressed "alone" in most CPU systems.

Are booleans 1 bit?

A bool can be 1 byte or more, depending on the implementation. See fundamental types. it's 1byte (8 bits), Use bitfields or manually write code to access bits in memory buffer.


1 Answers

The bool type has a checkered history with many incompatible choices between language runtimes. This started with an historical design-choice made by Dennis Ritchie, the guy that invented the C language. It did not have a bool type, the alternative was int where a value of 0 represents false and any other value was considered true.

This choice was carried forward in the Winapi, the primary reason to use pinvoke, it has a typedef for BOOL which is an alias for the C compiler's int keyword. If you don't apply an explicit [MarshalAs] attribute then a C# bool is converted to a BOOL, thus producing a field that is 4 bytes long.

Whatever you do, your struct declaration needs to be a match with the runtime choice made in the language you interop with. As noted, BOOL for the winapi but most C++ implementations chose byte, most COM Automation interop uses VARIANT_BOOL which is a short.

The actual size of a C# bool is one byte. A strong design-goal of the CLR is that you cannot find out. Layout is an implementation detail that depends on the processor too much. Processors are very picky about variable types and alignment, wrong choices can significantly affect performance and cause runtime errors. By making the layout undiscoverable, .NET can provide a universal type system that does not depend on the actual runtime implementation.

In other words, you always have to marshal a structure at runtime to nail down the layout. At which time the conversion from the internal layout to the interop layout is made. That can be very fast if the layout is identical, slow when fields need to be re-arranged since that always requires creating a copy of the struct. The technical term for this is blittable, passing a blittable struct to native code is fast because the pinvoke marshaller can simply pass a pointer.

Performance is also the core reason why a bool is not a single bit. There are few processors that make a bit directly addressable, the smallest unit is a byte. An extra instruction is required to fish the bit out of the byte, that doesn't come for free. And it is never atomic.

The C# compiler isn't otherwise shy about telling you that it takes 1 byte, use sizeof(bool). This is still not a fantastic predictor for how many bytes a field takes at runtime, the CLR also needs to implement the .NET memory model and it promises that simple variable updates are atomic. That requires variables to be properly aligned in memory so the processor can update it with a single memory-bus cycle. Pretty often, a bool actually requires 4 or 8 bytes in memory because of this. Extra padding that was added to ensure that the next member is aligned properly.

The CLR actually takes advantage of layout being undiscoverable, it can optimize the layout of a class and re-arrange the fields so the padding is minimized. So, say, if you have a class with a bool + int + bool member then it would take 1 + (3) + 4 + 1 + (3) bytes of memory, (3) is the padding, for a total of 12 bytes. 50% waste. Automatic layout rearranges to 1 + 1 + (2) + 4 = 8 bytes. Only a class has automatic layout, structs have sequential layout by default.

More bleakly, a bool can require as many as 32 bytes in a C++ program compiled with a modern C++ compiler that supports the AVX instruction set. Which imposes a 32-byte alignment requirement, the bool variable may end up with 31 bytes of padding. Also the core reason why a .NET jitter does not emit SIMD instructions, unless explicitly wrapped, it can't get the alignment guarantee.

like image 72
Hans Passant Avatar answered Oct 19 '22 01:10

Hans Passant