Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which data types to use? And how the CPU reads them?

Tags:

c#

Let's start small, say I need to store a const value of 200, should I always be using a unsigned byte for this?

This is just a minimal thing I guess. But what about structs? Is it wise to build up my structs so that it is dividable by 32 on a 32 bit system? Let's say I need to iterate over a very large array of structs, does it matter much if the struct consists of 34 bits or 64? I would think it gains a lot if I could squeeze off 2 bits from the 34 bit struct?

Or does all this make unnecessary overhead and am I better off replacing all my bits and shorts to ints inside this struct so the CPU does not have to "go looking" for the right memory block?

like image 209
Madmenyo Avatar asked Jan 12 '23 05:01

Madmenyo


2 Answers

This is a strong processor implementation detail, the CLR and the jitter already do a lot of work to ensure that your data types are optimal to get the best perf out of the program. There is for example never a case where a struct ever occupies 34 bits, the CLR design choices already ensure that you get a running start on using types that work well on modern processors.

Structs are laid-out to be optimal and that involves alignment choices that depend on the data type. An int for example will always be aligned to an offset that's a multiple of 4. Which gives the processor an easy time to read the int, it doesn't have to multiplex the misaligned bytes back into an int and avoids a scenario where the value straddles a cpu cache line and needs to be glued back together from multiple memory bus reads. Some processors event treat misaligned reads and writes as fatal errors, one of the reasons you don't have an Itanium in your machine.

So if you have a struct that has a byte and an int then you'll end up with a data type that takes 8 bytes which doesn't use 3 of the bytes, the ones between the byte and the int. These unused bytes are called padding. There can also be padding at the end of a struct to ensure that alignment is still optimal when you put them in an array.

Declaring a single variable as a byte is okay, Intel/AMD processors take the same amount of time to read/write one as a 32-bit int. But using short is not okay, that requires an extra byte in the cpu instruction (a size override prefix) and can cost an extra cpu cycle. In practice you don't often save any memory because of the alignment rule. Using byte only buys you something if it can be combined with another byte. An array of bytes is fine, a struct with multiple byte members is fine. Your example is not, it works just as well when you declare it int.

Using types smaller than an int can be awkward in C# code, the MSIL code model is int-based. Basic operators like + and - are only defined for int and larger, there is no operator for smaller types. So you end up having to use a cast to truncate the result back to a smaller size. The sweet spot is int.

like image 190
Hans Passant Avatar answered Jan 23 '23 15:01

Hans Passant


Wow, it really depends on a bunch of stuff. Are you concerned about performance or memory? If it's performance you are generally better off staying with the "natural" word size alignment. So for example if you are using a 64-bit processor using 64-bit ints, aligned on 64-bit boundaries provides the best performance. I don't think C# makes any guarantees about this type of thing (it's meant to remain abstract from the hardware).

That said there is a informal rule that says "Avoid the sin of premature optimization". This is particularly true in C#. If you aren't having a performance or memory issue, don't worry about it.

If you find you are having a performance problem, use a profiler to determine where the problem actually is (it might not be where you think). If it's a memory problem determine the objects consuming the most memory and determine where you can optimize (as per your example using a byte rather than an int or short, if possible).

If you really have to worry about such details you might want to consider using C++, where you can better control memory usage (for example you can allocate large blocks of memory without it being initialized), access bitfields, etc.

like image 39
Dweeberly Avatar answered Jan 23 '23 15:01

Dweeberly