 

Performance advantages of powers-of-2 sized data?


If I have a game with a 3D world, and the world is quite big, so it needs to be split into chunks, is there a major performance advantage, if any, of having 128-byte chunks over, say, 150-byte chunks? Obviously, the objects in the chunks are still a whole number of bytes in size.

I.e., is chunks[128][128][128] faster than chunks[150][150][150] or chunks[112][112][112]? Are there any other side effects, such as excessive RAM wastage? Are there any other factors that should be taken into consideration?

I just see that it's a convention to store everything in variables and arrays whose sizes are powers of 2, but I'm not sure whether there's any merit to it, or whether it would be better to use more human-friendly numbers like 100 or 150.

asked Mar 01 '12 by Greg



2 Answers

The other answers are indeed correct that power-of-two-sized data will benefit from using shifts instead of multiplies.
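For illustration (the function names and the flattened-array layout below are my assumptions, not from the question), here is how a power-of-two edge length lets the index arithmetic compile down to shifts:

```cpp
#include <cstddef>

// A minimal sketch: indexing a flattened 3D chunk array.
// Assumes 0 <= x, y, z < edge length in both variants.
constexpr int N = 128;  // 128 = 2^7

inline std::size_t index_pow2(int x, int y, int z) {
    // Equivalent to x*N*N + y*N + z, written explicitly as shifts:
    return (static_cast<std::size_t>(x) << 14) |  // x * 128 * 128
           (static_cast<std::size_t>(y) << 7)  |  // y * 128
            static_cast<std::size_t>(z);
}

inline std::size_t index_mul(int x, int y, int z) {
    // With an edge length of 150, real multiplies (or longer
    // shift-and-add sequences) are needed instead.
    constexpr int M = 150;
    return (static_cast<std::size_t>(x) * M + y) * M + z;
}
```

In practice, modern compilers perform this strength reduction automatically whenever the multiplier is a compile-time constant power of two, so you rarely need to write the shifts by hand.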

However, there is a dark side to power-of-two-sized data. And it can hit you when you least expect it.

See these two question/answers:

  • Matrix multiplication: Small difference in matrix size, large difference in timings
  • Why are elementwise additions much faster in separate loops than in a combined loop?

When your datasets are powers of two, they are more likely to be super-aligned in memory (meaning their addresses will likely have the same residue modulo a large power of two).

While this may seem desirable, it can lead to:

  • Conflict Cache Misses
  • False Aliasing Stalls (mentioned in the second link above)

If you read the two questions linked above, you can see that this alignment can cause a slow-down of more than 3x, which will likely far outweigh any benefit you get from using shifts instead of multiplies.


So, as with all performance questions, you need to measure, measure, measure... and be prepared for anything to happen.

You mention that you are representing a 3D space: that is exactly the kind of situation that would exhibit power-of-two strided memory access and could lead to slow-downs. A sketch of one common mitigation follows.
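One common mitigation (a sketch with made-up names and an arbitrary padding value, not something prescribed by this answer) is to pad the innermost dimensions so the physical stride is no longer an exact power of two:

```cpp
#include <cstddef>
#include <vector>

// Sketch: pad each row so successive rows no longer map to the
// same cache sets. PAD is a hypothetical value; tune by measuring.
constexpr int N   = 128;      // logical edge length
constexpr int PAD = 8;        // hypothetical padding
constexpr int ROW = N + PAD;  // physical stride, not a power of two

inline std::size_t padded_index(int x, int y, int z) {
    // Only the first N elements of each padded row are used; the PAD
    // tail is wasted space traded for fewer conflict misses on
    // strided (x- or y-axis) walks through the chunk.
    return (static_cast<std::size_t>(x) * ROW + y) * ROW + z;
}

int main() {
    std::vector<float> chunk(static_cast<std::size_t>(N) * ROW * ROW);
    chunk[padded_index(1, 2, 3)] = 1.0f;  // logical (x=1, y=2, z=3)
    return 0;
}
```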

answered by Mysticial


It's not exactly "faster"; rather, it utilises the available memory better, since the hardware and the operating system manage memory in units whose size is most likely a power of two. Allocating something that is not a power of two will usually result in wasted memory because of alignment requirements.

If you dig deeper into allocators and OS memory managers, you will see that they manage everything in power-of-two sizes. An OS usually manages the memory of a process in terms of pages, and a page size is usually 4096 bytes nowadays. So if you want to allocate a piece that is 4000 bytes, the OS will still allocate 4096 bytes and the remaining 96 bytes will be wasted.
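To make the arithmetic concrete, here is a minimal sketch of that round-up computation (illustrative only; kPageSize and round_to_page are made-up names, and real allocators are more involved):

```cpp
#include <cstddef>
#include <cstdio>

// Round a requested size up to the next page boundary. Because 4096
// is a power of two, the rounding is a mask, not a division.
constexpr std::size_t kPageSize = 4096;

constexpr std::size_t round_to_page(std::size_t n) {
    return (n + kPageSize - 1) & ~(kPageSize - 1);
}

int main() {
    std::size_t request = 4000;
    std::size_t granted = round_to_page(request);
    // Prints: 4000 -> 4096 (96 bytes wasted)
    std::printf("%zu -> %zu (%zu bytes wasted)\n",
                request, granted, granted - request);
    return 0;
}
```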

answered by Blagovest Buyukliev