 

Performance advantages of powers-of-2 sized data?


If I have a game with a 3D world, and the world is quite big, so it needs to be split into chunks, is there a major performance advantage, if any, of having 128-byte chunks over, say, 150-byte chunks? Obviously, the objects in the chunks are still a whole number of bytes in size.

I.e., is chunks[128][128][128] faster than chunks[150][150][150] or chunks[112][112][112]? Are there any other side effects, such as excessive RAM wastage? Are there any other factors that should be taken into consideration?

I just see that it's a convention to store everything in variables and arrays whose sizes are powers of 2, but I'm not sure whether there's any merit to it, or whether it would be better to use more human-friendly numbers like 100 or 150.

asked Mar 01 '12 by Greg



2 Answers

The other answers are indeed correct that power-of-two-sized data will benefit from using shifts instead of multiplies.
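For illustration (the function names and the flattened-array layout below are my assumptions, not from the question), here is how a power-of-two edge length lets the index arithmetic compile down to shifts:

```cpp
#include <cstddef>

// A minimal sketch: indexing a flattened 3D chunk array.
// Assumes 0 <= x, y, z < edge length in both variants.
constexpr int N = 128;  // 128 = 2^7

inline std::size_t index_pow2(int x, int y, int z) {
    // Equivalent to x*N*N + y*N + z, written explicitly as shifts:
    return (static_cast<std::size_t>(x) << 14) |  // x * 128 * 128
           (static_cast<std::size_t>(y) << 7)  |  // y * 128
            static_cast<std::size_t>(z);
}

inline std::size_t index_mul(int x, int y, int z) {
    // With an edge length of 150, real multiplies (or longer
    // shift-and-add sequences) are needed instead.
    constexpr int M = 150;
    return (static_cast<std::size_t>(x) * M + y) * M + z;
}
```

In practice, modern compilers perform this strength reduction automatically whenever the multiplier is a compile-time constant power of two, so you rarely need to write the shifts by hand.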

However, there is a dark side to power-of-two-sized data. And it can hit you when you least expect it.

See these two question/answers:

  • Matrix multiplication: Small difference in matrix size, large difference in timings
  • Why are elementwise additions much faster in separate loops than in a combined loop?

When your datasets are powers of two, they are more likely to be super-aligned in memory (meaning their addresses will likely have the same residue modulo a large power of two).

While this may seem desirable, it can lead to:

  • Conflict Cache Misses
  • False Aliasing Stalls (mentioned in the second link above)

If you read the two questions linked above, you can see that this alignment can cause a slow-down of more than 3x, which will likely far outweigh any benefit you get from using shifts instead of multiplies.


So, as with all performance questions, you need to measure, measure, measure... and be prepared for anything to happen.

You mention that you are representing a 3D space: that is exactly the kind of situation that would exhibit power-of-two strided memory access and could lead to slow-downs. A sketch of one common mitigation follows.
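One common mitigation (a sketch with made-up names and an arbitrary padding value, not something prescribed by this answer) is to pad the innermost dimensions so the physical stride is no longer an exact power of two:

```cpp
#include <cstddef>
#include <vector>

// Sketch: pad each row so successive rows no longer map to the
// same cache sets. PAD is a hypothetical value; tune by measuring.
constexpr int N   = 128;      // logical edge length
constexpr int PAD = 8;        // hypothetical padding
constexpr int ROW = N + PAD;  // physical stride, not a power of two

inline std::size_t padded_index(int x, int y, int z) {
    // Only the first N elements of each padded row are used; the PAD
    // tail is wasted space traded for fewer conflict misses on
    // strided (x- or y-axis) walks through the chunk.
    return (static_cast<std::size_t>(x) * ROW + y) * ROW + z;
}

int main() {
    std::vector<float> chunk(static_cast<std::size_t>(N) * ROW * ROW);
    chunk[padded_index(1, 2, 3)] = 1.0f;  // logical (x=1, y=2, z=3)
    return 0;
}
```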

answered by Mysticial


It's not exactly "faster"; rather, it utilises the available memory better, since the hardware and the operating system manage memory in units whose size is most likely a power of two. Allocating something that is not a power of two will usually result in wasted memory because of alignment requirements.

If you dig deeper into allocators and OS memory managers, you will see that they manage everything in power-of-two sizes. An OS usually manages the memory of a process in terms of pages, and a page size is usually 4096 bytes nowadays. So if you want to allocate a piece that is 4000 bytes, the OS will still allocate 4096 bytes and the remaining 96 bytes will be wasted.
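To make the arithmetic concrete, here is a minimal sketch of that round-up computation (illustrative only; kPageSize and round_to_page are made-up names, and real allocators are more involved):

```cpp
#include <cstddef>
#include <cstdio>

// Round a requested size up to the next page boundary. Because 4096
// is a power of two, the rounding is a mask, not a division.
constexpr std::size_t kPageSize = 4096;

constexpr std::size_t round_to_page(std::size_t n) {
    return (n + kPageSize - 1) & ~(kPageSize - 1);
}

int main() {
    std::size_t request = 4000;
    std::size_t granted = round_to_page(request);
    // Prints: 4000 -> 4096 (96 bytes wasted)
    std::printf("%zu -> %zu (%zu bytes wasted)\n",
                request, granted, granted - request);
    return 0;
}
```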

answered by Blagovest Buyukliev