Suppose that in speed-critical code we have a pair of arrays that are frequently used together. The exact size doesn't matter; it just needs to be set to something reasonable, e.g.
int a[256], b[256];
Is this potentially a pessimization because the low address bits being the same can make it harder for the cache to handle both arrays simultaneously? Would it be better to specify e.g. 300 instead of 256?
Cache-friendly code tries to keep accesses close together in memory so that you minimize cache misses. For example, imagine you wanted to copy a giant two-dimensional table that is organized with each row consecutive in memory, one row following right after the next; a sketch of the two traversal orders is shown below.
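To make that concrete, here is a minimal sketch contrasting the two traversal orders; the 4096x4096 size and the function names are made up for illustration:

#include <cstddef>
#include <vector>

constexpr std::size_t ROWS = 4096, COLS = 4096;

// Row-major copy: walks memory sequentially, so every 64-byte cache
// line that is fetched gets fully used before the loop moves on.
void copy_row_major(const std::vector<int>& src, std::vector<int>& dst) {
    for (std::size_t r = 0; r < ROWS; ++r)
        for (std::size_t c = 0; c < COLS; ++c)
            dst[r * COLS + c] = src[r * COLS + c];
}

// Column-major copy of the same table: consecutive accesses land
// COLS * sizeof(int) bytes apart, so nearly every access is a miss.
void copy_col_major(const std::vector<int>& src, std::vector<int>& dst) {
    for (std::size_t c = 0; c < COLS; ++c)
        for (std::size_t r = 0; r < ROWS; ++r)
            dst[r * COLS + c] = src[r * COLS + c];
}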
In computing, a cache is a high-speed data storage layer which stores a subset of data, typically transient in nature, so that future requests for that data are served up faster than is possible by accessing the data's primary storage location.
Moving my comment to an answer:
You are correct to suspect that powers of two could be problematic, but it usually only matters when you have more than 2 strides. It doesn't get really bad until you exceed your L1 cache associativity, though even before that you might run into false-aliasing issues.
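To put a number on "associativity", here is a back-of-envelope sketch assuming a typical 32 KB, 8-way L1 data cache with 64-byte lines (your CPU's geometry may differ): such a cache has 64 sets, so addresses 4096 bytes apart map to the same set, and only 8 lines can live in a set at once.

#include <cstdint>
#include <cstdio>

int main() {
    // Assumed geometry: 32 KB / 8 ways / 64-byte lines = 64 sets.
    constexpr std::uint64_t LINE = 64, SETS = 64;
    std::uint64_t a = 0x10000, b = 0x11000;  // addresses 4096 bytes apart
    std::printf("set of a: %llu, set of b: %llu\n",
                static_cast<unsigned long long>(a / LINE % SETS),
                static_cast<unsigned long long>(b / LINE % SETS));
    // Both print set 0. With more than 8 arrays at the same 4 KB
    // offset, the 9th line evicts one of the first 8 on every pass.
    return 0;
}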
Here are two examples where powers of two actually become problematic:
In the first example, there are 4 arrays, all of which are aligned to the same offset from the start of a 4 KB page.
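A hedged reconstruction of that setup (not the example's exact benchmark; the element type, array length, and loop body are assumptions):

#include <cstddef>
#include <cstdlib>

int main() {
    constexpr std::size_t N = 1 << 20;  // 4 MiB per array, a multiple of 4096

    // aligned_alloc pins each array to a 4096-byte boundary, so all four
    // start at the same offset within a 4 KB page (the aliasing-prone case).
    float* a = static_cast<float*>(std::aligned_alloc(4096, N * sizeof(float)));
    float* b = static_cast<float*>(std::aligned_alloc(4096, N * sizeof(float)));
    float* c = static_cast<float*>(std::aligned_alloc(4096, N * sizeof(float)));
    float* d = static_cast<float*>(std::aligned_alloc(4096, N * sizeof(float)));

    for (std::size_t i = 0; i < N; ++i)
        b[i] = c[i] = d[i] = 1.0f;

    // a[i], b[i], c[i], d[i] always share their low 12 address bits, so
    // the store and the loads can falsely appear to alias (4 KB aliasing)
    // and all four streams compete for the same cache sets.
    for (std::size_t i = 0; i < N; ++i)
        a[i] = b[i] + c[i] + d[i];

    std::free(a); std::free(b); std::free(c); std::free(d);
    return 0;
}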
In the second example, hopping down a column of a matrix completely destroys performance when the matrix dimension is a power of two.
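A sketch of that second scenario, under the same assumed cache geometry (the dimension and the padding amount are illustrative, not from the example):

#include <cstddef>
#include <vector>

// Summing one column of a row-major dim x dim matrix: each step
// advances by dim * sizeof(double) bytes.
double sum_column(const std::vector<double>& m, std::size_t dim, std::size_t col) {
    double s = 0.0;
    for (std::size_t row = 0; row < dim; ++row)
        s += m[row * dim + col];
    return s;
}

int main() {
    // dim = 1024 gives a stride of 8192 bytes, a multiple of 4 KB, so
    // every access maps to the same cache set and thrashes it.
    // dim = 1024 + 8 gives 8256-byte strides that walk across the sets.
    std::size_t dim = 1024;
    std::vector<double> m(dim * dim, 1.0);
    return sum_column(m, dim, 0) > 0.0 ? 0 : 1;
}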
In any case, note that the key concept is actually the alignment of the arrays, not their size. If you find that you are running into slow-downs, just add some padding between your arrays to break the alignment.
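Applied to the arrays from the question, the padding advice could look like this; the one-cache-line pad size is an assumption, and a struct is used so the layout order is guaranteed:

struct Arrays {
    int a[256];
    char pad[64];  // one assumed 64-byte cache line of padding
    int b[256];    // now starts at a different offset modulo 1024
};

// a and b no longer share their low address bits, so they stop
// competing for exactly the same cache sets.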