
How would you generically detect cache line associativity from user mode code?

I'm putting together a small patch for the cachegrind/callgrind tool in valgrind which will auto-detect, using completely generic code, CPU instruction and cache configuration (right now only x86/x64 auto-configures, and other architectures don't provide CPUID-type configuration to non-privileged code). This code needs to execute entirely in a non-privileged context, i.e. pure user-mode code. It also needs to be portable across very different POSIX implementations, so grokking /proc/cpuinfo won't do: one of our target systems doesn't have such a thing.

Detecting the frequency of the CPU, the number of caches, their sizes, and even the cache line size can all be done with 100% generic POSIX code containing no CPU-specific opcodes whatsoever (just a lot of reasonable assumptions, such as that adding two numbers together, absent memory or register dependency stalls, will probably execute in a single cycle). This part is fairly straightforward.

What isn't so straightforward, and why I'm asking StackOverflow, is how to detect the cache line associativity of a given cache. Associativity is the number of places in a cache that can hold a given cache line from main memory. I can see that L1 cache associativity could be detected, but what about L2? Surely the L1 associativity gets in the way?

I appreciate this is probably a problem which cannot be solved. But I throw it onto StackOverflow and hope someone knows something I don't. Note that if we fail here, I'll simply hard-code a default associativity of four-way, assuming it wouldn't make a huge difference to the results.

Thanks,
Niall

asked Mar 25 '13 by Niall Douglas


1 Answer

Here's a scheme:

Use a memory access pattern with stride S and N unique elements. The test first touches each unique element, then measures the average time per access by replaying the same pattern a very large number of times.

Example: for S = 2 and N = 4 the address pattern would be 0,2,4,6,0,2,4,6,0,2,4,6,...

Consider a multi-level cache hierarchy. You can make the following reasonable assumptions:

  • The size of the (n+1)th-level cache is a power-of-two multiple of the size of the nth-level cache.
  • The associativity of the (n+1)th-level cache is also a power-of-two multiple of the associativity of the nth-level cache.

These two assumptions imply that if two addresses map to the same set in the (n+1)th cache (say L2), then they must also map to the same set in the nth cache (say L1).

Say you know the sizes of the L1 and L2 caches, and you need to find the associativity of the L2 cache.

  • Set stride S = size of the L2 cache (so that every access maps to the same set in L2, and therefore in L1 too).
  • Vary N (by powers of 2).

You get the following regimes:

  • Regime 1: N <= associativity of L1 (all accesses hit in L1)
  • Regime 2: associativity of L1 < N <= associativity of L2 (all accesses miss in L1 but hit in L2)
  • Regime 3: N > associativity of L2 (all accesses miss in L2)

So, if you plot average access time against N (with S = size of L2), you will see a step-like plot. The end of the lowest step gives you the associativity of L1, and the next step gives you the associativity of L2.

You can repeat the same procedure between L2 and L3, and so on. Please let me know if that helps. This method of obtaining cache parameters by varying the stride of a memory access pattern is similar to the one used by the LMbench benchmark; I don't know whether lmbench infers associativity too.

answered Oct 24 '22 by Neha Karanjkar