
Handling TLB Misses

I want to see which pages are being accessed by my program. One way is to use mprotect with a SIGSEGV handler to note down the pages that are accessed. However, this involves the overhead of setting protection bits for all the memory pages I'm interested in.
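For reference, the mprotect approach can be sketched roughly as below. This is a minimal, single-threaded illustration, not a tuned implementation: the names track_region, page_was_touched and MAX_PAGES are made up for this example, the tracked memory is assumed to be one anonymous mapping created by the helper itself, and calling mprotect from a signal handler is assumed to be acceptable on Linux even though POSIX does not list it as async-signal-safe.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 1024                 /* hypothetical upper bound for this sketch */

static char *region_base;              /* start of the tracked mapping */
static size_t region_pages;            /* number of tracked pages */
static size_t page_size;
static volatile unsigned char touched[MAX_PAGES];

/* Record the faulting page, then unprotect it so the access can be retried. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;
    size_t idx;
    (void)sig;
    (void)ctx;
    if (addr < region_base || addr >= region_base + region_pages * page_size)
        _exit(1);                      /* a genuine crash outside the tracked region */
    idx = (size_t)(addr - region_base) / page_size;
    touched[idx] = 1;
    mprotect(region_base + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Map `pages` inaccessible anonymous pages and install the handler.
   Returns the base address, or NULL on failure. */
char *track_region(size_t pages)
{
    struct sigaction sa;
    void *p;
    size_t i;

    if (pages == 0 || pages > MAX_PAGES)
        return NULL;
    page_size = (size_t)sysconf(_SC_PAGESIZE);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;

    p = mmap(NULL, pages * page_size, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    for (i = 0; i < MAX_PAGES; i++)
        touched[i] = 0;
    region_base = p;
    region_pages = pages;
    return region_base;
}

/* Non-zero if page number `idx` of the region has faulted at least once. */
int page_was_touched(size_t idx)
{
    return idx < region_pages && touched[idx];
}
```

After track_region(4), writing to the first byte of the region faults once, the handler marks page 0 and unprotects it, and page_was_touched(0) then returns non-zero. Each page costs one fault on first access plus the initial protection change, which is the overhead I am worried about.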

The second way that comes to mind is to invalidate the Translation Lookaside Buffer (TLB) at the beginning and then note down the memory page addressed at each miss. Now the question is how to handle TLB misses in user space for a linux program.

And if you know an even faster method than either TLB misses or mprotect to note down dirtied memory pages, kindly let me know. Also, I only need a solution for x86.

MetallicPriest asked Dec 16 '22 09:12


2 Answers

I want to see which pages are being accessed by my program.

You can simulate a CPU and get this data. Options:

  • 1) valgrind - a dynamic translator of user-space binaries with good support for instrumentation. Try the cachegrind tool, which even emulates the L1/L2 caches; you can also try to build a new tool that logs all memory accesses (e.g. at page granularity).
  • 2) qemu - a dynamic translator with both system-wide and process-wide modes. As far as I know, the original qemu has no instrumentation.
  • 3) bochs - a system-wide CPU emulator (very slow). You can easily hack the "memory access" code to get a memory log.
  • 4) PTLsim - www.ptlsim.org/papers/PTLsim-ISPASS-2007.pdf

However, this involves the overhead of setting protection bits for all the memory pages

Is this overhead too big?

Now the question is how to handle TLB misses in user space for a linux program.

You can't handle a TLB miss either in user space or in kernel space (on x86 and many other popular platforms). This is because most platforms handle TLB misses in hardware: the MMU (part of the CPU/chipset) walks the page tables and obtains the physical address transparently. Only when the page-table entry forbids the access (for example, the page is marked not present or read-only), or when the address region is not mapped at all, is a page-fault interrupt generated and delivered to the kernel.

Also, there seems to be no way to dump the TLB on modern CPUs (although the 386DX was able to do this).

You can try to detect a TLB miss by the delay it introduces. But this delay can be hidden when out-of-order execution starts the TLB lookup early.

Also, most hardware events (memory accesses, TLB accesses, TLB hits, TLB misses) are counted by the hardware performance monitoring unit (this part of the CPU is used by VTune, CodeAnalyst and oprofile). Unfortunately, these are only global event counters, and you can't activate more than 2-4 events at the same time. The good news is that you can set a perfmon counter to raise an interrupt when a given count is reached. You then get (via the interrupt) the address of the instruction ($eip) at which the count was reached. So you can find TLB-miss-heavy hot spots with this hardware (it is in every modern x86 CPU, both Intel and AMD).
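As an illustration of the counting side only (not the interrupt-on-overflow sampling), here is a rough sketch that uses the Linux perf_event_open system call to count data-TLB read misses around a piece of code. The helper names count_dtlb_read_misses and stride_walk are invented for this example, and it assumes the kernel exposes the generic dTLB cache event and that the process is allowed to open counters (this depends on /proc/sys/kernel/perf_event_paranoid); if not, the helper simply returns -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Count user-space data-TLB read misses while work(arg) runs on this thread.
   Returns the count, or -1 if the counter cannot be opened or read. */
long long count_dtlb_read_misses(void (*work)(void *), void *arg)
{
    struct perf_event_attr attr;
    uint64_t count = 0;
    ssize_t n;
    int fd;

    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HW_CACHE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CACHE_DTLB |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;        /* start stopped, enable explicitly below */
    attr.exclude_kernel = 1;  /* count user-space activity only */
    attr.exclude_hv = 1;

    /* pid = 0 (this thread), cpu = -1 (any CPU), no group, no flags */
    fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    n = read(fd, &count, sizeof count);
    close(fd);
    return n == (ssize_t)sizeof count ? (long long)count : -1;
}

/* Hypothetical workload: read one byte from every 4 KiB page of a 16 MiB buffer. */
static unsigned char big[16u << 20];

void stride_walk(void *arg)
{
    const volatile unsigned char *p = big;  /* volatile keeps the loads in place */
    size_t i;
    (void)arg;
    for (i = 0; i < sizeof big; i += 4096)
        (void)p[i];
}
```

Calling count_dtlb_read_misses(stride_walk, NULL) gives a miss count for the walk, but as noted it says nothing about which addresses missed.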

osgx answered Dec 18 '22 22:12


The TLB is transparent to user-space programs; at most you can count TLB misses with a performance counter (without addresses).

adobriyan answered Dec 18 '22 23:12