I need to improve the throughput of the system.
The usual cycle of optimization has been done and we have already achieved 1.5X better throughput.
I am now beginning to wonder if I can utilize the cachegrind output to improve the system's throughput.
Can somebody point me to how to begin on this?
What I understand is we need to ensure most frequently used data should be kept small enough so that it remains in L1 cache and the next set of data should fit in the L2.
Is this the right direction I am taking?
4.1 Cache profiling Cachegrind is a tool for doing cache simulations and annotating your source line-by-line with the number of cache misses.
It`s true that cachegrind output in itself does not give too much information how to go about optimizing code. One needs to know how to interpret it and what you are saying about data fitting into L1 and L2 is indeed the right direction.
To fully understand how memory access patterns influence performance, I recommend reading an excellent paper "What Every Programmer Should Know About Memory" by Ulrich Drepper, the GNU libc maintainer.
If you're having trouble parsing the cachegrind output, look into KCacheGrind (it should be available in your distro of choice). I use it and find it quite helpful.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With