I have a critical path which executes in one thread, pinned to a single core.
I am interested in identifying where cache misses are occurring. After looking around it seems valgrind's cachegrind tool would help me. However I have some questions regarding the tool's capabilities in this scenario:
Question 1 is the most important.
Any help with command line arguments is most appreciated.
To use this tool, you must specify --tool=cachegrind on the Valgrind command line. Cachegrind is a tool for doing cache simulations and annotating your source line-by-line with the number of cache misses. In particular, it records: L1 instruction cache reads and misses;
But by contrast with normal Valgrind use, you probably do want to turn optimisation on, since you should profile your program as it will be normally run. The two steps are: Run your program with valgrind --tool=cachegrind in front of the normal command line invocation. When the program finishes, Cachegrind will print summary cache statistics.
The Valgrind profiling tools are cachegrind and callgrind. The cachegrind tool simulates the L1/L2 caches and counts cache misses/hits. The callgrind tool counts function calls and the CPU instructions executed within each call and builds a function callgraph.
Use the -I / --include option to tell Valgrind where to look for source files if the filenames found from the debugging information aren't specific enough. Beware that cg_annotate can take some time to digest large cachegrind.out.<pid> files, e.g. 30 seconds or more. Also beware that auto-annotation can produce a lot of output if your program is large!
cachegrind can output both global and local information concerning cache misses, and annotate at the line level (if the original program was compiled with debug information). For instance, the following code:
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv) {
    size_t n = (argc == 2) ? atoi(argv[1]) : 100;
    double *v = malloc(sizeof(double) * n);
    for (size_t i = 0; i < n; i++)
        v[i] = i;
    double s = 0;
    for (size_t i = 0; i < n; ++i)
        s += v[i] * v[n - 1 - i];
    printf("%f\n", s); /* s is a double, so %f, not %ld */
    free(v);
    return 0;
}
compiled with gcc a.c -O2 -g -o a
and run with valgrind --tool=cachegrind ./a 10000000
outputs:
==11551== Cachegrind, a cache and branch-prediction profiler
==11551== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al.
==11551== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==11551== Command: ./a 10000000
==11551==
--11551-- warning: L3 cache found, using its data for the LL simulation.
80003072
==11551==
==11551== I refs: 150,166,282
==11551== I1 misses: 876
==11551== LLi misses: 870
==11551== I1 miss rate: 0.00%
==11551== LLi miss rate: 0.00%
==11551==
==11551== D refs: 30,055,919 (20,041,763 rd + 10,014,156 wr)
==11551== D1 misses: 3,752,224 ( 2,501,671 rd + 1,250,553 wr)
==11551== LLd misses: 3,654,291 ( 2,403,770 rd + 1,250,521 wr)
==11551== D1 miss rate: 12.4% ( 12.4% + 12.4% )
==11551== LLd miss rate: 12.1% ( 11.9% + 12.4% )
==11551==
==11551== LL refs: 3,753,100 ( 2,502,547 rd + 1,250,553 wr)
==11551== LL misses: 3,655,161 ( 2,404,640 rd + 1,250,521 wr)
==11551== LL miss rate: 2.0% ( 1.4% + 12.4% )
The I1 miss rate tells us there were virtually no instruction cache misses.
The D1 miss rate tells us there were a lot of L1 data cache misses.
The LL miss rate tells us there were some last-level cache misses.
To get a more accurate view of the miss locations, we can run kcachegrind cachegrind.out.11551, select the "L1 Data Read Miss" event, and navigate through the annotated application code.
This should answer 1). I think the answer is no to 2), 3) and 4). It's yes for 5) if you compiled with debug info (without it, you'll get the global information, but not the per-line information). As for 6), I'd say valgrind usually provides a very decent first approximation; going to perf is obviously more accurate!