I am doing some experiments with Cachegrind, Callgrind and gem5. I noticed that a number of accesses are counted as reads by Cachegrind, as writes by Callgrind, and as both reads and writes by gem5.
Let's take a very simple example:
int main() {
    int i, l = 0;    /* counter incremented 100,000 times in total */
    for (i = 0; i < 1000; i++) {
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        /* ... l++; repeated 100 times in total ... */
    }
}
I compile with:
gcc ex.c --static -o ex
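To inspect the generated assembly yourself (assuming gcc's default -O0 here, so that l is kept on the stack rather than in a register), the same source can also be compiled with -S:
gcc -S ex.c -o ex.s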
According to the generated assembly, the instruction
addl $1, -8(%rbp)
is executed 100,000 times. Since it is both a read and a write, I was expecting 100k reads and 100k writes. However, Cachegrind counts these accesses only as reads and Callgrind only as writes.
% valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64
--L2=16384,8,64 ./ex
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356==
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356==
==15356== I refs: 111,535
==15356== I1 misses: 475
==15356== LLi misses: 280
==15356== I1 miss rate: 0.42%
==15356== LLi miss rate: 0.25%
==15356==
==15356== D refs: 104,894 (103,791 rd + 1,103 wr)
==15356== D1 misses: 557 ( 414 rd + 143 wr)
==15356== LLd misses: 172 ( 89 rd + 83 wr)
==15356== D1 miss rate: 0.5% ( 0.3% + 12.9% )
==15356== LLd miss rate: 0.1% ( 0.0% + 7.5% )
==15356==
==15356== LL refs: 1,032 ( 889 rd + 143 wr)
==15356== LL misses: 452 ( 369 rd + 83 wr)
==15356== LL miss rate: 0.2% ( 0.1% + 7.5% )
-
% valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64
--L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376==
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376==
==15376== Events : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376==
==15376== I refs: 111,532
==15376== I1 misses: 474
==15376== LLi misses: 279
==15376== I1 miss rate: 0.42%
==15376== LLi miss rate: 0.25%
==15376==
==15376== D refs: 104,894 (2,777 rd + 102,117 wr)
==15376== D1 misses: 557 ( 406 rd + 151 wr)
==15376== LLd misses: 172 ( 87 rd + 85 wr)
==15376== D1 miss rate: 0.5% ( 14.6% + 0.1% )
==15376== LLd miss rate: 0.1% ( 3.1% + 0.0% )
==15376==
==15376== LL refs: 1,031 ( 880 rd + 151 wr)
==15376== LL misses: 451 ( 366 rd + 85 wr)
==15376== LL miss rate: 0.2% ( 0.3% + 0.0% )
Could someone give me a reasonable explanation? Would I be correct to consider that there are in fact ~100k reads and ~100k writes (i.e., 2 cache accesses per addl)?
Callgrind records the count of instructions, not the actual time spent in a function. If you have a program where the bottleneck is file I/O, the costs associated with reading and writing files won't show up in the profile, as those are not CPU-intensive tasks.
Callgrind is a profiling tool that records the call history among functions in a program's run as a call-graph. By default, the collected data consists of the number of instructions executed, their relationship to source lines, the caller/callee relationship between functions, and the numbers of such calls.
To use Callgrind, you must specify --tool=callgrind on the Valgrind command line or use the supplied script callgrind. Callgrind's cache simulation is based on the Cachegrind tool of the Valgrind package.
Valgrind is a framework for debugging and profiling Linux programs; it can automatically detect many memory-management and threading bugs, making programs more stable and robust.
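For example, a typical run is profiled and then annotated with the supplied callgrind_annotate script (callgrind.out.<pid> is Callgrind's default output file name; replace <pid> with the process id printed for your run):
% valgrind --tool=callgrind ./ex
% callgrind_annotate callgrind.out.<pid>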
From the Cachegrind manual, section 5.7.1 ("Cache Simulation Specifics"):
Instructions that modify a memory location (e.g. inc and dec) are counted as doing just a read, i.e. a single data reference. This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting.
Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur.
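To make that counting rule concrete, here is a minimal C sketch of the attribution policy described above. This is not Cachegrind's actual implementation; the enum and the counter variables are made up for illustration, with Dr/Dw mirroring the event names in the output above.
#include <stdio.h>

/* Access kinds: a plain load, a plain store, and a read-modify-write
 * such as "addl $1, -8(%rbp)". */
enum access_kind { LOAD, STORE, MODIFY };

static long Dr = 0, Dw = 0;   /* simulated data-read / data-write counters */

/* Cachegrind-style attribution: a MODIFY counts as a single data read,
 * because the store part can never miss once the load has pulled the
 * line into the cache. */
static void count_access(enum access_kind a)
{
    if (a == STORE)
        Dw++;
    else              /* LOAD or MODIFY */
        Dr++;
}

int main(void)
{
    /* 1000 loop iterations x 100 increments, as in the example program */
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 100; j++)
            count_access(MODIFY);

    printf("Dr = %ld, Dw = %ld\n", Dr, Dw);   /* prints Dr = 100000, Dw = 0 */
    return 0;
}
Under this policy the 100,000 addl instructions all land in the read column, which matches the ~103k rd / ~1k wr split Cachegrind reports above.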
It would seem that Callgrind's cache-simulation logic differs from Cachegrind's. I would expect Callgrind to produce the same results as Cachegrind, so maybe this is a bug?
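One consistency check, based on the numbers above: both tools report the same D refs total, 104,894, but Cachegrind splits it as 103,791 rd + 1,103 wr while Callgrind splits it as 2,777 rd + 102,117 wr. So each addl appears to be counted exactly once by both tools (roughly the 100,000 modifies plus a few thousand other data accesses); they simply attribute the modify to different columns, Cachegrind folding it into the read count and Callgrind apparently folding it into the write count. If gem5 counts each addl as one read plus one write, its totals would be expected to come out roughly 100,000 higher than either Valgrind tool's.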