I'm trying to come up with an example program which would have a high cache-miss rate. I thought I could try accessing a matrix column by column like so:
#include <stdlib.h>

int main(void)
{
    int i, j, k;
    int w = 1000;
    int h = 1000;

    /* Allocate w rows of h ints each (each row is a separate allocation). */
    int **block = malloc(w * sizeof(int *));
    for (i = 0; i < w; i++) {
        block[i] = malloc(h * sizeof(int));
    }

    /* Walk the matrix column by column: the inner loop varies the row index,
       so consecutive writes land h ints apart and touch different cache lines.
       (Note that the swapped indexing block[j][i] assumes w == h.) */
    for (k = 0; k < 10; k++) {
        for (i = 0; i < w; i++) {
            for (j = 0; j < h; j++) {
                block[j][i] = 0;
            }
        }
    }
    return 0;
}
When I compile this with the -O0 flag and run it using perf stat -r 5 -B -e cache-references,cache-misses ./a.out it gives me:
Performance counter stats for './a.out' (5 runs):
715,463 cache-references ( +- 0.42% )
527,634 cache-misses # 73.747 % of all cache refs ( +- 2.53% )
0.112001160 seconds time elapsed ( +- 1.58% )
which is good enough for my purposes. However, if I go ahead and change the matrix size to 2000x2000, it gives:
Performance counter stats for './a.out' (5 runs):
6,364,995 cache-references ( +- 2.32% )
2,534,989 cache-misses # 39.827 % of all cache refs ( +- 0.02% )
0.461104903 seconds time elapsed ( +- 0.92% )
and if I increase it even further to 3000x3000, I get:
Performance counter stats for './a.out' (5 runs):
59,204,028 cache-references ( +- 1.36% )
5,662,629 cache-misses # 9.565 % of all cache refs ( +- 0.11% )
1.116573625 seconds time elapsed ( +- 0.32% )
which is strange, because I would expect the cache-miss rate to go up as the size increases. I need something that is as platform-independent as possible. My computer architecture class was a long time ago, so any insight would be welcome.
Notes
I said I need something relatively platform-independent, but still, these are my specs:
Beware of automatic prefetch in modern CPUs - it can often detect strided accesses. Perhaps try a random access pattern, e.g.:
#include <stdlib.h>

int main(void)
{
    int i;
    int n = 1000 * 1000;
    int *block = malloc(n * sizeof(int));

    /* Touch random elements so there is no stride for the prefetcher to lock onto. */
    for (i = 0; i < n / 10; i++) {
        int ri = rand() % n;
        block[ri] = 0;
    }
    return 0;
}
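If the random writes still don't produce enough misses for you (rand() overhead and the relatively small number of touches can hide them), another common way to defeat the prefetcher is pointer chasing through a random permutation, where every load depends on the result of the previous one. Here is a minimal sketch of that idea; the sizes, variable names, and iteration count are only illustrative:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i, cur;
    int n = 1000 * 1000;                /* ~4 MB working set, same order of magnitude as above */
    int *next = malloc(n * sizeof(int));

    if (next == NULL)
        return 1;

    /* Sattolo's algorithm: shuffle so that next[] forms one single cycle. */
    for (i = 0; i < n; i++)
        next[i] = i;
    for (i = n - 1; i > 0; i--) {
        int j = rand() % i;             /* j < i keeps the permutation a single cycle */
        int tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }

    /* Chase the cycle: each load depends on the previous result, so the
       prefetcher cannot run ahead, and once the array is larger than the
       last-level cache most of these loads should miss. */
    cur = 0;
    for (i = 0; i < n; i++)
        cur = next[cur];

    printf("%d\n", cur);                /* keep the result live */
    free(next);
    return 0;
}

Using Sattolo's shuffle (swapping only with an index strictly below i) guarantees the permutation is one big cycle, so the chase visits all n elements instead of possibly getting stuck in a short cycle that fits in cache.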