Write a program and try to compare (measure, if you can) the time of accessing data from main memory and from cache.
If you can do that, then how would you measure the speed of each level of cache?
The performance of cache memory is measured in terms of a quantity called the hit ratio. When the CPU references memory and finds the word in the cache, a hit is said to have occurred; the hit ratio is the number of hits divided by the total number of memory references.
Cache memory is an extremely fast memory type that acts as a buffer between RAM and the CPU. It holds frequently requested data and instructions so that they are immediately available to the CPU when needed, which reduces the average time needed to access data from main memory.
You can also calculate the miss ratio by dividing the number of misses by the total number of content requests. For example, if over a period of time your cache experienced 11 misses out of 48 total requests, you would divide 11 by 48 to get a miss ratio of about 0.229.
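For instance, here is a small C sketch of that arithmetic (the counts are the ones from the example above, and the program is purely illustrative):

    #include <stdio.h>

    int main(void)
    {
        /* Counts from the example above: 48 requests, 11 of which missed. */
        unsigned long misses = 11;
        unsigned long total  = 48;
        unsigned long hits   = total - misses;        /* 37 hits */

        double miss_ratio = (double)misses / total;   /* about 0.229 */
        double hit_ratio  = (double)hits   / total;   /* about 0.771 */

        printf("hit ratio:  %.3f\n", hit_ratio);
        printf("miss ratio: %.3f\n", miss_ratio);
        return 0;
    }

The two ratios always sum to 1, so computing either one gives you the other.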
Cache memory, which is also a type of random-access memory, does not need to be refreshed. It is built directly into the CPU to give the processor the fastest possible access to memory locations, and it provides nanosecond-scale access times to frequently referenced instructions and data.
You need to come up with a heuristic that forces a 100% (or very close) cache miss rate (hopefully you have a cache-invalidation opcode?) and a 100% cache hit rate. Hooray, that works for one level of cache. Now, how do you do the same for levels 2 and 3?
In all seriousness, there probably isn't a way to do this 100% reliably without special hardware and traces connected to the CPU and memory, but here's what I would do:
Write a "bunch" of stuff to 1 location in memory - enough that you can be sure that it is hitting the L1 cache consistantly and record the time (which affects your cache so beware). You should do this set of writes without branches to try and get rid of branch prediction inconsistancies. That is best time. Now, every so often, write a cache-line's worth of data to a random far away location in RAM at the end of your known location right and record the new time. Hopefully, this takes longer. Keep doing this recording the various times and hopefully you will see a couple of timings that tend to group up. Each of these groups "could" show timings for L2, L3, and memory access timings. The problem is there is so much other stuff getting in the way. The OS could context switch you and screw up your cache. An interrupt could come along and through your timing off. There will be a lot of stuff that could throw the values off. But, hopefully, you get enough signal in your data to see if it works.
This would probably be easier to do on a simpler, embedded type system where the OS (if any) won't get in your way.
This generally requires some knowledge of the “geometry” of the cache and other aspects of it. It also helps to have some control of the system beyond simple user access, and access to implementation-dependent facilities such as finer-grained timing than the standard C clock mechanism supplies.
Here is an initial approach: allocate a buffer, and for a range of lengths, repeatedly read (or write) the first portion of the buffer of that length, timing the passes and computing the number of bytes read or written per second. Declare the buffer pointer volatile to prevent the compiler from optimizing away accesses that otherwise have no effect. When you do this, you will typically see fast speeds (bytes read/written per second) for small lengths and slower speeds for longer lengths. The speed decreases occur where the sizes of the different levels of cache are exceeded, so you are quite likely to see the sizes of the L1 and L2 caches reflected in data collected with this technique.
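Here is a minimal sketch of that approach, assuming the standard C clock() is precise enough once each length is swept many times; the buffer sizes and the amount of work per length are arbitrary choices:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
        const size_t max_len = 64u * 1024 * 1024;    /* 64 MiB: beyond typical caches  */
        const size_t work    = 256u * 1024 * 1024;   /* total bytes touched per length */
        volatile char *buf = malloc(max_len);        /* volatile: reads are not removed */
        if (buf == NULL)
            return 1;
        memset((void *)buf, 1, max_len);             /* fault in all pages up front    */

        for (size_t len = 4 * 1024; len <= max_len; len *= 2) {
            size_t passes = work / len;              /* keep total work about constant */
            volatile char sink = 0;

            clock_t t0 = clock();
            for (size_t p = 0; p < passes; p++)
                for (size_t i = 0; i < len; i++)
                    sink = buf[i];                   /* read every byte of the region  */
            clock_t t1 = clock();

            double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
            printf("%8zu KiB: %10.1f MiB/s\n", len / 1024,
                   (double)work / (1024.0 * 1024.0) / secs);
            (void)sink;
        }

        free((void *)buf);
        return 0;
    }

Doubling the length at each step keeps the run short while still bracketing the usual L1, L2, and L3 sizes.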
Here are some reasons that approach is inadequate:
- The accesses are sequential, so hardware prefetchers can start fetching data before it is needed and hide much of the latency of the slower levels.
- It measures throughput (bytes per second) rather than the latency of a single access, and the two can differ considerably.
- Other data (the stack, the timing code, the operating system) occupies part of each cache, so slowdowns appear somewhat before the nominal cache sizes and the transitions are not sharp.
- Context switches, interrupts, and timer resolution add noise, so many repetitions and some care in interpreting the results are needed.