I recently did some digging into memory and how to use it properly. Of course, I also stumbled upon prefetching and how I can make life easier for the CPU.
I ran some benchmarks to see the actual benefits of proper storage/access of data and instructions. These benchmarks showed not only the expected benefits of helping your CPU prefetch, it also showed that prefetching also speeds up the process during runtime. After about 100 program cycles, the CPU seems to have figured it out and has optimized the cache accordingly. This saves me up to 200.000 ticks per cycle, the number drops from around 750.000 to 550.000. I got these Numbers using the qTestLib.
Now to the Question: Is there a safe way to use this runtime-speedup, letting it warm up, so to speak? Or should one not calculate this in at all and just build faster code from the start up?
A warmup gradually revs up your cardiovascular system by raising your body temperature and increasing blood flow to your muscles. Warming up may also help reduce muscle soreness and lessen your risk of injury. Cooling down after your workout allows for a gradual recovery of preexercise heart rate and blood pressure.
Warm up for 5 to 10 minutes. The more intense the activity, the longer the warm-up. Do whatever activity you plan on doing (running, walking, cycling, etc.) at a slower pace (jog, walk slowly).
The purpose of warming up before physical activity is to prepare mentally and physically for your chosen activity. Warming up increases your heart rate and therefore your blood flow which enables more oxygen to reach your muscles.
First of all, there is generally no gain in trying to warm up a process prior to normal execution: That would only speed up the first 100 program cycles in your case, gaining a total of less than 20000 ticks. That's much less than the 75000 ticks you would have to invest in the warming up.
Second, all these gains from warming up a process/cache/whatever are rather brittle. There is a number of events that destroy the warming effect that you generally do not control. Mostly these come from your process not being alone on the system. A process switch can behave pretty much like an asynchronous cache flush, and whenever the kernel needs a page of memory, it may drop a page from the disk cache.
Since the factors make computing time pretty unpredictable, they need to be controlled when running benchmarks that are supposed to produce results of any reliability. Apart from that, these effects are mostly ignored.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With