I'm looking into the exact implications of using QueryPerformanceCounter in our system and am trying to understand it's impact on the application. I can see from running it on my 4-core single cpu machine that it takes around 230ns to run. When I run it on a 24-core 4 cpu xeon it takes around 1.4ms to run. More interestingly on my machine when running it in multiple threads they don't impact each other. But on the multi-cpu machine the threads cause some sort of interaction that causes them to block each other. I'm wondering if there is some shared resource on the bus that they all query? What exactly happens when I call QueryPerformanceCounter and what does it really measure?
Windows QueryPerformanceCounter() has logic to determine the number of processors and invoke syncronization logic if necessary. It attempts to use the TSC register but for multiprocessor systems this register is not guaranteed to be syncronized between processors (and more importantly can vary greatly due to intelligent downclocking and sleep states).
MSDN says that it doesn't matter which processor this is called on so you may be seeing extra syncronization code for such a situation cause overhead. Also remember that it can invoke a bus transfer so you may be seeing bus contention delays.
Try using SetThreadAffinityMask() if possible to bind it to a specific processor. Otherwise you might just have to live with the delay or you could try a different timer (for example take a look at http://en.wikipedia.org/wiki/High_Precision_Event_Timer).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With