
How do I obtain CPU cycle count in Win32?

In Win32, is there any way to get a unique CPU cycle count, or something similar, that would be uniform across multiple processes, languages, systems, etc.?

I'm creating some log files, but I have to produce multiple log files because we're hosting the .NET runtime, and I'd like to avoid calling from one side to the other to log. So I was thinking I'd just produce two files, combine them, and then sort them to get a coherent timeline involving cross-world calls.

However, GetTickCount does not increase for every call, so that's not reliable. Is there a better number, so that I get the calls in the right order when sorting?


Edit: Thanks to @Greg, who put me on the track to QueryPerformanceCounter, which did the trick.

Lasse V. Karlsen asked Sep 26 '08 11:09


2 Answers

Here's an interesting article that says not to use RDTSC, but to use QueryPerformanceCounter instead.

Conclusion:

Using regular old timeGetTime() to do timing is not reliable on many Windows-based operating systems because the granularity of the system timer can be as high as 10-15 milliseconds, meaning that timeGetTime() is only accurate to 10-15 milliseconds. [Note that the high granularities occur on NT-based operating systems like Windows NT, 2000, and XP. Windows 95 and 98 tend to have much better granularity, around 1-5 ms.]

However, if you call timeBeginPeriod(1) at the beginning of your program (and timeEndPeriod(1) at the end), timeGetTime() will usually become accurate to 1-2 milliseconds, and will provide you with extremely accurate timing information.

Sleep() behaves similarly; the length of time that Sleep() actually sleeps for goes hand-in-hand with the granularity of timeGetTime(), so after calling timeBeginPeriod(1) once, Sleep(1) will actually sleep for 1-2 milliseconds, Sleep(2) for 2-3, and so on (instead of sleeping in increments as high as 10-15 ms).

For higher precision timing (sub-millisecond accuracy), you'll probably want to avoid using the assembly mnemonic RDTSC because it is hard to calibrate; instead, use QueryPerformanceFrequency and QueryPerformanceCounter, which are accurate to less than 10 microseconds (0.00001 seconds).
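As a rough illustration of the QueryPerformanceFrequency / QueryPerformanceCounter pairing described above, here is a minimal C sketch that reads the counter and converts a tick difference to microseconds, which is the kind of value one could write next to each log line. The helper names (ticks_per_second, now_ticks, ticks_to_us, elapsed_us_since) are invented for this example, and the #else branch is only a POSIX stand-in so the sketch also compiles off Windows.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime in the fallback branch */
#endif
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Counter frequency in ticks per second. */
static int64_t ticks_per_second(void) {
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return (int64_t)f.QuadPart;
}

/* Current value of the high-resolution counter. */
static int64_t now_ticks(void) {
    LARGE_INTEGER c;
    QueryPerformanceCounter(&c);
    return (int64_t)c.QuadPart;
}
#else
#include <time.h>

/* Assumption: off Windows, stand in with the monotonic clock (1 tick = 1 ns). */
static int64_t ticks_per_second(void) { return 1000000000LL; }

static int64_t now_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
#endif

/* Convert a tick count to microseconds. Whole seconds and the remainder
   are converted separately so large tick counts do not overflow. */
static int64_t ticks_to_us(int64_t ticks, int64_t freq) {
    assert(freq > 0);
    return (ticks / freq) * 1000000 + (ticks % freq) * 1000000 / freq;
}

/* Microseconds elapsed since an earlier now_ticks() reading. */
static int64_t elapsed_us_since(int64_t start_ticks) {
    return ticks_to_us(now_ticks() - start_ticks, ticks_per_second());
}
```

A caller would take one now_ticks() reading at startup and log elapsed_us_since(start) with each entry. Whether two log writers can share the same starting reading depends on how the processes are set up, so treat that part as an open design choice.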

For simple timing, both timeGetTime and QueryPerformanceCounter work well, and QueryPerformanceCounter is obviously more accurate. However, if you need to do any kind of "timed pauses" (such as those necessary for framerate limiting), you need to be careful of sitting in a loop calling QueryPerformanceCounter, waiting for it to reach a certain value; this will eat up 100% of your processor. Instead, consider a hybrid scheme, where you call Sleep(1) (don't forget timeBeginPeriod(1) first!) whenever you need to pass more than 1 ms of time, and then only enter the QueryPerformanceCounter 100%-busy loop to finish off the last < 1/1000th of a second of the delay you need. This will give you ultra-accurate delays (accurate to 10 microseconds), with very minimal CPU usage.
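The hybrid scheme can be sketched in C roughly as follows. This is not the article's own code: the function names (timing_begin, timing_end, precise_delay_us) are hypothetical, the 2 ms margin before switching to the busy loop is an assumption based on Sleep(1) taking 1-2 ms, and the #else branch is a POSIX stand-in so the sketch compiles off Windows. On Windows, timeBeginPeriod and timeEndPeriod need winmm.lib at link time.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and nanosleep */
#endif
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>  /* link with winmm.lib for timeBeginPeriod/timeEndPeriod */

static void timing_begin(void) { timeBeginPeriod(1); }  /* call once at startup */
static void timing_end(void)   { timeEndPeriod(1); }    /* call once at exit */
static void coarse_sleep_1ms(void) { Sleep(1); }

static int64_t ticks_per_second(void) {
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return (int64_t)f.QuadPart;
}

static int64_t now_ticks(void) {
    LARGE_INTEGER c;
    QueryPerformanceCounter(&c);
    return (int64_t)c.QuadPart;
}
#else
#include <time.h>

/* Assumption: POSIX stand-ins; there is no timer period to adjust here. */
static void timing_begin(void) {}
static void timing_end(void) {}

static void coarse_sleep_1ms(void) {
    struct timespec ts = {0, 1000000};  /* 1 ms */
    nanosleep(&ts, NULL);
}

static int64_t ticks_per_second(void) { return 1000000000LL; }

static int64_t now_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
#endif

/* Wait for delay_us microseconds: sleep in ~1 ms steps while plenty of time
   remains, then spin on the counter for the remainder. Returns the measured
   delay in microseconds. Intended for short delays (well under an hour),
   since the tick arithmetic is not guarded against overflow. */
static int64_t precise_delay_us(int64_t delay_us) {
    int64_t freq = ticks_per_second();
    int64_t start = now_ticks();
    /* Round the target up so the wait is never shorter than requested. */
    int64_t target = start + (delay_us * freq + 999999) / 1000000;
    int64_t margin = 2 * freq / 1000;  /* assumed 2 ms safety margin */

    while (target - now_ticks() > margin) {
        coarse_sleep_1ms();            /* cheap: yields the CPU */
    }
    while (now_ticks() < target) {
        /* busy loop for the final stretch */
    }
    return (now_ticks() - start) * 1000000 / freq;
}
```

The margin trades CPU time for accuracy: a larger margin spins longer but is less likely to overshoot when a sleep runs long.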

prakash answered Sep 19 '22 13:09


You can use the RDTSC CPU instruction (assuming x86). This instruction reads the CPU's cycle counter (the time-stamp counter), but be aware that it increases very quickly: the full counter is 64 bits wide, but if you keep only the low 32 bits it will reach its maximum value and wrap around to 0 within seconds on a GHz-class CPU. As the Wikipedia article mentions, you might be better off using the QueryPerformanceCounter function.
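For reference, a minimal C sketch of reading the counter through the __rdtsc() compiler intrinsic (provided by intrin.h on MSVC and x86intrin.h on GCC/Clang), plus a back-of-the-envelope helper for how long a counter of a given width takes to wrap. The names read_tsc, seconds_until_wrap and HAVE_RDTSC are made up for this example, read_tsc returns 0 as a placeholder on non-x86 targets, and the 3 GHz rate used below is only an assumed figure.

```c
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
#define HAVE_RDTSC 1
#else
#define HAVE_RDTSC 0  /* not an x86 target: no RDTSC instruction */
#endif

/* Read the 64-bit time-stamp counter (placeholder 0 where unavailable). */
static uint64_t read_tsc(void) {
#if HAVE_RDTSC
    return (uint64_t)__rdtsc();
#else
    return 0;
#endif
}

/* Seconds until a counter `bits` wide wraps when it advances at `hz`. */
static double seconds_until_wrap(int bits, double hz) {
    double span = 1.0;
    for (int i = 0; i < bits; ++i) {
        span *= 2.0;  /* span = 2^bits */
    }
    return span / hz;
}
```

At an assumed 3 GHz, seconds_until_wrap(32, 3e9) comes to about 1.4 seconds, while seconds_until_wrap(64, 3e9) is on the order of two centuries. The raw value is still in CPU-specific units and has to be calibrated before it means anything as a time, which is the reason the other answer gives for preferring QueryPerformanceCounter.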

Greg Hewgill answered Sep 21 '22 13:09