I adopted the following timer code, found online, to measure SSE performance.
#ifndef __TIMER_H__
#define __TIMER_H__
#pragma warning (push)
#pragma warning (disable : 4035) // disable no return value warning
__forceinline unsigned int GetPentiumTimer()
{
    __asm
    {
        xor eax,eax  // VC won't realize that eax is modified w/out this
                     // instruction to modify the val.
                     // Problem shows up in release mode builds
        _emit 0x0F   // Pentium high-freq counter to edx;eax
        _emit 0x31   // only care about low 32 bits in eax
        xor edx,edx  // so VC gets that edx is modified
    }
}
#pragma warning (pop)
#endif
I ran the measurement on my Pentium D E2200 CPU, and it works fine (it shows that aligned SSE instructions are faster). But on my i3 CPU, the unaligned instructions come out faster in 70% of the tests.
Do you guys think this clock-tick measurement is not suitable for the i3 CPU?
QueryPerformanceCounter (on Windows at least) is definitely a much better choice than inline assembly. I can't see any reason to prefer inline assembly over that function, especially since Visual Studio does not support inline assembly when compiling for x64, so the code above will not build there.
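For illustration, here is a minimal sketch of what a QueryPerformanceCounter-based timer could look like. The helper names `NowNanoseconds` and `BestOfNanoseconds` are made up for this example and are not from the original post. The `std::chrono::steady_clock` branch is my own assumption, added only so the sketch also compiles outside Windows. Taking the fastest of several runs is one common way to reduce noise from scheduling and caching, not the only valid approach.

```cpp
#include <chrono>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: current time in nanoseconds from a monotonic,
// high-resolution clock.
inline std::int64_t NowNanoseconds()
{
#ifdef _WIN32
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);  // counter ticks per second
    QueryPerformanceCounter(&count);   // current counter value
    // Convert whole seconds and the remainder separately so the
    // multiplication by 1e9 cannot overflow 64 bits.
    return (count.QuadPart / freq.QuadPart) * 1000000000LL
         + (count.QuadPart % freq.QuadPart) * 1000000000LL / freq.QuadPart;
#else
    // Assumed fallback for non-Windows builds.
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
               steady_clock::now().time_since_epoch()).count();
#endif
}

// Hypothetical helper: runs `work` `repeats` times and returns the
// fastest single run in nanoseconds (-1 if repeats <= 0).
template <typename F>
std::int64_t BestOfNanoseconds(F work, int repeats)
{
    std::int64_t best = -1;
    for (int i = 0; i < repeats; ++i) {
        const std::int64_t start = NowNanoseconds();
        work();
        const std::int64_t elapsed = NowNanoseconds() - start;
        if (best < 0 || elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}
```

You would pass the aligned and the unaligned SSE loop as two separate callables to `BestOfNanoseconds` and compare the two results. Each callable should do enough work that a single run is well above the timer's resolution.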