I was arguing with a colleague about lock_guard, and he proposed that lock_guard is probably slower than mutex::lock() / mutex::unlock() because of the cost of constructing and destroying the lock_guard object.
Then I created this simple test and, surprisingly, the version with lock_guard is almost two times faster than the version with mutex::lock() / mutex::unlock():
#include <iostream>
#include <mutex>
#include <chrono>

std::mutex m;
int g = 0;

void func1()
{
    m.lock();
    g++;
    m.unlock();
}

void func2()
{
    std::lock_guard<std::mutex> lock(m);
    g++;
}

int main()
{
    auto t = std::chrono::system_clock::now();
    for (int i = 0; i < 1000000; i++)
    {
        func1();
    }
    std::cout << "Take: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - t).count() << " ms" << std::endl;
    t = std::chrono::system_clock::now();
    for (int i = 0; i < 1000000; i++)
    {
        func2();
    }
    std::cout << "Take: " << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - t).count() << " ms" << std::endl;
    return 0;
}
The results on my machine:
Take: 41 ms
Take: 22 ms
Can someone clarify why and how this can be?
The release build produces the same result for both versions.
The DEBUG build shows ~33% longer time for func2; the difference I see in the disassembly is that func2 uses __security_cookie and invokes @_RTC_CheckStackVars@8.
Are you timing DEBUG?
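If you want to repeat the measurement, here is a minimal sketch of a slightly more careful harness. It assumes an optimized build and uses std::chrono::steady_clock, which is monotonic, instead of system_clock. The names bench_mutex, bench_counter, with_manual_lock, with_lock_guard and time_us are made up for this illustration and do not come from the original test.

```cpp
#include <chrono>
#include <mutex>

// Hypothetical names, chosen for this sketch only.
std::mutex bench_mutex;
long long bench_counter = 0;

// Explicit lock()/unlock() around the increment.
void with_manual_lock()
{
    bench_mutex.lock();
    ++bench_counter;
    bench_mutex.unlock();
}

// Same increment, protected by std::lock_guard.
void with_lock_guard()
{
    std::lock_guard<std::mutex> lock(bench_mutex);
    ++bench_counter;
}

// Calls fn `iterations` times and returns the elapsed time in
// microseconds, measured with the monotonic steady_clock.
template <typename Fn>
long long time_us(Fn fn, int iterations)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        fn();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

One way to use it: call time_us(with_manual_lock, 1000000) once and discard the result as a warm-up, then time both functions several times in alternating order and compare the spread. Whichever loop runs first may absorb one-time costs, so a single run in a fixed order is easy to misread.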
EDIT:
Additionally, while looking at the RELEASE disassembly, I noticed that the mutex methods were saved in two registers:
010F104E mov edi,dword ptr [__imp___Mtx_lock (010F3060h)]
010F1054 xor esi,esi
010F1056 mov ebx,dword ptr [__imp___Mtx_unlock (010F3054h)]
and called the same way from both func1 and func2:
010F1067 call edi
....
010F107F call ebx
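To see why an optimized build can end up with identical calls for both functions, it helps to picture a lock guard as a thin RAII wrapper. The class below is a hypothetical stand-in written for illustration (simple_guard, guard_demo_mutex, guard_demo_counter and increment_guarded are invented names); real standard library implementations of std::lock_guard differ in detail. Its constructor only calls lock() and its destructor only calls unlock(), so once both are inlined nothing is left but the two mutex calls.

```cpp
#include <mutex>

// Hypothetical stand-in for std::lock_guard, for illustration only.
// It stores a reference to the mutex, locks it on construction and
// unlocks it on destruction.
template <typename Mutex>
class simple_guard
{
public:
    explicit simple_guard(Mutex& m) : mutex_(m) { mutex_.lock(); }
    ~simple_guard() { mutex_.unlock(); }

    // Non-copyable, so the mutex is unlocked exactly once.
    simple_guard(const simple_guard&) = delete;
    simple_guard& operator=(const simple_guard&) = delete;

private:
    Mutex& mutex_;
};

std::mutex guard_demo_mutex;
int guard_demo_counter = 0;

// Equivalent in effect to lock(); ++counter; unlock();
void increment_guarded()
{
    simple_guard<std::mutex> lock(guard_demo_mutex);
    ++guard_demo_counter;
}
```

In a debug build the constructor and destructor are typically not inlined and extra stack checks may be emitted, which would fit the difference seen there.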