Today I wrote some code to test the performance of mutexes.
This is the Boost (1.54) version, compiled on VS2010 with O2 optimization:
#include <boost/thread.hpp>
#include <boost/chrono.hpp>
#include <iostream>

int main() {
    boost::mutex m;
    auto start = boost::chrono::system_clock::now();
    for (size_t i = 0; i < 50000000; ++i) {
        boost::lock_guard<boost::mutex> lock(m);
    }
    auto end = boost::chrono::system_clock::now();
    boost::chrono::duration<double> elapsed_seconds = end - start;
    std::cout << elapsed_seconds.count() << std::endl;
}
And this is the std version, compiled on VS2013, also with O2 optimization:
#include <mutex>
#include <chrono>
#include <iostream>

int main() {
    std::mutex m;
    auto start = std::chrono::system_clock::now();
    for (size_t i = 0; i < 50000000; ++i) {
        std::lock_guard<std::mutex> lock(m);
    }
    auto end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end - start;
    std::cout << elapsed_seconds.count() << std::endl;
}
A bit different, but doing exactly the same thing. My CPU is an Intel Core i7-2600K, my OS is Windows 7 64-bit, and the results are 0.7020 s vs. 2.1684 s: the std version is about 3.08 times slower.
boost::mutex tries _interlockedbittestandset first, and only if that fails does it fall back to the heavyweight WaitForSingleObject. That is simple to understand.
The std::mutex in VS2013 seems much more complex. I have tried to read through its implementation but could not see the point. Why is it so complex? Is there a faster way?
Taking a std::unique_lock of a std::mutex is significantly slower than taking a std::unique_lock of a std::shared_mutex, even though they offer exactly the same lock constraints and, under the hood, both end up calling RtlAcquireSRWLockExclusive().
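If you want to check that claim yourself, here is a minimal benchmark sketch. It assumes a C++17 toolset (std::shared_mutex is not available in VS2013) and uses std::steady_clock for timing; time_uncontended_locks is just a hypothetical helper name:

#include <chrono>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <shared_mutex>

// Time N uncontended lock/unlock cycles through std::unique_lock<Mutex>.
template <class Mutex>
double time_uncontended_locks(std::size_t iterations) {
    Mutex m;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        std::unique_lock<Mutex> lock(m);  // exclusive lock in both cases
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

int main() {
    const std::size_t n = 50000000;
    std::cout << "std::mutex:        " << time_uncontended_locks<std::mutex>(n) << " s\n";
    std::cout << "std::shared_mutex: " << time_uncontended_locks<std::shared_mutex>(n) << " s\n";
}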
A fast mutex is fast because the acquisition and release steps are optimized for the usual case when there's no contention for the mutex. The critical step in acquiring the mutex is to atomically decrement and test an integer counter that indicates how many threads either own or are waiting for the mutex.
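To make that concrete, here is a minimal Win32 sketch of such a counter-based fast mutex (sometimes called a "benaphore"). FastMutex is a hypothetical name, and the sketch deliberately skips spinning, recursion support, and error handling that a production mutex would need:

#include <windows.h>

// Counter-based fast mutex: count_ == 1 means free, 0 means owned with no
// waiters, negative means there are waiters. The kernel semaphore is only
// touched on the contended path.
class FastMutex {
public:
    FastMutex() : count_(1), sem_(CreateSemaphoreW(nullptr, 0, MAXLONG, nullptr)) {}
    ~FastMutex() { CloseHandle(sem_); }

    void lock() {
        // Fast path: the counter drops from 1 to 0 and we own the mutex
        // without any system call.
        if (InterlockedDecrement(&count_) < 0)
            WaitForSingleObject(sem_, INFINITE);  // contended: block in the kernel
    }

    void unlock() {
        // If the counter is still at or below zero after the increment,
        // at least one thread is waiting, so wake exactly one of them.
        if (InterlockedIncrement(&count_) <= 0)
            ReleaseSemaphore(sem_, 1, nullptr);
    }

private:
    volatile LONG count_;
    HANDLE sem_;
};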
std::mutex: The mutex class is a synchronization primitive that can be used to protect shared data from being simultaneously accessed by multiple threads.
It seems that std::mutex might only use system calls, which carry a LOT of overhead, whereas boost::mutex implements at least some of its functionality in user space, i.e. it tries to avoid system calls whenever possible. That would be the reason for the _interlockedbittestandset check before WaitForSingleObject.
I don't know the actual internals of MS's STL, but I've seen performance differences like this in examples from an operating systems class.
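For illustration, here is a minimal sketch of that user-mode fast path: a single _interlockedbittestandset tries to grab the lock without entering the kernel, and only on contention does the code make a system call. The fallback here is just Sleep(0) rather than boost::mutex's real event-based wait, so treat it as an illustration of the idea, not a drop-in mutex replacement:

#include <windows.h>
#include <intrin.h>

// Illustrative spin-then-yield lock: the uncontended path is one interlocked
// instruction and no kernel transition at all.
class SpinThenYieldLock {
public:
    void lock() {
        // _interlockedbittestandset returns the previous value of bit 0;
        // 0 means the bit was clear and we have just acquired the lock.
        while (_interlockedbittestandset(&state_, 0) != 0) {
            Sleep(0);  // contended: yield the rest of the time slice (a system call)
        }
    }

    void unlock() {
        _interlockedbittestandreset(&state_, 0);  // clear the lock bit
    }

private:
    volatile long state_ = 0;
};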