Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of std::mutex and QMutex in MinGW 64 (posix thread version)

I have tried to replace QMutex in my application (monte carlo simulation) by std::mutex, and surprisingly, the computation speed was divided by 3. The mutex locking/unlocking performance cost rised from negligible to about 66% of the threads time.

I digged into implementations sources. I initially thought that both were wrappers to Win32 threads (with an extra layer of pthread for std::thread), but in fact Qt is not using any kernel functions for the mutexes, and has its own internal implementation based on atomic variables. This one seems much faster.

  • Is it really possible to fully implement a mutex with only atomic variables? Does it have limitations?
  • Why is this not used in the STL?
  • What would be the guidelines for fully avoiding mutexes in my case? I am accumulating (additive operations) values over floating point buffers, shared between multiple threads

Thanks

like image 664
galinette Avatar asked Mar 20 '15 15:03

galinette


2 Answers

Qt is not using any kernel functions for the mutexes, and has its own internal implementation based on atomic variables

Of course Qt calls the OS to make the thread wait if the mutex is already locked.

The idea here is that in good multithreaded code the chance of waiting on a already locked mutex are extremely low; the common case to optimize is the uncontended mutex, which must be kept as fast as possible.

Hence, QMutex is designed around the Linux futex system call, and to use atomics and lockfree programming. In the uncontended case, no system calls are necessary, only some clever atomics programming; in the contended case, system calls are necessary (to wait/wake threads), and indeed that's what Qt uses:

  • pthread_mutex_* / pthread_cond_* on a generic Unix
  • futex on Linux
  • WaitForSingleObjectEx on Windows
  • semaphore_* on Mac

For more info behind QMutex design, see here.

Why does the STL not use a similar approach? I have no idea. Possibly because there's native_handle function which should return "something" in all cases, even when the mutex is unlocked or locked but uncontended, so it always uses the system calls.

(Anyhow, I'm not surprised that Qt outperforms std::mutex. If it didn't, QMutex would be a wrapper around std::mutex.)

like image 177
peppe Avatar answered Nov 13 '22 06:11

peppe


It sounds like QMutex is implemented with a spinlock. To answer your questions:

  • Yes, it's possible. It has limitations, see this answer for instance.
  • With just atomics you can write a spinlock; you can't write a real mutex from anything else in the standard library.
  • Without knowing what your algorithm does it's hard to say. But it looks like you could benefit from reading this book: Is Parallel Programming Hard, And, If So, What Can You Do About It?. The Counting chapter in particular, seems to roughly correspond to what you described. TL;DR you will have to try out the different algorithms in that chapter to see which one is better for your particular case.
like image 24
DanielKO Avatar answered Nov 13 '22 07:11

DanielKO