Let's consider some code to safely increment a variable in a for loop with multiple threads.
To achieve this, you have to use some kind of locking mechanism when incrementing the variable. While searching for a solution, I came up with the following two solutions.
My questions are:
- Are they equally good, or does one of them have drawbacks?
- When should I use a mutex instead of #pragma omp critical?
#include <iostream>
#include <mutex>
int main(int argc, char** argv)
{
    int someVar = 0;
    std::mutex someVar_mutex;
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
    {
        std::lock_guard<std::mutex> lock(someVar_mutex);
        ++someVar;
    }
    std::cout << someVar << std::endl;
    return 0;
}
#include <iostream>
int main(int argc, char** argv)
{
    int someVar = 0;
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
    {
        #pragma omp critical
        ++someVar;
    }
    std::cout << someVar << std::endl;
    return 0;
}
The critical section serves the same purpose as acquiring a lock (and will probably use a lock internally).
std::mutex is a standard C++ feature, whereas #pragma omp critical is an OpenMP extension and is not defined by the C++ standard.
The critical section names are global to the entire program (regardless of module boundaries). So if you have critical sections with the same name in multiple modules, no two of them can execute at the same time. If the name is omitted, a default name is assumed. (docs).
I would prefer standard C++, unless there is a good reason to use the other (after measuring both).
Not directly related to the question, but there is another problem with this loop: the lock is acquired on each loop iteration. This degrades performance significantly (see also this answer).
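To illustrate the point about locking on every iteration, here is a minimal sketch that keeps the std::mutex but acquires it only once per thread: each thread counts into a private variable and merges it into the shared total at the end. The function name count_with_one_lock_per_thread and the variable names are made up for this example. It assumes compilation with OpenMP enabled (e.g. -fopenmp on GCC/Clang); without it, the pragmas are ignored and the code simply runs serially.

```cpp
#include <mutex>

// Hypothetical helper: counts to n with a single lock acquisition per thread.
int count_with_one_lock_per_thread(int n)
{
    int total = 0;            // shared between all threads
    std::mutex total_mutex;   // protects total

    #pragma omp parallel
    {
        int local = 0;        // declared inside the region, so private to each thread

        // Split the iterations among the threads; no locking inside the loop.
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
        {
            ++local;
        }

        // Merge the per-thread result exactly once.
        std::lock_guard<std::mutex> lock(total_mutex);
        total += local;
    }
    return total;
}
```

Calling count_with_one_lock_per_thread(1000) should give the same result as the original loop, but the mutex is locked once per thread rather than 1000 times.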
From cppreference.com, about lock_guard, one can read:
The class lock_guard is a mutex wrapper that provides a convenient RAII-style mechanism for owning a mutex for the duration of a scoped block.
and from the OpenMP standard, about the critical construct, one can read:
The critical construct restricts execution of the associated structured block to a single thread at a time.
So both mechanisms provide a means to deal with the same problem, i.e., ensuring mutual exclusion for a block of code.
- Are they equally good, or does one of them have drawbacks?
Both are coarse-grained locking mechanisms; however, by default, the OpenMP critical construct is even more coarse-grained, since:
All critical constructs without a name are considered to have the same unspecified name.
Therefore, if no name is specified, all critical regions use the same global lock, which is semantically the same as using lock_guard with the same mutex everywhere. Nonetheless, one can specify a name along with the critical pragma:
An optional name may be used to identify the critical construct.
#pragma omp critical(name)
Specifying the name on a critical construct is semantically similar to passing a specific mutex to std::lock_guard<std::mutex> lock(name);.
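As a rough sketch of what naming buys you, the following example protects two independent counters with two differently named critical constructs, so that updates to one counter do not have to wait for updates to the other. The names Counts, count_parity, evens_update and odds_update are invented for illustration. For counters this simple, a reduction would be the better tool; the point here is only the naming.

```cpp
// Hypothetical result type for this example.
struct Counts
{
    int evens;
    int odds;
};

// Counts even and odd values of i in [0, n) using two named critical sections.
Counts count_parity(int n)
{
    int evens = 0;
    int odds = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        if (i % 2 == 0)
        {
            // Only excludes other regions named evens_update.
            #pragma omp critical(evens_update)
            ++evens;
        }
        else
        {
            // Uses a different name, so it does not contend with evens_update.
            #pragma omp critical(odds_update)
            ++odds;
        }
    }
    return Counts{evens, odds};
}
```

With unnamed critical constructs, both branches would share the single default lock, much like guarding both counters with one std::mutex.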
It is worth noting that OpenMP also offers explicit locking mechanisms such as omp_lock_t (some details in this SO thread).
Notwithstanding, whenever possible you should aim for a finer-grained synchronization mechanism than a critical region, namely reduction, atomics, or even data redundancy. For instance, in your code snippet, the most performant approach would be to use the reduction clause, like so:
#pragma omp parallel for reduction(+:someVar)
for (int i = 0; i < 1000; i++)
{
    ++someVar;
}
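For completeness, the atomics option mentioned above could look roughly like the sketch below. It wraps the loop in a function called count_with_atomic, a name made up for this example, and uses the OpenMP atomic construct so that only the single increment is protected instead of a whole critical region.

```cpp
// Hypothetical helper: increments a shared counter n times using omp atomic.
int count_with_atomic(int n)
{
    int someVar = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        // Protects just this one update of someVar.
        #pragma omp atomic
        ++someVar;
    }
    return someVar;
}
```

This is usually cheaper than a critical region for a single scalar update, although the reduction clause should still be expected to scale better here because it avoids contention on the shared variable altogether.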
- When to use a mutex instead of #pragma omp critical?
IMO this should never be a consideration, first because, as pointed out by none other than Michael Klemm:
One thing that should be noted: "#pragma omp critical" can only interact with other "critical" constructs. You cannot mix OpenMP locks (lock API or "critical" constructs) with C++ locks like std::mutex. So, if there is code that is protected using std::mutex (or std::lock_guard on top), then other OpenMP code that should be mutually exclusive also needs to use std::mutex (and vice versa).
and furthermore, as Gilles pointed out (an opinion I share):
As a matter of principle, mixing two different parallelism models is a bad idea. So if you use OpenMP parallelism, avoid using the C++ one as interactions between the two might be unexpected.