At the Jacksonville meeting the proposal P0024R2, effectively adopting the specifications from the Parallelism TS, was accepted into the C++17 draft. This proposal adds overloads for many of the algorithms taking an execution policy argument to indicate what kind of parallelism should be considered. There are three execution policies already defined in <execution> (20.19.2 [execution]):

- std::execution::sequenced_policy (20.19.4 [execpol.seq]) with a constexpr object std::execution::seq (20.19.7 [parallel.execpol.objects]) to indicate sequential execution, similar to calling the algorithms without an execution policy.
- std::execution::parallel_policy (20.19.5 [execpol.par]) with a constexpr object std::execution::par (20.19.7 [parallel.execpol.objects]) to indicate execution of algorithms potentially using multiple threads.
- std::execution::parallel_unsequenced_policy (20.19.6 [execpol.vec]) with a constexpr object std::execution::par_unseq (20.19.7 [parallel.execpol.objects]) to indicate execution of algorithms potentially using vector execution and/or multiple threads.

The STL algorithms generally take user-defined objects (iterators, function objects) as arguments. What are the constraints on user-defined objects to make them usable with the parallel algorithms using the standard execution policies?
For example, when using an algorithm like in the example below, what are the implications for FwdIt and Predicate?
    template <typename FwdIt, typename Predicate>
    FwdIt call_remove_if(FwdIt begin, FwdIt end, Predicate predicate) {
        return std::remove_if(std::execution::par, begin, end, predicate);
    }
The short answer is that the element access functions (essentially the operations required by the algorithms on the various arguments; see below for details) used with algorithms using the execution policy std::execution::parallel_policy are not allowed to cause data races or deadlocks. The element access functions used with algorithms using the execution policy std::execution::parallel_unsequenced_policy additionally can't use any blocking synchronisation.
The description below is based on the ballot document N4604. I haven't verified whether any of the clauses were modified in response to national body comments (a cursory check seems to imply that there have been no edits so far).
Section 25.2 [algorithms.parallel] specifies the semantics of the parallel algorithms. There are multiple constraints which do not apply to the algorithms not taking an execution policy, broken down into multiple sections:
Section 25.2.2 [algorithms.parallel.user] constrains what predicate functions can do to their arguments:

    Function objects passed into parallel algorithms as objects of type Predicate, BinaryPredicate, Compare, and BinaryOperation shall not directly or indirectly modify objects via their arguments.
The way the clause is written, it seems that the function objects themselves can be modified as long as the other constraints (see below) are obeyed. Note that this constraint is independent of the execution policy and thus applies even when using std::execution::sequenced_policy. The full answer is more complicated than that, and it seems the specification is currently unintentionally overconstrained (see the last paragraph below).
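For illustration, here is a minimal sketch of what 25.2.2 does and does not forbid (the lambdas are my own, not code from the standard):

    #include <algorithm>
    #include <execution>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3, 4, 5};

        // Not allowed by 25.2.2: the predicate modifies an object via its
        // argument (and it would do so regardless of the execution policy).
        // auto bad = [](int& x) { return ++x % 2 == 0; };

        // Allowed: the predicate only reads via its argument; whether it may
        // modify its own state is governed by the per-policy constraints below.
        auto good = [](int x) { return x % 2 == 0; };

        v.erase(std::remove_if(std::execution::seq, v.begin(), v.end(), good),
                v.end());
    }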
Section 25.2.3 [algorithms.parallel.exec] adds constraints on element access functions (see below) which are specific to the different execution policies:

- When using std::execution::sequenced_policy, the element access functions are all invoked from the same thread, i.e., the execution is not interleaved in any form.
- When using std::execution::parallel_policy, the element access functions may be invoked concurrently from different threads. Invoking element access functions from different threads is not allowed to cause data races or deadlocks. However, invocations of element access functions from the same thread are indeterminately sequenced, i.e., there are no interleaved invocations of element access functions from the same thread. For example, if a Predicate used with std::execution::par counts how often it is called, the corresponding count will need to be appropriately synchronized (see the sketch after this list).
- When using std::execution::parallel_unsequenced_policy, the invocation of element access functions can be interleaved both between different threads as well as within one thread of execution. That is, use of a blocking synchronisation primitive (like a std::mutex) may cause deadlock, as the same thread may try to synchronise multiple times (and, e.g., try to lock the same mutex multiple times). When using standard library functions as element access functions, the constraint in the standard is (25.2.3 [algorithms.parallel.exec] paragraph 4):

    A standard library function is vectorization-unsafe if it is specified to synchronize with another function invocation, or another function invocation is specified to synchronize with it, and if it is not a memory allocation or deallocation function. Vectorization-unsafe standard library functions may not be invoked by user code called from execution::parallel_unsequenced_policy algorithms.
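As a sketch of the difference between the two parallel policies (my own example, with names of my choosing, not code from the standard): under std::execution::par a function object may keep shared mutable state as long as access is synchronised, e.g. with a std::atomic, while blocking synchronisation like locking a std::mutex is only safe under par and may deadlock under par_unseq:

    #include <algorithm>
    #include <atomic>
    #include <execution>
    #include <mutex>
    #include <vector>

    int main() {
        std::vector<int> v(1000, 1);

        // Fine with std::execution::par: the shared counter is an atomic,
        // so concurrent invocations from different threads cause no data race.
        std::atomic<long> calls{0};
        std::for_each(std::execution::par, v.begin(), v.end(),
                      [&](int) { calls.fetch_add(1, std::memory_order_relaxed); });

        // Also fine with par: invocations on one thread are indeterminately
        // sequenced, so the mutex is never re-acquired by a thread already
        // holding it. With par_unseq, invocations may be interleaved within
        // one thread, so this lock could deadlock.
        std::mutex m;
        long count = 0;
        std::for_each(std::execution::par, v.begin(), v.end(), [&](int) {
            std::lock_guard<std::mutex> guard(m);  // vectorization-unsafe
            ++count;
        });
    }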
What happens when using implementation-defined execution policies is, unsurprisingly, implementation-defined.
In 25.2.4 [algorithm.parallel.exception] the use of exceptions thrown from element access functions is sort of constrained: when an element access function throws an exception, std::terminate() is called. That is, it is legal to throw an exception, but it is unlikely that the outcome is desirable. Note that std::terminate() will be called even when using std::execution::sequenced_policy.
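Since an escaping exception means std::terminate(), a common workaround (my own sketch, not something mandated by the standard) is to catch failures inside the element access function and report them after the algorithm returns:

    #include <algorithm>
    #include <atomic>
    #include <execution>
    #include <stdexcept>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3, 4};
        std::atomic<bool> failed{false};

        std::for_each(std::execution::par, v.begin(), v.end(), [&](int x) {
            try {
                if (x == 3)  // throwing is only safe because we catch it below;
                             // letting it escape would call std::terminate()
                    throw std::runtime_error("bad element");
                // ... actual work ...
            } catch (...) {
                failed.store(true, std::memory_order_relaxed);
            }
        });

        if (failed.load()) {
            // handle the error on the calling thread
        }
    }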
The constraints above use the term element access function. This term is defined in 25.2.1 [algorithm.parallel.defns] paragraph 2. There are four groups of functions classified as element access functions:
- All operations of the categories of the iterators that the algorithm is instantiated with.
- Operations on those sequence elements that are required by its specification.
- User-provided function objects to be applied during the execution of the algorithm, if required by the specification.
- Operations on those function objects required by the specification.
Essentially, element access functions are all the operations which the standard explicitly refers to in the specification of the algorithms or of the concepts used with these algorithms. Functions which are not mentioned but, e.g., merely detected to be present (e.g., using SFINAE) are not constrained and, effectively, can't be called from the parallel algorithms imposing synchronisation constraints on their use.
It is slightly concerning that there seems to be no guarantee that the objects the [mutating] element access functions are applied to are different between different threads. In particular, I can't see any guarantee that the iterator operations applied to an iterator object are not applied to the same iterator object from two different threads! The implication is that, e.g., operator++() on an iterator object would need to somehow synchronise its state. I can't see how, e.g., operator==() could do anything useful if the object is modified in a different thread. It seems unintentional that operations on the same object need to be synchronised, as it doesn't make any sense to apply [mutating] element access functions concurrently to one object. However, I can't see any text stating that different objects are used (I guess I need to raise a defect for this).
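To make the concern concrete, consider a hypothetical iterator (my own illustration, not from the standard or the question):

    // A trivial counting-iterator sketch. If the standard really allowed
    // operator++() to be invoked on the *same* object from two threads, the
    // unsynchronised ++value_ below would be a data race; the presumable
    // intent is that each thread operates on its own copy, which needs no
    // synchronisation at all.
    struct counting_iterator {
        int value_;
        int operator*() const { return value_; }
        counting_iterator& operator++() { ++value_; return *this; }
        bool operator==(const counting_iterator& rhs) const { return value_ == rhs.value_; }
        bool operator!=(const counting_iterator& rhs) const { return value_ != rhs.value_; }
    };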