At the Jacksonville meeting the proposal P0024R2, effectively adopting the specifications from the Parallelism TS, was accepted into the C++17 draft. This proposal adds overloads for many of the algorithms taking an execution policy argument to indicate what kind of parallelism should be considered. There are three execution policies already defined in <execution> (20.19.2 [execution]):

- std::execution::sequenced_policy (20.19.4 [execpol.seq]) with a constexpr object std::execution::seq (20.19.7 [parallel.execpol.objects]) to indicate sequential execution, similar to calling the algorithms without an execution policy.
- std::execution::parallel_policy (20.19.5 [execpol.par]) with a constexpr object std::execution::par (20.19.7 [parallel.execpol.objects]) to indicate execution of algorithms potentially using multiple threads.
- std::execution::parallel_unsequenced_policy (20.19.6 [execpol.vec]) with a constexpr object std::execution::par_unseq (20.19.7 [parallel.execpol.objects]) to indicate execution of algorithms potentially using vector execution and/or multiple threads.

The STL algorithms generally take user-defined objects (iterators, function objects) as arguments. What are the constraints on user-defined objects to make them usable with the parallel algorithms using the standard execution policies?
For example, when using an algorithm like in the example below, what are the implications for FwdIt and Predicate?
    template <typename FwdIt, typename Predicate>
    FwdIt call_remove_if(FwdIt begin, FwdIt end, Predicate predicate) {
        return std::remove_if(std::execution::par, begin, end, predicate);
    }
The short answer is that the element access functions (essentially the operations required by the algorithms on the various arguments; see below for details) used with algorithms using the execution policy std::execution::parallel_policy are not allowed to cause data races or deadlocks. The element access functions used with algorithms using the execution policy std::execution::parallel_unsequenced_policy additionally can't use any blocking synchronisation.
The description below is based on the ballot document N4604. I haven't verified whether any of the clauses were modified in response to national body comments (a cursory check seems to imply that there have been no edits so far).
Section 25.2 [algorithms.parallel] specifies the semantics of the parallel algorithms. There are multiple constraints which do not apply to the algorithms not taking an execution policy, broken down into multiple sections:
Section 25.2.2 [algorithms.parallel.user] constrains what predicate functions can do to their arguments:

    Function objects passed into parallel algorithms as objects of type Predicate, BinaryPredicate, Compare, and BinaryOperation shall not directly or indirectly modify objects via their arguments.
The way the clause is written, it seems that the function objects themselves can be modified as long as the other constraints (see below) are obeyed. Note that this constraint is independent of the execution policy and thus applies even when using std::execution::sequenced_policy. The full answer is more complicated than that, and it seems the specification is currently unintentionally overconstrained (see the last paragraph below).
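For illustration, here is a minimal sketch of what 25.2.2 does and does not forbid (the lambdas are my own, not code from the standard):

    #include <algorithm>
    #include <execution>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3, 4, 5};

        // Not allowed by 25.2.2: the predicate modifies an object via its
        // argument (and it would do so regardless of the execution policy).
        // auto bad = [](int& x) { return ++x % 2 == 0; };

        // Allowed: the predicate only reads via its argument; whether it may
        // modify its own state is governed by the per-policy constraints below.
        auto good = [](int x) { return x % 2 == 0; };

        v.erase(std::remove_if(std::execution::seq, v.begin(), v.end(), good),
                v.end());
    }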
Section 25.2.3 [algorithms.parallel.exec] adds constraints on element access functions (see below) which are specific to the different execution policies:

- When using std::execution::sequenced_policy, the element access functions are all invoked from the same thread, i.e., the execution is not interleaved in any form.
- When using std::execution::parallel_policy, the element access functions may be invoked concurrently from different threads. Invoking element access functions from different threads is not allowed to cause data races or deadlocks. However, invocations of element access functions from the same thread are indeterminately sequenced, i.e., there are no interleaved invocations of element access functions from the same thread. For example, if a Predicate used with std::execution::par counts how often it is called, the corresponding count will need to be appropriately synchronized (see the sketch after this list).
- When using std::execution::parallel_unsequenced_policy, the invocation of element access functions can be interleaved both between different threads as well as within one thread of execution. That is, use of a blocking synchronisation primitive (like a std::mutex) may cause deadlock, as the same thread may try to synchronise multiple times (and, e.g., try to lock the same mutex multiple times). When using standard library functions as element access functions, the constraint in the standard is (25.2.3 [algorithms.parallel.exec] paragraph 4):

    A standard library function is vectorization-unsafe if it is specified to synchronize with another function invocation, or another function invocation is specified to synchronize with it, and if it is not a memory allocation or deallocation function. Vectorization-unsafe standard library functions may not be invoked by user code called from execution::parallel_unsequenced_policy algorithms.
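As a sketch of the difference between the two parallel policies (my own example, with names of my choosing, not code from the standard): under std::execution::par a function object may keep shared mutable state as long as access is synchronised, e.g. with a std::atomic, while blocking synchronisation like locking a std::mutex is only safe under par and may deadlock under par_unseq:

    #include <algorithm>
    #include <atomic>
    #include <execution>
    #include <mutex>
    #include <vector>

    int main() {
        std::vector<int> v(1000, 1);

        // Fine with std::execution::par: the shared counter is an atomic,
        // so concurrent invocations from different threads cause no data race.
        std::atomic<long> calls{0};
        std::for_each(std::execution::par, v.begin(), v.end(),
                      [&](int) { calls.fetch_add(1, std::memory_order_relaxed); });

        // Also fine with par: invocations on one thread are indeterminately
        // sequenced, so the mutex is never re-acquired by a thread already
        // holding it. With par_unseq, invocations may be interleaved within
        // one thread, so this lock could deadlock.
        std::mutex m;
        long count = 0;
        std::for_each(std::execution::par, v.begin(), v.end(), [&](int) {
            std::lock_guard<std::mutex> guard(m);  // vectorization-unsafe
            ++count;
        });
    }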
What happens when using implementation-defined execution policies is, unsurprisingly, implementation-defined.
In 25.2.4 [algorithm.parallel.exception] the use of exceptions thrown from element access functions is sort of constrained: when an element access function throws an exception, std::terminate() is called. That is, it is legal to throw an exception, but it is unlikely that the outcome is desirable. Note that std::terminate() will be called even when using std::execution::sequenced_policy.
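Since an escaping exception means std::terminate(), a common workaround (my own sketch, not something mandated by the standard) is to catch failures inside the element access function and report them after the algorithm returns:

    #include <algorithm>
    #include <atomic>
    #include <execution>
    #include <stdexcept>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3, 4};
        std::atomic<bool> failed{false};

        std::for_each(std::execution::par, v.begin(), v.end(), [&](int x) {
            try {
                if (x == 3)  // throwing is only safe because we catch it below;
                             // letting it escape would call std::terminate()
                    throw std::runtime_error("bad element");
                // ... actual work ...
            } catch (...) {
                failed.store(true, std::memory_order_relaxed);
            }
        });

        if (failed.load()) {
            // handle the error on the calling thread
        }
    }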
The constraints above use the term element access function. This term is defined in 25.2.1 [algorithm.parallel.defns] paragraph 2. There are four groups of functions classified as element access functions:
- All operations of the categories of the iterators that the algorithm is instantiated with.
- Operations on those sequence elements that are required by its specification.
- User-provided function objects to be applied during the execution of the algorithm, if required by the specification.
- Operations on those function objects required by the specification.
Essentially, element access functions are all the operations which the standard explicitly refers to in the specification of the algorithms or of the concepts used with these algorithms. Functions which are not mentioned but, e.g., merely detected to be present (e.g., using SFINAE) are not constrained and, effectively, can't be called from the parallel algorithms imposing synchronisation constraints on their use.
It is slightly concerning that there seems to be no guarantee that the objects the [mutating] element access functions are applied to are different between different threads. In particular, I can't see any guarantee that the iterator operations applied to an iterator object are not applied to the same iterator object from two different threads! The implication is that, e.g., operator++() on an iterator object would need to somehow synchronise its state. I can't see how, e.g., operator==() could do anything useful if the object is modified in a different thread. It seems unintentional that operations on the same object need to be synchronised, as it doesn't make any sense to apply [mutating] element access functions concurrently to one object. However, I can't see any text stating that different objects are used (I guess I need to raise a defect for this).
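To make the concern concrete, consider a hypothetical iterator (my own illustration, not from the standard or the question):

    // A trivial counting-iterator sketch. If the standard really allowed
    // operator++() to be invoked on the *same* object from two threads, the
    // unsynchronised ++value_ below would be a data race; the presumable
    // intent is that each thread operates on its own copy, which needs no
    // synchronisation at all.
    struct counting_iterator {
        int value_;
        int operator*() const { return value_; }
        counting_iterator& operator++() { ++value_; return *this; }
        bool operator==(const counting_iterator& rhs) const { return value_ == rhs.value_; }
        bool operator!=(const counting_iterator& rhs) const { return value_ != rhs.value_; }
    };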