Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do the C++ STL (ExecutionPolicy) algorithms determine how many parallel threads to use?

C++17 upgraded 69 STL algorithms to support parallelism, by the use of an optional ExecutionPolicy parameter (as the 1st argument). eg.

std::sort(std::execution::par, begin(v), end(v)); 

I suspect the C++17 standard deliberately says nothing about how to implement the multi-threaded algorithms, leaving it up to the library writers to decide what is best (and allowing them to change their minds, later). Still, I'm keen to understand at a high level what issues are being considered in the implementation of the parallel STL algorithms.

Some questions on my mind include (but are not limited to!):

  • How is the maximum number of threads used (by the C++ application) related to the number of CPU &/or GPU cores on the machine?
  • What differences are there in the number of threads each algorithm uses? (Will each algorithm always use the same number of threads in every circumstance?)
  • Is there any consideration given to other parallel STL calls on other threads (within the same app)? (eg. if a thread invokes std::for_each(par,...), will it use more/less/same threads depending on if a std::sort(par, ...) is already running on some other thread(s)? Is there a thread pool perhaps?)
  • Is any consideration given to how busy the cores are due to external factors? (eg. if 1 core is very busy, say analysing SETI signals, will the C++ application reduce the number of threads it uses?)
  • Do some algorithms only use CPU cores? or only GPU cores?
  • I suspect implementations will vary from library to library (compiler to compiler?), even details about this would be interesting.

I realise the point of these parallel algorithms is to shield the Programmer from having to worry about these details. However, any info that gives me a high-level mental picture of what's going on inside the library calls would be appreciated.

like image 262
Scott Smedley Avatar asked Oct 31 '17 05:10

Scott Smedley


People also ask

What is parallel STL?

Parallel STL is an implementation of the C++ standard library algorithms with support for execution policies, as specified in ISO/IEC 14882:2017 standard, commonly called C++17.

What do you mean by a parallel algorithm explain?

In computer science, a parallel algorithm, as opposed to a traditional serial algorithm, is an algorithm which can do multiple operations in a given time. It has been a tradition of computer science to describe serial algorithms in abstract machine models, often the one known as random-access machine.

What is par in C++?

par means "execute in parallel", which permits the implementation to execute on multiple threads in parallel.


2 Answers

Most of these questions can not be answered by the standard as of today. However, your question, as I understand it, mixes two concepts:

C1. Constraints on parallel algorithms

C2. Execution of algorithms

All the C++17 parallel STL thing is about C1: it sets constraints on how instructions and/or threads could be interleaved/transformed in a parallel computation. On the other hand, C2 is about being standardized, the keyword is executor (more on this later).

For C1, there are 3 standard policies (in std::execution::seq, par and par_unseq) that correspond to every combination of task and instruction parallelism. For example, when performing an integer accumulation, par_unseq could be used, since the order is not important. However, for float point arithmetic, where addition is not associative, a better fit would be seq to, at least, get a deterministic result. In short: policies set constraints on parallel computation and these constraints could be potentially exploited by a smart compiler.

On the other hand, once you have a parallel algorithm and its constraints (and possibly after some optimization/transformation), the executor will find a way to execute it. There are default executors (for CPU for example) or you can create your own, then, all that configuration regarding number of threads, workload, processing unit, etc... can be set.

As of today, C1 is in the standard, but not C2, so if you use C1 with a compliant compiler, you will not be able to specify which execution profile you want and the library implementation will decide for you (maybe through extensions).

So, to address your questions:

(Regarding your first 5 questions) By definition, C++17 parallel STL library does not define any computation, just data dependency, in order to allow for possible data flow transformations. All these questions will be answered (hopefully) by executor, you can see the current proposal here. It will look something like:

executor = get_executor(); sort( std::execution::par.on(executor), vec.begin(), vec.end()); 

Some of your questions are already defined in that proposal.

(For the 6th) There are a number of libraries out there that already implement similar concepts (C++ executor was inspired by some of them indeed), AFAIK: hpx, Thrust or Boost.Compute. I do not know how the last two are actually implemented, but for hpx they use lightweight threads and you can configure execution profile. Also, the expected (not yet standardized) syntax of the code above for C++17 is essentially the same as in (was heavily inspired by) hpx.

References:

  1. C++17 Parallel Algorithms and Beyond by Bryce Adelstein lelbach
  2. The future of ISO C++ Heterogeneous Computing by Michael Wong
  3. Keynote C++ executors to enable heterogeneous computing in tomorrow's C++ today by Michael Wong
  4. Executors for C++ - A Long Story by Detlef Vollmann
like image 52
Fusho Avatar answered Sep 20 '22 21:09

Fusho


Pre-final C++17 draft tells nothing about "how to implement the multi-threaded algorithms", that's true. Implementation owners decide on their own how to do that. E.g. Parallel STL uses TBB as a threading back-end and OpenMP as a vectorization back-end. I guess that to find out how does this implementation matches your machine - you need to read the implementation-specific documentation

like image 21
Rostislav Povelikin Avatar answered Sep 19 '22 21:09

Rostislav Povelikin