How can I use all the cores in the loop?

Tags: c++, multithreading, c++11, c++17, c++14

There is a loop.

for (int i = 0; i < n; ++i) {
    //...
    v[i] = o.f(i);
    //...
}

Each v[i] = o.f(i) is independent of all the other v[i] = o.f(i).
n can be any value and it may not be a multiple of the number of cores. What is the simplest way to use all the cores to do this?


asked Mar 14 '18 15:03

Ufx

2 Answers

The ExecutionPolicy overloads of the algorithms in <algorithm>, added in C++17, exist for exactly this purpose. std::transform applies a function to each element of a source range and assigns the results to a destination range.

v.begin() is an acceptable destination, so long as v is of appropriate size. Your snippet assumes this when it uses v[i], so I will too.

We then need an iterator that yields the values [0, n) as our source, which is what boost::counting_iterator<int> provides.

Finally, we need a Callable that applies o.f to our values, so let's capture o by reference in a lambda.

#include <algorithm>
#include <execution>
#include <boost/iterator/counting_iterator.hpp>

// assert(v.size() >= n)
std::transform(std::execution::par,
               boost::counting_iterator<int>(0),   // first index
               boost::counting_iterator<int>(n),   // one past the last index
               v.begin(),                          // destination: v[0], v[1], ...
               [&o](int i) { return o.f(i); });    // computes v[i] = o.f(i)

If o.f does not perform any "vectorization-unsafe operations" (such as locking a mutex), you can use std::execution::par_unseq instead, which may also interleave calls on the same thread (e.g. by unrolling the loop and using SIMD instructions).
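If Boost is not available, a Boost-free sketch of the same idea stores the indices explicitly with std::iota instead of using counting_iterator, at the cost of n extra ints of memory. The helper name fill_indexed is hypothetical, and the call as written is sequential; passing std::execution::par (from <execution>) as the first argument to std::transform gives the parallel version described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: sets v[i] = f(i) for every i in [0, n).
// Boost-free stand-in for counting_iterator: the indices are stored explicitly.
template <class T, class F>
void fill_indexed(std::vector<T>& v, int n, F f) {
    // v must already hold at least n elements, as the question's v[i] assumes
    assert(n >= 0 && v.size() >= static_cast<std::size_t>(n));
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);  // idx = {0, 1, ..., n-1}
    // Sequential call; in C++17, adding std::execution::par as the first
    // argument (and including <execution>) lets it run on several cores.
    std::transform(idx.begin(), idx.end(), v.begin(), f);
}
```

For the question's loop this would be called as fill_indexed(v, n, [&o](int i) { return o.f(i); });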


answered Nov 13 '22 14:11

Caleth

Given the compilers that actually exist, and remembering that Microsoft can't even get this stuff right for C++11, never mind C++17/20, the C++11 answer goes something like this (assuming, as in the question, that v already holds n elements):

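#include <future>
#include <vector>

// assumes v is a std::vector (not a reference) and O::f returns its value_type
typedef decltype(v)::value_type R;
std::vector< std::future<R> > fut(n);
for (int i = 0; i < n; i++)                                 // one task per element
    fut[i] = std::async(std::launch::async, &O::f, &o, i);  // &o: call f on o itself, not a copy
for (int i = 0; i < n; i++)                                 // collect the results in order
    v[i] = fut[i].get();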

@arne suggests we can do better by throttling the number of tasks according to the number of processors (P). That is true, though the code above will give you a clear indication of whether you will really benefit from multi-threading the method f. Suppose we only want to launch X jobs simultaneously, where P < X < 3*P depending on how much the job complexity varies (note that I am relying on a signed index):

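#include <cstddef>
#include <future>
#include <vector>

typedef decltype(v)::value_type R;
std::vector< std::future<R> > fut(n);
// i: next task to launch; j = i - X: next result to collect (X is a signed int)
for (std::ptrdiff_t i = 0, j = -X; j < n; i++, j++)
{
    if (i < n)  fut[i] = std::async(std::launch::async, &O::f, &o, i);
    if (j >= 0) v[j] = fut[j].get();
}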

I'm not claiming the above code is "great", but if the jobs are complex enough to need multithreading, the cost of looping a few extra times isn't going to be noticed. Note that if X > n the loop will spin a few times in the middle without doing any work, but it will still produce the correct result :-)
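Along the lines of @arne's suggestion, here is a C++11 sketch that starts exactly P = std::thread::hardware_concurrency() threads (falling back to 1 when that returns 0) and gives each one a contiguous range of indices. The first n % P ranges get one extra index, so n does not have to be a multiple of the number of cores. parallel_for_index is a hypothetical helper name, and the sketch assumes fn is safe to call concurrently for different i, which the question states.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: runs fn(i) for every i in [0, n) using up to P threads.
// [0, n) is split into P contiguous chunks of n / P indices each, and the
// first n % P chunks get one extra index to absorb the remainder.
template <class Fn>
void parallel_for_index(int n, Fn fn) {
    if (n <= 0) return;
    unsigned hc = std::thread::hardware_concurrency();        // 0 means "unknown"
    int p = std::min(static_cast<int>(hc == 0 ? 1 : hc), n);  // no more threads than indices
    int base = n / p;   // every chunk gets at least this many indices
    int extra = n % p;  // the first `extra` chunks get one more

    std::vector<std::thread> workers;
    workers.reserve(p);
    int begin = 0;
    for (int t = 0; t < p; ++t) {
        int end = begin + base + (t < extra ? 1 : 0);
        workers.emplace_back([begin, end, &fn] {
            for (int i = begin; i < end; ++i) fn(i);
        });
        begin = end;
    }
    assert(begin == n);  // the chunks cover [0, n) exactly
    for (auto& worker : workers) worker.join();  // wait for every chunk to finish
}
```

For the question's loop this would be parallel_for_index(n, [&](int i) { v[i] = o.f(i); });. Writing distinct elements of a std::vector from different threads is safe, except for std::vector<bool>, whose elements share storage. Static chunks suit the case where every o.f(i) costs roughly the same; if the costs vary widely, some threads may finish early and sit idle.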

answered Nov 13 '22 14:11

Gem Taylor
