
Parallel Loops in C++

Tags:

c++

for-loop

c++11

concurrency

c++14

I wonder if there is a lightweight, straightforward way to have loops such as for and range-based for loops compute in parallel in C++. How would you implement such a thing? From Scala I know the map, filter and foreach functions; would it also be possible to perform these in parallel? Is there an easy way to achieve this in C++?

My primary platform is Linux, but it would be nice if it worked cross-platform.

586

asked Mar 27 '16 10:03

Exagon

3 Answers

With the parallel algorithms in C++17 we can now use:

#include <algorithm>
#include <execution>
#include <string>
#include <vector>

std::vector<std::string> foo;
std::for_each(
    std::execution::par,
    foo.begin(),
    foo.end(),
    [](auto&& item)
    {
        //do stuff with item
    });

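to run the loop body in parallel. The first argument specifies the execution policy: std::execution::par allows the iterations to run on multiple threads, so the callable must not introduce data races. The policies are declared in the <execution> header.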
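Since the question also asks about Scala-style map and foreach, here is a minimal sketch of how the same execution policy can be passed to std::transform (a "map") and std::reduce (a fold). The function names parallel_squares and parallel_sum are made up for this illustration. Whether the work is actually spread over several threads depends on the toolchain; with GCC's libstdc++, for example, the parallel policies are typically backed by Intel TBB and may need -ltbb at link time.

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// "map": write the square of every input element into a new vector.
// Each iteration touches only its own output slot, so there is no data race.
std::vector<long long> parallel_squares(const std::vector<int>& input) {
    std::vector<long long> output(input.size());
    std::transform(std::execution::par,
                   input.begin(), input.end(), output.begin(),
                   [](int x) { return static_cast<long long>(x) * x; });
    return output;
}

// "fold": add up all elements. std::reduce may regroup the additions,
// which is fine for integers (for floating point the rounding can differ).
long long parallel_sum(const std::vector<long long>& values) {
    return std::reduce(std::execution::par,
                       values.begin(), values.end(), 0LL);
}
```

Calling parallel_sum(parallel_squares(v)) then gives the sum of squares of v. A filter could probably be written the same way with std::copy_if, which also accepts an execution policy.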

167

answered Oct 28 '22 00:10

Exagon

What is your platform? You can look at OpenMP; it is not part of C++, but it is widely supported by compilers.

As for range-based for loops, see, e.g., "Using OpenMP with C++11 range-based for loops?".

I've also seen a few documents at http://www.open-std.org that indicate some efforts to incorporate parallel constructs/algorithms into future C++, but I don't know their current status.

UPDATE

Just adding some example code:

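#include <cstddef>
#include <iterator>

template <typename RAIter>
void loop_in_parallel(RAIter first, RAIter last) {
    // Element count; random-access iterators are needed for first + i below
    const std::size_t n = static_cast<std::size_t>(std::distance(first, last));

    #pragma omp parallel for
    for (std::size_t i = 0; i < n; i++) {
        auto& elem = *(first + i);
        // do whatever you want with elem
    }
}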

The number of threads can be set at runtime via the OMP_NUM_THREADS environment variable.
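The loop above is safe as long as each iteration only touches its own element. When iterations need to combine their results into one variable, OpenMP's reduction clause is the usual tool. The following is a small sketch; the function name sum_of_squares is chosen only for this example. If the compiler is run without OpenMP enabled (for instance without -fopenmp on GCC or Clang), the pragma is ignored and the loop simply runs sequentially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Sums the squares of all elements.
// reduction(+ : total) gives every thread a private copy of total and
// adds the copies together at the end, so no locking is needed.
double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    // A signed index keeps the loop acceptable to older OpenMP versions.
    const long long n = static_cast<long long>(values.size());

    #pragma omp parallel for reduction(+ : total)
    for (long long i = 0; i < n; ++i) {
        const double v = values[static_cast<std::size_t>(i)];
        total += v * v;
    }
    return total;
}
```

Without the reduction clause, concurrent updates of total would be a data race.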

answered Oct 27 '22 23:10

Daniel Langr

With C++11 you can parallelize a for loop with only a few lines of code.

My function parallel_for() (defined later in the post) splits a for loop into smaller chunks (sub-loops), and each chunk is assigned to a thread. Here is the usage:

/// Say you want to parallelize this:
for(int i = 0; i < nb_elements; ++i)
    computation(i);    

/// Then you would do:
parallel_for(nb_elements, [&](int start, int end){ 
    for(int i = start; i < end; ++i)
        computation(i); 
});

My parallel_for() also works within a class:

struct My_obj {

    /// Replacing:
    void sequential_for(){
        for(int i = 0; i < nb_elements; ++i)
            computation(i);
    }

    /// By:
    void process_chunk(int start, int end)
    {
        for(int i = start; i < end; ++i)
            computation(i);
    }

    void threaded_for(){
        parallel_for(nb_elements, [this](int s, int e){ 
            this->process_chunk(s, e); 
        } );
    }

    
};

Finally, here is the implementation of parallel_for(). Just paste it into a header file and use it at will:

#include <algorithm>
#include <thread>
#include <functional>
#include <vector>

/// @param[in] nb_elements : size of your for loop
/// @param[in] functor(start, end) :
/// your function processing a sub chunk of the for loop.
/// "start" is the first index to process (included) until the index "end"
/// (excluded)
/// @code
///     for(int i = start; i < end; ++i)
///         computation(i);
/// @endcode
/// @param use_threads : enable / disable threads.
///
///
static
void parallel_for(unsigned nb_elements,
                  std::function<void (int start, int end)> functor,
                  bool use_threads = true)
{
    // -------
    unsigned nb_threads_hint = std::thread::hardware_concurrency();
    unsigned nb_threads = nb_threads_hint == 0 ? 8 : (nb_threads_hint);

    unsigned batch_size = nb_elements / nb_threads;
    unsigned batch_remainder = nb_elements % nb_threads;

    std::vector< std::thread > my_threads(nb_threads);

    if( use_threads )
    {
        // Multithread execution
        for(unsigned i = 0; i < nb_threads; ++i)
        {
            int start = i * batch_size;
            my_threads[i] = std::thread(functor, start, start+batch_size);
        }
    }
    else
    {
        // Single thread execution (for easy debugging)
        for(unsigned i = 0; i < nb_threads; ++i){
            int start = i * batch_size;
            functor( start, start+batch_size );
        }
    }

    // Process the remaining elements on the calling thread
    int start = nb_threads * batch_size;
    functor( start, start+batch_remainder);

    // Wait for the worker threads to finish their tasks
    if( use_threads )
        std::for_each(my_threads.begin(), my_threads.end(), std::mem_fn(&std::thread::join));
}
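As a possible variation on the chunking idea, the sketch below spreads the remainder over the first chunks instead of processing it separately, and never starts more threads than there are elements. It is not the parallel_for() from this answer: the name parallel_for_chunks, the std::size_t index type and the fallback of two threads are assumptions made for the illustration. It is also not exception-safe; if the body throws inside a worker thread, the program terminates.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: calls body(begin, end) once per chunk, each on its
// own thread, where [begin, end) are disjoint ranges covering [0, count).
void parallel_for_chunks(
    std::size_t count,
    const std::function<void(std::size_t, std::size_t)>& body)
{
    if (count == 0) return;  // nothing to do, start no threads

    std::size_t workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 2;       // hint unavailable: arbitrary fallback
    workers = std::min(workers, count);  // at least one element per thread

    const std::size_t base = count / workers;   // minimum chunk size
    const std::size_t extra = count % workers;  // first `extra` chunks get +1

    std::vector<std::thread> threads;
    threads.reserve(workers);

    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t end = begin + base + (w < extra ? 1 : 0);
        threads.emplace_back(body, begin, end);  // std::thread copies body
        begin = end;
    }

    // Wait for every chunk to finish before returning.
    for (std::thread& t : threads) t.join();
}
```

It would be used like the original, for example parallel_for_chunks(data.size(), [&](std::size_t b, std::size_t e) { for (std::size_t i = b; i < e; ++i) data[i] *= 2; });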

Lastly, you can define macros to get an even more compact expression:

#define PARALLEL_FOR_BEGIN(nb_elements) parallel_for(nb_elements, [&](int start, int end){ for(int i = start; i < end; ++i)
#define PARALLEL_FOR_END() })

Now converting a sequential for loop such as:

for(int i = 0; i < nb_elements; ++i)
    computation(i);

is only a matter of writing:

PARALLEL_FOR_BEGIN(nb_elements)
{
    computation(i);
}PARALLEL_FOR_END();

answered Oct 27 '22 23:10

arkan
