Code 1 shows how a 'for' loop is parallelized using OpenMP. I would like to achieve similar parallelization after unrolling the 'for' loops using template metaprogramming (see Code 2). Could you please help?
Code 1: Outer for loop run in parallel with four threads
void some_algorithm()
{
    // code
}

int main()
{
    #pragma omp parallel for
    for (int i = 0; i < 4; i++)
    {
        // some code
        for (int j = 0; j < 10; j++)
        {
            some_algorithm();
        }
    }
}
Code 2: Same as Code 1; I want to run the outer for loop in parallel using OpenMP. How can I do that? [1]
template <int I, int ...N>
struct Looper {
    template <typename F, typename ...X>
    constexpr void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            Looper<N...>()(f, x..., i);
        }
    }
};

template <int I>
struct Looper<I> {
    template <typename F, typename ...X>
    constexpr void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            f(x..., i);
        }
    }
};

int main()
{
    Looper<4, 10>()(some_algorithm);
}
[1] Thanks to Nim for Code 2, from "How to generate nested loops at compile time?"
If you remove the constexpr declarations, then you can use _Pragma("omp parallel for"), something like this:
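#include <omp.h>

template <int I, int ...N>
struct Looper {
    template <typename F, typename ...X>
    void operator()(F& f, X... x) {
        // Parallelize this level only if we are not already inside a parallel region,
        // so only the outermost loop is split across threads.
        _Pragma("omp parallel for if (!omp_in_parallel())")
        for (int i = 0; i < I; ++i) {
            Looper<N...>()(f, x..., i);
        }
    }
};

// Innermost level: plain serial loop that calls f with all collected indices.
template <int I>
struct Looper<I> {
    template <typename F, typename ...X>
    void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            f(x..., i);
        }
    }
};

// Accepts (and ignores) the loop indices passed in by Looper.
void some_algorithm(...) {
}

int main()
{
    Looper<4, 10>()(some_algorithm);
}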
You can see this being compiled to use OpenMP at https://godbolt.org/z/nPrcWP (observe the call to GOMP_parallel...). The code also compiles with LLVM (switch the compiler to see :-)).
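If you want to convince yourself that the parallel version still visits every index combination exactly once, something like the following sketch may help. It reuses the Looper from above unchanged; VisitCounter and check_looper_4x10 are hypothetical names made up for this illustration and are not part of the original code. The counters are atomic, on the assumption that different threads may call the functor at the same time.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#ifdef _OPENMP
#include <omp.h>  // only needed for omp_in_parallel() when OpenMP is enabled
#endif

// Same Looper as in the answer: no constexpr, outermost loop parallelized.
template <int I, int... N>
struct Looper {
    template <typename F, typename... X>
    void operator()(F& f, X... x) {
        _Pragma("omp parallel for if (!omp_in_parallel())")
        for (int i = 0; i < I; ++i) {
            Looper<N...>()(f, x..., i);
        }
    }
};

template <int I>
struct Looper<I> {
    template <typename F, typename... X>
    void operator()(F& f, X... x) {
        for (int i = 0; i < I; ++i) {
            f(x..., i);
        }
    }
};

// Hypothetical functor (not from the original post): counts how often each
// (i, j) pair is visited. Atomics keep the counts safe across threads.
template <int Rows, int Cols>
struct VisitCounter {
    std::array<std::atomic<int>, Rows * Cols> hits;

    VisitCounter() {
        for (auto& h : hits) h.store(0);
    }

    void operator()(int i, int j) {
        assert(i >= 0 && i < Rows && j >= 0 && j < Cols);
        hits[i * Cols + j].fetch_add(1, std::memory_order_relaxed);
    }

    bool each_visited_once() const {
        for (const auto& h : hits) {
            if (h.load() != 1) return false;
        }
        return true;
    }
};

// Hypothetical helper: runs the 4 x 10 loop nest and reports whether every
// index pair was visited exactly once.
bool check_looper_4x10() {
    VisitCounter<4, 10> counter;
    Looper<4, 10>()(counter);
    return counter.each_visited_once();
}
```

Call check_looper_4x10() from main to run the check. With GCC or Clang you need -fopenmp for the pragma to take effect; without it the pragma is ignored and the loops simply run serially, so the check should pass either way.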