I want to make this code parallel:
std::vector<float> res(n,0);
std::vector<float> vals(m);
std::vector<float> indexes(m);
// fill indexes with values in range [0,n)
// fill vals and indexes
for(size_t i=0; i<m; i++){
res[indexes[i]] += //something using vas[i];
}
In this article it's suggested to use:
#pragma omp parallel for reduction(+:myArray[:6])
In this question the same approach is proposed in the comments section.
I have two questions:
m
at compile time, and from these two examples it seems that's required. Is it so? Or if I can use it for this case, what do I have to replace ?
with in the following command #pragma omp parallel for reduction(+:res[:?])
? m
or n
?for
are relative to indexes
and vals
and not to res
, especially considering that reduction
is done on the latter one?However, If so, how can I solve this problem?
The OpenMP reduction clause lets you specify one or more thread-private variables that are subject to a reduction operation at the end of the parallel region. OpenMP predefines a set of reduction operators. Each reduction variable must be a scalar (for example, int , long , and float ).
A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program. One reason for this myth to persist, are examples that compare raw arrays with mis-used std::vectors.
Here is a rule of thumb: If you want to add elements to your container or remove elements from your container, use a std::vector; if not, use a std::array.
1) std::vector is a sequence container that encapsulates dynamic size arrays. 2) std::pmr::vector is an alias template that uses a polymorphic allocator. The elements are stored contiguously, which means that elements can be accessed not only through iterators, but also using offsets to regular pointers to elements.
It is fairly straight forward to do a user declared reduction for C++ vectors of a specific type:
#include <algorithm>
#include <vector>
#pragma omp declare reduction(vec_float_plus : std::vector<float> : \
std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<float>())) \
initializer(omp_priv = decltype(omp_orig)(omp_orig.size()))
std::vector<float> res(n,0);
#pragma omp parallel for reduction(vec_float_plus : res)
for(size_t i=0; i<m; i++){
res[...] += ...;
}
1a) Not knowing m
at compile time is not a requirement.
1b) You cannot use the array section reduction on std::vector
s, because they are not arrays (and std::vector::data
is not an identifier). If it were possible, you'd have to use n
, as this is the number of elements in the array section.
2) As long as you are only reading indexes
and vals
, there is no issue.
Edit: The original initializer
caluse was simpler: initializer(omp_priv = omp_orig)
. However, if the original copy is then not full of zeroes, the result will be wrong. Therefore, I suggest the more complicated initializer which always creates zero-element vectors.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With