I have an algorithm for which one goal is to fill vectors. For the sake of performance, iterations of the algorithm are spread across OpenMP threads. I was wondering which way would provide the better/safer way of filling the vectors.
Note that the ordering of the vectors must be consistent (i.e. value n of vec1 must come from the same iteration than value n of vec2.)
Hypothesis 1:
std::vector<BasicType> vec1;
std::vector<BasicType> vec2;
#pragma opm parallel for
for(...)
{
// Do some intensive stuff to compute val1 and val2
// ...
#pragma omp critical
{
vec1.push_back(val1);
vec2.push_back(val2);
}
}
// Then go on to work with vec1 and vec2...
Hypothesis 2:
std::vector<BasicType> vec1;
std::vector<BasicType> vec2;
#pragma opm parallel
{
std::vector<BasicType> vec1local;
std::vector<BasicType> vec2local;
#pragma omp for
for(...)
{
// Do some intensive stuff to compute val1 and val2
// ...
vec1local.push_back(val1);
vec2local.push_back(val2);
}
#pragma omp critical
{
// See note 1 below
vec1.insert(vec1.end(), vec1local.begin(), vec1local.end());
vec2.insert(vec2.end(), vec2local.begin(), vec2local.end());
}
}
// Then go on to work with vec1 and vec2...
Note 1: This was taken from Best way to append vector to vector
Note 2: It seems val1 and val2 could be combined in some object in order to maintain consistency and work only with one vector, but for now it seems inpractical for the remainder of the algorithm.
Note 3: To give an order of magnitude, the for
loop contains around 100 iterations splitted among 4 threads. Apart for very few exceptions, each iteration is expected to have the same workload (which brings the issue of the critical sections happening around the same time.)
Note 4: Just for the records, the whole thing deals with image stabilisation, and runs on a Tegra K1 architecture. Compiler used is gcc 4.8.4.
From your two suggestions, I would prefer the first. It is simpler and does not require additional memory. However, I would suggest an alternative without the critical
section:
std::vector<BasicType> vec1(size);
std::vector<BasicType> vec2(size);
#pragma opm parallel for
for(...)
{
// Do some intensive stuff to compute val1 and val2
// ...
vec1[i] = val1;
vec2[i] = val2;
}
Note that this may have some performance issues due to cache line stealing. However, I wouldn't worry about that unless it turns out to be an actual problem validated by actual performance analysis. A solution could then be:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With