I've been profiling a bottleneck in my code (the function shown below) that gets called several million times, and I could use tips on increasing its performance. The per-line timings (in seconds) were taken from Sleepy. Compiled with Visual Studio 2013, /O2 and other typical release settings. indices typically holds 0 to 20 values, and the other parameters are the same size (b.size() == indices.size() == temps.size() == temps[k].size()).
            double Object::gradient(const size_t j,
                                    const std::vector<double>& b,
                                    const std::vector<size_t>& indices,
                                    const std::vector<std::vector<double>>& temps) const
 23.27s     {
                double sum = 0;
192.16s         for (size_t k : indices)
 32.05s             if (k != j)
219.53s                 sum += temps[k][j]*b[k];

320.21s         return boost::math::isfinite(sum) ? sum : 0;
 22.86s     }
Any ideas?
Thanks for the tips, everyone. Here are the results I got from the suggestions. I found it interesting that switching to cbegin() and cend() had such a large impact; I guess the compiler isn't being as smart there as it could be. I'm happy with the bump, but I'm still curious whether there's more room through unrolling or vectorization.
For those interested, here is my benchmark for isfinite(x):
boost::isfinite(x):
------------------------
SPEED: 761.164 per ms
TIME:  0.001314 ms
  +/-  0.000023 ms

std::isfinite(x):
------------------------
SPEED: 266.835 per ms
TIME:  0.003748 ms
  +/-  0.000065 ms
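The harness is nothing fancy; below is a simplified sketch of that kind of timing loop (bench, N and the lambdas are placeholders rather than the exact code I ran, and the real run repeats the measurement to get the +/- figures):

#include <boost/math/special_functions/fpclassify.hpp>
#include <chrono>
#include <cmath>
#include <cstdio>

// Simplified sketch of the timing harness: call the function under test N
// times on a value the optimizer cannot fold away, then report calls per
// millisecond and time per call.
template <typename F>
void bench(const char* name, F f)
{
    volatile double x = 1.0;     // volatile so the call isn't optimized away
    volatile bool sink = false;  // consume the result
    const int N = 1000000;

    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i)
        sink = f(x);
    const auto t1 = std::chrono::steady_clock::now();

    const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("%s\n  SPEED: %.3f per ms\n  TIME:  %.6f ms\n", name, N / ms, ms / N);
}

int main()
{
    bench("boost::isfinite(x)", [](double v) { return boost::math::isfinite(v); });
    bench("std::isfinite(x)",   [](double v) { return std::isfinite(v); });
}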
If you know that the conditional will always be triggered on some iteration (that is, j always appears in indices, so exactly one iteration has k == j), you can eliminate the conditional from the loop and replace the return condition with a simple conditional store.
double sum = -(temps[j][j]*b[j]);  // cancels the k == j term added by the loop below
for (size_t k : indices)
    sum += temps[k][j]*b[k];
if (!std::isfinite(sum))           // needs <cmath>
    sum = 0.0;
return sum;
Range-based for is still new enough to not always get great optimization. You may also want to try:
const auto end = cend(indices);
for (auto it = cbegin(indices); it != end; ++it) {
    sum += temps[*it][j]*b[*it];
}
and see if the perf varies.
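If you want to experiment with the unrolling/vectorization angle, one more variant worth timing is a plain counted loop, which some optimizers handle better than either the range-based or the iterator form. A sketch, using the same variables as the original function:

// Sketch: same variables as the original function (sum, indices, temps, b, j).
// A plain counted loop gives the optimizer the most freedom to unroll.
const size_t n = indices.size();
for (size_t i = 0; i < n; ++i) {
    const size_t k = indices[i];
    if (k != j)
        sum += temps[k][j] * b[k];
}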
There are two points that stick out:
(a) boost::math::isfinite is taking relatively long. If possible, try to establish by other means that your results are within the valid range.
(b) Storing a 2D array as a nested vector is not the fastest layout. It would probably be faster to make temps a single 1D array and pass the row size as a parameter. You then have to do the index calculation for an element yourself, but since one of the indices (j) is constant over the loop, you can compute the start element's index before the loop and only adjust it cheaply inside the loop. That might get you a significant improvement.
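For illustration, a sketch of what (b) might look like; the names temps_flat and row_size are placeholders of mine, assuming a row-major layout so that the old temps[k][j] becomes temps_flat[k*row_size + j]:

// Sketch only: temps flattened row-major into a single contiguous buffer.
// The column offset j (the start element's index) is hoisted out of the loop.
double Object::gradient(const size_t j,
                        const std::vector<double>& b,
                        const std::vector<size_t>& indices,
                        const std::vector<double>& temps_flat,
                        const size_t row_size) const
{
    double sum = 0;
    const double* const base = temps_flat.data() + j;  // element (0, j)
    for (size_t k : indices)
        if (k != j)
            sum += base[k * row_size] * b[k];
    return std::isfinite(sum) ? sum : 0;  // needs <cmath>
}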