Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to optimize nested loops

Is there an algorithm available to optimize the performance of the following?

for (i = 0; i < LIMIT; i++) {
  for (j = 0; j < LIMIT; j++) {
   // do something with i and j
  }
 }
  • Both i and j start at 0
  • Both loops end on the same condition
  • Both i and j are incremented at the same rate

Can this be done in 1 loop somehow?

like image 909
James Raitsev Avatar asked Sep 17 '11 21:09

James Raitsev


People also ask

What are the techniques used for loop optimization?

For loop optimization the following three techniques are important: Code motion. Induction-variable elimination. Strength reduction.

What is nested loop in algorithm?

Nested-Loop Join Algorithm A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time, passing each row to a nested loop that processes the next table in the join. This process is repeated as many times as there remain tables to be joined.

How do you speed up a loop in C++?

For the access to a an element you will have to do the index calculation yourself. But as one of the indices (j) is constant over the loop, you can compute the start element's index before the loop and inside just increment the index 1. That might get you some significant improvement.


3 Answers

It is possible to write this using one loop, but I would strongly suggest not doing so. The double for-loop is a well-established idiom that programmers know how to read, and if you collapse the two loops into one you sacrifice readability. Moreover, it's unclear if this will actually make the code run any faster, since the compiler is already very good at optimizing loops. Collapsing the two loops into one requires some extra math at each step that is almost certainly slower than the two loops independently.

That said, if you do want to write this as a single loop, one idea is to think about the iteration space, the set of pairs that you iterate over. Right now, that looks like this:

(0, 0)   (0, 1),   (0, 2), ...,   (0, N-1)
(1, 0)   (1, 1),   (1, 2), ...,   (1, N-1)
                ...          
(N-1, 0) (N-1, 1), (N-1, 2), ..., (N-1, N-1)

The idea is to try to visit all of these pairs in the order (0, 0), (0, 1), ..., (0, N-1), (1, 0), (1, 1), ..., (1, N-1), ..., (N-1, 0), (N-1, 1), ..., (N-1, N-1). To do this, note that every time we increment i, we skip over N elements, while when we increment j we skip over just one element. Consequently, iteration (i, j) of the loop will map to position i * N + j in the linearized loop ordering. This means that on iteration i * N + j, we want to visit (i, j). To do this, we can recover i and j from the index using some simple arithmetic. If k is the current loop counter, we want to visit

i = k / N   (integer division)
j = k % N

Thus the loop can be written as

for (int k = 0; k < N * N; ++k) {
    int i = k / N;
    int j = k % N;
}

However, you have to be careful with this because N * N might not fit into an integer and thus could overflow. In that case, you would want to fall back on the double for-loop. Moreover, the introduction of the extra divisions and moduli will make this code run (potentially) much slower than the double for-loop. Finally, this code is much harder to read than the original code, and you'd need to be sure to provide aggressive comments describing what it is that you're doing here. Again, I strongly advise you not to do this at all unless you have a very strong reason to suspect that there is a problem with the standard double for-loop.

(Interestingly, the trick used here can also be used to represent a multidimensional array using a single-dimensional array. The logic is identical - you have a two-dimensional structure that you want to represent with a one-dimensional structure.)

Hope this helps!

like image 64
templatetypedef Avatar answered Oct 29 '22 15:10

templatetypedef


You can't improve the big-O performance of the loop. However, there are algorithm-dependent methods of improving the constant factor hidden by the big-O by taking advantage of the cache.

Here is an example of an improved matrix transpose algorithm: A Cache Efficient Matrix Transpose Program?

However the common theme here is that we actually introduce more loops, rather than fewer.

like image 45
tskuzzy Avatar answered Oct 29 '22 16:10

tskuzzy


There's no way to significantly optimize the loop itself. However, when you consider the details of "do something with i and j", it can make a big difference whether i or j is the outer loop. For example, one order may cause jumping around a lot in memory or disk while the other order results in sequential access, or nearly so.

Also, you can optimize a double loop sometimes by moving computations that don't rely on the inner index from the inner to the outer loop, maybe with a temporary variable. Smart compilers may optimize this to a point, but they're not perfect.

like image 20
xpda Avatar answered Oct 29 '22 17:10

xpda