Apologies if this is asked in the archives. I found some similar questions, but none that seemed exactly what I wanted.
The distilled version of the problem I'm working on is as follows. I have a series of calculations to perform that will store values in four (very large) arrays: A, B, C, and D. These calculations are interdependent; for example, calculating B[i] could require using A[i-1]. I am capable of expressing everything in a single loop, but that results in edge cases where, for certain values of i, only some of the calculations should be performed. For example:
for (i = 0; i < end; i++)
{
    if (i == 0) {
        // calculate A[i+1] and B[i+1]
    } else if (i == end - 1) {
        // calculate C[i-1] and D[i-1]
    } else {
        // calculate A[i+1], B[i+1], C[i-1], D[i-1]
    }
}
For performance reasons, I would like to avoid having conditionals within my loop. Evaluating a conditional is cheap compared to the calculations, but possibly not negligible. My question is whether a compiler would reliably transform that into
// calculate A[1] and B[1]
for (i = 1; i < end - 1; i++)
{
    // calculate A[i+1], B[i+1], C[i-1], D[i-1]
}
// calculate C[end-2] and D[end-2]
I gather from the archives that the compiler would break apart my loop if the conditional expressions were constant, but here they depend on i, which in principle could be changed by some of my calculations. Will it detect that I'm not tampering with the iteration variable and thus break it apart in the sensible way?
Extra information, in case you decide to answer the question by suggesting a better way to do things:
Originally the code was written with 4 loops, to calculate elements for each of the arrays. This was the most intuitive way to write the code, but it was inefficient. Since calculating elements in one array depended on elements in the other arrays, this meant I had to read in all 4 arrays from memory during each of the 4 loops. Since these arrays do not fit in cache, this is not optimal and I needed code that would loop through my arrays only once.
I'm also aware that I can break my loop apart by hand, and indeed that is how things are currently done. However, these calculations involve nontrivial formulas (and I can't afford the performance hit of calling a function during every iteration of this loop), so breaking the code apart caused duplication that is not only very hard to read but almost unmaintainable the next time my formulas get tweaked (which they will...).
Thanks in advance!
To answer your question in a broader sense: when optimization is critical, the profiler is your friend. Developers are notoriously poor at guessing where in our code the processor spends most of its time. A profiler will show you exactly where the "expensive" operations are, so you can concentrate on fixing the areas that will give you the most significant improvements.
I'm curious about your statement that you "can't afford the performance hit of calling a function during every iteration of this loop." How do you know that? Many modern processors are optimized for function calls, particularly if you can pass a pointer (to a struct?) instead of many individual arguments. If your calculations are indeed "nontrivial", then the overhead of a function call may be negligible.
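To make that concrete, here is a minimal sketch of the struct-pointer idea. The struct name, field names, and the formulas inside are all placeholders invented for illustration; with static inline, a modern optimizing compiler (e.g. GCC or Clang at -O2) will usually inline the body anyway, so the "call" often costs nothing:

```c
#include <stddef.h>

/* Hypothetical state bundle: pass one pointer instead of many arguments. */
struct sim_state {
    double *a, *b, *c, *d;
    size_t end;
};

/* Placeholder formulas -- your real calculations go here. Marked
   static inline so the compiler is free to eliminate the call. */
static inline void step_forward(struct sim_state *s, size_t i)
{
    s->a[i + 1] = s->b[i] * 0.5;
    s->b[i + 1] = s->a[i] + 1.0;
}
```

The loop body then becomes a single readable call like step_forward(&s, i), and the formulas live in exactly one place.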
Other things to think about:
As an experiment, re-index your calculations so they operate on i itself rather than i-1 or i+1, as much as possible. So, for example, use A[i], B[i], C[i-2], and D[i-2]. I'd be surprised if that made a significant improvement with an optimizing compiler, but you never know....
Precompute anything you can.
Try to break your calculations into smaller components that are either constant or common, as James Greenhalgh suggested, so they might be reused.
Can you rewrite your equations more efficiently? Analysis of the math may lead you to a shortcut: perhaps you can rewrite some (or all) of the iterations in closed form.
Can you replace your equations altogether with something simpler? For example, suppose you need to sort a set of locations by their distance from your house. The distance calculation requires subtraction, squaring, addition, and a square root. In general the square root is by far the most expensive operation. But if you need only the relative distances, you can skip the square root altogether: sorting by the square of the distance generates the same sort order!
If inlining isn't possible, can you define your functions (or their components) as efficient macros, so at least you can avoid repeating the code? As you mentioned, clipboard inheritance is the mortal enemy of maintainability.
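As an illustration of the "skip the square root" trick above, here is a sketch that sorts points by distance from the origin using qsort. The struct and function names are invented for the example; the point is that the comparator never calls sqrt():

```c
#include <stdlib.h>

/* Hypothetical example: order points by distance from a fixed spot
   (here the origin). Comparing squared distances yields the same
   ordering as comparing true distances, so sqrt() is never needed. */
struct point { double x, y; };

static double dist_sq(const struct point *p)
{
    return p->x * p->x + p->y * p->y;   /* no sqrt needed for ordering */
}

static int by_distance(const void *pa, const void *pb)
{
    double da = dist_sq(pa);
    double db = dist_sq(pb);
    return (da > db) - (da < db);       /* safer than returning da - db */
}

/* usage: qsort(pts, n, sizeof pts[0], by_distance); */
```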
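And here is one way the macro idea could look for your peeled loop. The formulas inside the macros are placeholders, not your real ones; the point is that the peeled first and last iterations reuse the same macro bodies as the main loop, so a formula tweak happens in exactly one place:

```c
/* Hypothetical formulas kept in one place; the peeled copies below
   stay in sync automatically when the formulas are tweaked. */
#define CALC_FWD(i)  do { a[(i) + 1] = b[(i)] * 0.5; \
                          b[(i) + 1] = a[(i)] + 1.0; } while (0)
#define CALC_BWD(i)  do { c[(i) - 1] = d[(i)] * 2.0; \
                          d[(i) - 1] = c[(i)] - 1.0; } while (0)

void compute(double *a, double *b, double *c, double *d, int end)
{
    CALC_FWD(0);                    /* peeled first iteration */
    for (int i = 1; i < end - 1; i++) {
        CALC_FWD(i);
        CALC_BWD(i);
    }
    CALC_BWD(end - 1);              /* peeled last iteration */
}
```

The do { ... } while (0) wrapper makes each macro behave like a single statement, so it composes safely with if/else.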
If nothing else, going through this exercise will teach you oodles about the way your compiler and the C language work. Good luck!