Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speed comparison: add zero or check non-zero first

I'm optimizing the most time-consuming loop in a program I'm writing that sums many entries in an array, many of which will be zero. Is if faster to check if an entry is zero before adding it, or skip the check and add all entries? Examples of each below. This is in C++. Thanks!

double *arr, sum=0;
...
for (int i = 0; i < n; i++)
    sum += arr[i];

OR

double *arr, sum=0;
...
for (int i = 0; i < n; i++)
    if (arr[i])
        sum += arr[i];
like image 947
wcroughan Avatar asked Oct 27 '25 21:10

wcroughan


2 Answers

Quote of the day:

Premature optimization is the root of all evil
- Donald Knuth

If your intent is to add all the elements of an array, then write the code that does exactly this and let the compiler's optimizer take care of what's best. So go for the first alternative; your future you will be thankful one day.

Don't do manual optimizations if not absolutely necessary:

With modern CPU, it's anyway difficult to think of all the possible effects of cache management, cache optimisation, jump prediction, and other hardware tricks. The compiler's optimizer can combine much more factors than we can.

If you really notice some performance issues, then profile your code, and concentrate your efforts on optimizations that really matter. Alternatively, you could benchmark the code on your target platform, but beware of subtle differences in the benchmark, that might affect the optimizer.

Now, this being said, the second option requires a compare instruction (ucomisd on x86) for every item in the array. So if most of the items have a non zero value, you mostly add an unnecessary overhead. For null items you'd have exchanged a simple add with two instructions, a compare and a conditional branch. I'm not sure if this is really faster, but if there would be any benefit, it would very probably be extremely marginal. So in the best case you achieve a very marginal gain, but most probably you'll add some overhead. So intuitively, stick to the first alternative, unless your profiler tells you there's a problem.

like image 63
Christophe Avatar answered Oct 30 '25 13:10

Christophe


If you are running on Intel architecture, there is one way you could speed this up, but it's not pretty: you use the REPZ SCASD instruction to scan your array for the next non-zero element. You will need to program this in assembly language, of course. And it relies on most of the zero elements of the array being represented as 0x0000000000000000, which although probably true is not guaranteed.

If I were implementing this, I would write a C-callable function in assembly language:

size_t NextNonZeroArrayElement (double* arr, size_t len)

This will only be worth it if most of the elements are zero (not just if many of them are zero). But in any case it's a fun project if you have the time for it.

If you are really enthusiastic, you might consider writing the whole thing in assembly language, complete with floating-point operations. Then I think you would come out ahead at a much lower proportion of zero elements.

like image 32
TonyK Avatar answered Oct 30 '25 13:10

TonyK



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!