Should we still be optimizing "in the small"?

I was changing my for loop to increment using ++i instead of i++ and got to thinking, is this really necessary anymore? Surely today's compilers do this optimization on their own.
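For concreteness, here is a minimal sketch of the two forms I mean (the function names and the summing body are just placeholders to make it compilable):

#include <cstddef>
#include <vector>

// The two increment forms in question. For a built-in type such as
// std::size_t, the value of the expression is discarded, so any modern
// optimizer emits identical code for both loops.
long sum_pre(const std::vector<long>& items) {
    long total = 0;
    for (std::size_t i = 0; i < items.size(); ++i)   // pre-increment
        total += items[i];
    return total;
}

long sum_post(const std::vector<long>& items) {
    long total = 0;
    for (std::size_t i = 0; i < items.size(); i++)   // post-increment
        total += items[i];
    return total;
}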

In this article from 1997, http://leto.net/docs/C-optimization.php, Michael Lee goes into other optimizations such as inlining, loop unrolling, loop jamming, loop inversion, strength reduction, and many others. Are these still relevant?
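To illustrate the kind of thing I mean by strength reduction, here is my own made-up sketch (not taken from the article): an index multiplication inside a loop replaced by a running addition.

#include <cstddef>

// Illustrative sketch of strength reduction: the i * stride multiplication
// is replaced by an offset that is advanced by addition each iteration.
void gather_before(float* out, const float* in, std::size_t n, std::size_t stride) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i * stride];          // multiply every iteration
}

void gather_after(float* out, const float* in, std::size_t n, std::size_t stride) {
    std::size_t offset = 0;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[offset];              // cheaper: addition only
        offset += stride;
    }
}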

What low level code optimizations should we be doing, and what optimizations can we safely ignore?

Edit: This has nothing to do with premature optimization. The decision to optimize has already been made. Now the question is what is the most effective way to do it.

Anecdote: I once reviewed a requirements spec that stated: "The programmer shall left shift by one instead of multiplying by 2".

Asked Apr 18 '09 by Mark Beckwith



2 Answers

This is a well-worn subject, and SO contains oodles of good and bad advice on it.

Let me just tell you what I have found from much experience doing performance tuning.

There are tradeoff curves between performance and other things like memory and clarity, right? And you would expect that to get better performance you would have to give something up, right?

That is only true if the program is on the tradeoff curve. Most software, as first written, is miles away from the tradeoff curve. Most of the time, it is irrelevant and ignorant to talk about giving up one thing to get another.

The method I use is not measurement, it is diagnosis. I don't care how fast various routines are or how often they are called. I want to know precisely which instructions are causing slowness, and why.

The dominant and primary cause of poor performance in good-sized software efforts (not little one-person projects) is galloping generality. Too many layers of abstraction are employed, each of which extracts a performance penalty. This is not usually a problem - until it is a problem - and then it's a killer.

So what I do is tackle one issue at a time. I call these "slugs", short for "slowness bugs". Each slug that I remove yields a speedup of anywhere from 1.1x to 10x, depending on how bad it is. Every slug that is removed makes the remaining ones take a larger fraction of the remaining time, so they become easier to find. In this way, all the "low-hanging fruit" can be quickly disposed of.

At that point, I know what is costing the time, but the fixes may be more difficult, such as a partial redesign of the software, possibly by removing extraneous data structures or using code generation. If it is possible to do this, it can set off a new round of slug removal, until the program is not only many times faster than it was to begin with, but smaller and clearer as well.

I recommend getting experience like this for yourself, because then when you design software you will know what not to do, and you will make better (and simpler) designs to begin with. At the same time, you will find yourself at odds with less experienced colleagues who can't begin thinking about a design without conjuring up a dozen classes.

ADDED: Now, to try to answer your question: low-level optimization should be done when diagnosis says you have a hot-spot (i.e. some code at the bottom of the call stack appears on enough call stack samples, 10% or more, to be known to be costing significant time), and when the hot-spot is in code you can edit. If you've got a hot-spot in "new", "delete", or string-compare, look higher up the stack for things to get rid of.

Hope that helps.

Answered Nov 10 '22 by Mike Dunlavey


If there is no cost to the optimization, do it. When writing the code, ++i is just as easy to write as i++, so prefer the former. There is no cost to it.

On the other hand, going back and making this change afterwards takes time, and it most likely won't make a noticeable difference, so you probably shouldn't bother with it.

But yes, it can make a difference. On built-in types, probably not, but for complex classes, the compiler is unlikely to be able to optimize it away. The reason for this is that the increment operation is no longer an intrinsic operation built into the compiler, but a function defined in the class. The compiler may be able to optimize it like any other function, but it cannot, in general, assume that pre-increment can be used instead of post-increment. The two functions may do entirely different things.

So when determining which optimizations can be done by the compiler, consider whether it has enough information to perform them. In this case, the compiler doesn't know that post-increment and pre-increment perform the same modifications to the object, so it cannot assume that one can be replaced with the other. But you have this knowledge, so you can safely perform the optimization.
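As a sketch of why (the class and its payload are made up for illustration): a conventional class-type iterator has to return its old value from post-increment, which forces a copy that pre-increment never needs.

// A typical class-type "iterator": post-increment must return the old value,
// which requires a copy that pre-increment does not need.
struct BigIterator {
    int state[64];                    // pretend this is expensive to copy

    BigIterator& operator++() {       // pre-increment: modify and return *this
        ++state[0];
        return *this;
    }

    BigIterator operator++(int) {     // post-increment: copy, modify, return the copy
        BigIterator old = *this;      // this copy is the cost the compiler
        ++(*this);                    //   cannot always prove it may remove
        return old;
    }
};

If the optimizer cannot see both definitions, or cannot prove the copy is dead, that copy survives in the generated code. Writing ++it states your intent directly and costs nothing.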

Many of the others you mention can usually be done very efficiently by the compiler: inlining can be done by the compiler, and it's usually better at it than you. All it needs to know is how large a proportion of the function consists of function call overhead, and how often it is called. A big function that is called often probably shouldn't be inlined, because you end up copying a lot of code, resulting in a larger executable and more instruction cache misses. Inlining is always a tradeoff, and often the compiler is better at weighing all the factors than you are.

Loop unrolling is a purely mechanical operation, and the compiler can do it easily. The same goes for strength reduction. Swapping inner and outer loops is trickier, because the compiler has to prove that the changed order of traversal won't affect the result, which is difficult to do automatically. So here is an optimization you should do yourself.
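A sketch of what I mean, using a made-up row-major matrix sum (the compiler generally can't make this swap on its own for floating-point code, because reordering the additions could change the result):

#include <cstddef>

// Loop interchange: walking down columns in the inner loop strides through
// memory; swapping the loops makes the inner loop walk contiguous memory.
double sum_column_order(const double* m, std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];     // large stride: cache-unfriendly
    return total;
}

double sum_row_order(const double* m, std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];     // unit stride: cache-friendly
    return total;
}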

But even in the simple ones that the compiler is able to do, you sometimes have information your compiler doesn't. If you know that a function is going to be called extremely often, even if it's only called from one place, it may be worth checking whether the compiler automatically inlines it, and doing it manually if not.
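A sketch of "doing it manually", assuming GCC or Clang (the attribute is compiler-specific; MSVC spells it __forceinline, and GCC's -Winline flag can warn when a function declared inline was not actually inlined):

// If inspecting the generated code shows a tiny, very hot function was not
// inlined, you can insist on it with a compiler-specific attribute.
__attribute__((always_inline)) inline int clamp01(int x) {
    return x < 0 ? 0 : (x > 1 ? 1 : x);
}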

Sometimes you may know more about a loop than the compiler as well (for example, that the number of iterations will always be a multiple of 4, so you can safely unroll it 4 times). The compiler may not have this information, so if it were to unroll the loop, it would have to insert an epilog to ensure that the last few iterations are performed correctly.
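A minimal sketch of that situation (the function and arrays are made up; the point is the assumption the compiler doesn't have):

#include <cassert>
#include <cstddef>

// Manual 4x unrolling when *you* know n is always a multiple of 4,
// so no epilog for leftover iterations is needed.
void add_arrays(float* dst, const float* a, const float* b, std::size_t n) {
    assert(n % 4 == 0);                     // the knowledge the compiler lacks
    for (std::size_t i = 0; i < n; i += 4) {
        dst[i]     = a[i]     + b[i];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }
}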

So such "small-scale" optimizations can still be necessary, if 1) you actually need the performance, and 2) you have information that the compiler doesn't.

You can't outperform the compiler on purely mechanical optimizations. But you may be able to make assumptions that the compiler can't, and that is when you're able to optimize better than the compiler.

Answered Nov 10 '22 by jalf