Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get better at optimization?

In advance apologize if the question seems somewhat broad or strange, I don't mean to offend anyone, but maybe someone can actually make a recommendation. I tried looking for the similar questions, but cold not.

Which are the better resources (books, blogs etc.) that can teach about optimizing code?

There is quite a few resources on making code more human-readable (Code Complete being number one choice probably). But what about making it run faster, more memory-efficient?

Of course there are lots of books on each particular language, but I wonder if there are some that cover the problems of memory / speed of operations and are somewhat language-independent?

like image 893
Stpn Avatar asked Dec 26 '22 19:12

Stpn


2 Answers

Here are some links that might be helpful in general on the subject of memory optimizations

What Every Programmer Should Know About Memory by Ulrich Drepper

Herb Sutter: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

Slides: Herb Sutter: Machine Architecture (Things Your Programming Language Never Told You)

Video: Herb Sutter @ NWCPP: Machine Architecture: Things Your Programming Language Never Told You

The microarchitecture of Intel, AMD and VIA CPUs An optimization guide for assembly programmers and compiler makers, by Agner Fog

like image 103
amdn Avatar answered Jan 08 '23 05:01

amdn


Read Structured Programming with go to Statements. While it's the source of the "premature optimisation is the source of all evil" quote that comes up the moment somebody wants to make anything faster or smaller - no matter how desperately important or late in the process they are - it's actually about the importance of making things efficient when you can.

Learn about time complexity, space complexity and the analysis of algorithms.

Come up with examples where you would want to sacrifice having worse space complexity for better time complexity, and vice versa.

Know the time and space complexities of the algorithms and data structures your languages and frameworks of choice offer, especially those you use most often.

Read the answers on this site on questions about creating a good hash code.

Study the approach HTTP took to having the advantage of caching, without the disadvantage of using stale data inappropriately. Consider how easy or difficult that is to apply to in-memory caches. Consider when you would say "screw it, I can live with being stale for the speed boost it gives me". Consider when you would say "screw it, I can live with being slow for the guarantee of freshness it gives me".

Learn how to multithread. Learn when it improves performance. Learn why it often doesn't or even makes things worse.

Look at a lot of Joe Duffy's blog where performance is a regular concern of his writing.

Learn how to process items as streams or iterations rather than building and rebuilding data-structures full of each item, each time. Learn when you're actually better off not doing that.

Know what things cost. You can't reasonably decide "I'll work so this is in the CPU cache rather than main-memory/main-memory rather than disk/disk rather than over a network" unless you've a good idea what actually causes each to be hit, and what the cost differences are. Worse, you can't dismiss something as premature optimisation if you don't know what they cost - not bothering to optimise something is often the best choice, but if you don't even consider it in passing you aren't "avoiding premature optimisation", you're muddling through and hoping it works.

Learn a bit about what optimisations are done for you by the script engine/jitter/compiler/etc you use. Learn how to work with them rather than against them. Learn not to re-do work it'll do for you anyway. In one or two cases, you may also be able to apply the same general principle to your work.

Search for cases on this site where something is dismissed as an implementation detail - yes, all of those are cases where the detail in question isn't the most important thing at the time, but all of those implementation details were chosen for a reason. Learn what they were. Learn the counter-arguments.

Edit (I'll keep adding a few more to this as I go):

Different books of course differ in the emphasis they put on efficiency concerns, but I remember Stroustrup's The C++ Programming Language as one where there were a good few times where he will explain a choice between a few different options as relating to efficiency, and also on how to not have decisions made for efficiency's sake impact on the usability of the classes "from the outside".

Which brings me to another point. Concentrate on the efficiency of the library code you reuse in different projects. You don't want to ever be thinking "maybe I should hand-roll a new one here to be more efficient", unless it's a very specialised case, you want to be confident that lots of work went into making that heavily used class efficient over a lot of case, and concentrate on identifying hot-spots.

As for specialised cases, some of the more obscure data structures are worth knowing for the cases they serve. For example, a DAWG is a very compact structure for storing strings with a lot of common prefixes and suffixes (which would be most words in most natural languages) where you just want to find those in the list that match a pattern. If you need a "payload" then a tree where each letter has a list of nodes for each subsequent letter (a generalisation of a DAWG but ending in that "payload" rather than the terminal node) has some but not all of the advantages. They also find the result in O(n) time where n is the length of the string sought.

How often will that come up? Not many. It came up for me once (a few times really, but they were variants of the same case), and as such it would not have been worth it for me to learn all there was to know about DAWGs until then. But I knew enough to know it was what I needed to research later, and it saved me gigabytes (really, from way too much for a machine with 16GB RAM to cope with, to less than 1.5GB). Going straight for a hand-rolled DAWG would totally be premature optimisation rather than putting the strings in a hashset, but flicking through the NIST datastructure site meant I could when it came up.

Consider: "Finding a string in a DAWG is O(n)" "Finding a string in a Hashset is O(1)" Both of these statements is true, but the speed of the two tends to be comparable. Why? Because the DAWG is O(n) in terms of the length of the string, and effectively O(1) in terms of the size of the DAWG. The Hashset is O(1) in terms of the size of the hashset, but working out the hash is typically O(n) in terms of the length of the string, and equality checks are also O(n) in terms of that length. Both statements were correct, but they were thinking about a different n! You always need to know what n means in any discussion of time and space complexity - most often it'll be the size of the structure, but not always.

Don't forget constant effects: O(n²) is the same as O(1) for sufficiently low values of n! Remember that the likes of O(n²) translates as n²*k + n * k₁ + k₂, with the assumption that k₁ & k₂ are low enough and k and the k of another algorithm or structure we are comparing of are close enough, that they don't really matter and it's only n² that we care about. This isn't true all the time, and we can sometimes find that k, k₁ or k₂ are high enough that we end up in trouble. It's also not true when n is going to be so small as to make the difference in the constant costs of different approaches matter. Of course normally when n is small we don't have a big efficiency concern, but what if we are doing m operations on structures averaging n in size, and m is large. If we are choosing between an O(1) and a O(n²) approach, we are choosing between an O(m) and O(n²m) approach overall. It still seems like a no-brainer in favour of the former, but with a low n it essentially becomes a choice between two different O(m) approaches, and the constant factors are much more important.

Learn about lock-free multi-threading. Or perhaps don't. Personally, I've two pieces of my own code I use professionally that use all but the simplest lock-free techniques. One is based on well-known approaches and I wouldn't bother now (it's .NET code first written for .NET2.0 and the .NET4.0 library supplies a class that does the same thing). The other I first wrote for fun, and only used after that just-for-fun period had given me something reliable (and it still gets beaten by something in the 4.0 library for a lot of cases, but not for some others that I care about). I would hate to have to write something like it with a deadline and a client in mind.

All that said, if you're coding out of interest, the challenges involved are interesting and it's an enjoyable thing to work with when you've the freedom to give up on a failed plan that you don't get when you're doing something for a paying client, and you'll certainly learn a lot about efficiency concerns generally. (Take a look at https://github.com/hackcraft/Ariadne if you want to see some of what I've done with this).

A Case Study

Actually, that contains a relatively good example of some of the above principles. Take a look at the method that's currently at line 511 at https://github.com/hackcraft/Ariadne/blob/master/Collections/ThreadSafeDictionary.cs (where I joke in the comments about it being flame-bait for people quoting Dijkstra. Let's use it as a case-study:

This method was first written to use recursion, because it's a naturally recursive problem - after doing the operation on the current table, if there's a "next" table we want to do the exact same operation on that, and so on until there's no further table.

Recursion is almost always slower than iteration, for a few different methods. Should we make all recursive calls iterative? No, it's often not worth it, and recursion is a wonderful way to write code that is clear about what it's doing. Here though I apply the principle above that since this is a library that might be called where performance is crucial, particular effort should be extended on it.

The decision to try to improve its speed being made, the next thing I did was make measurements. I don't depend on "I know that iteration is faster than recursion, so it must be faster when changed to avoid recursion". That's just not true - a poorly written iterative version may not be as good as a well-written recursive version.

The next question is, just how to re-write it. I've a tested method that I know works and I'm going to replace it with a different version. I don't want to replace it with a version that doesn't work, obviously, so how to re-write while taking the most advantage out of what's already there?

Well, I know about tail-call elimination; an optimisation normally done by compilers that changes the way the stack is managed so that recursive functions end up with properties closer to those of iterative (it's still recursive from the perspective of the source code, but it's iterative in terms of how the compiled code actually uses the stack).

This gives me two things to think about: 1. Maybe the compiler is already doing this, in which case my extra work isn't going to do anything to help. 2. If the compiler isn't already doing this, I can take the same basic approach manually.

That decision made, I replaced all of the points where the method called itself, with a change to the one parameter that would be different for that next call, and then go back to the beginning. I.e. instead of having:

CurrentMethod(param0.next, param1, param2, /*...*/);

We have:

param0 = param0.next;
goto startOfMethod;

That being done, I measure again. Running through the entire unit tests for the class is now consistently 13% faster than before. If it were closer I'd have tried more detail measurements, but a consistent 13% on runs that includes code that doesn't even call this method is something I'm pretty happy with. (It also tells me that the compiler wasn't doing the same optimisation, or I wouldn't have gained anything).

Then I clean up the method to make more changes that make sense with the new code. Most of them let me take out the goto because goto is indeed nasty (and there's other places the same optimisation was done that aren't as obvious because the goto was refactored entirely). In some, I left it in, because 13% is worth breaking the no-goto rule to my mind!

So the above gives an example of:

  1. Deciding where to concentrate optimisation effort (based on how often it might be hit and my inability to predict all uses of the library)
  2. Using knowledge of general costs (recursion costs more than iteration, most of the time).
  3. Measuring rather than depending on assuming the above always applies.
  4. Learning from what compilers do.
  5. Understanding that because of that I may not gain anything - maybe the compiler already did it for me.
  6. Avoiding optimisations leading to unreadable code (refactoring out most of the gotos the first pass introduced).

Some of these are matters of opinion and style (the decision to leave in some goto would not be without controversy), and it's certainly okay to disagree with my decisions, but knowledge of the points raised so far in this post would make it an informed disagreement, rather than a knee-jerk one.

like image 22
Jon Hanna Avatar answered Jan 08 '23 05:01

Jon Hanna