 

Ideas for good performance optimisations in C++

Ok, I've been sitting in front of profiler results over the last three days, generated by running a pretty wide range of test cases through an automation suite. The idea is to see if there are any good optimisations that can generally improve performance. I'll qualify 'good' in this context as follows:

  • Has the potential for a performance improvement that is both very significant and observable at the end-user level, e.g. > 100% improvement in an underperforming area.

  • Has the potential for a core (memory) usage reduction, e.g. > 50% reduction in a data-heavy area.

  • Is easy to implement, with minimal obfuscation of the code and minimal side effects, i.e. the benefits of implementing the optimisation greatly outweigh the costs.

The application is a 3d mapping and modelling package with plenty of graphics in the interface and geometric processing at the back end. I've already done a lot on ensuring optimal algorithm selection for most processing, and at this stage I'm looking for any generally applicable easy ways to get that extra bit of punch when dealing with large and complex data sets. So far, I've come up with the following;

  • When searching, keep a buffer of the last recently found items and check that first. A heck of a lot of processing with repetitive searches seems to search around the same area. From answers to date this appears to be a specific form of memoization (see the first sketch after this list).

  • When sorting, check the data isn't already in sort order (specifically where qsort has been used)

  • Keep the GUI and processing in separate threads (fails the 'good' criterion of being easy to implement, but IMO still worthwhile).

  • Where heavily used member functions have local variables with significant construction/destruction costs, consider making them private class members instead. Notably true of dynamic arrays and strings, especially MFC CArrays and CStrings.

  • When using dynamic arrays, set the initial size to slightly exceed typical usage, and have an exponential growth strategy.

  • When dealing with very large data sets to be stored in arrays, size up the data first to avoid any reallocs (see the second sketch after this list).

  • Avoid function returns that create temporary object copies on the stack; use reference parameters instead, e.g.

    CString MyFunc(double x, double y)

is less efficient than

void  MyFunc(double x, double y, CString &Result)

Actually, avoid CStrings and most of MFC in any performance-critical area of the code. (Edit: this may be more generally negated by RVO, although not for CStrings in my app.)
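
To illustrate the first point, here is a minimal sketch of the sort of most-recently-found buffer I mean; the Feature type, the integer key and findFeatureSlow() are hypothetical stand-ins for whatever the real search works on, and the buffer size is just a tuning guess:

    // A small buffer of most-recently-found items checked before the full search.
    // Feature, the integer key and findFeatureSlow() are hypothetical stand-ins.
    #include <cstddef>
    #include <deque>
    #include <utility>

    struct Feature { int id; /* ... */ };

    Feature* findFeatureSlow(int key);          // the existing, expensive search

    class CachedFeatureSearch {
    public:
        Feature* find(int key) {
            // Check the handful of most recently found items first.
            for (std::size_t i = 0; i < cache_.size(); ++i)
                if (cache_[i].first == key)
                    return cache_[i].second;

            Feature* f = findFeatureSlow(key);  // fall back to the full search
            cache_.push_front(std::make_pair(key, f));
            if (cache_.size() > kMaxEntries)    // keep the buffer tiny so the linear scan stays cheap
                cache_.pop_back();
            return f;                           // note: cache must be cleared if features are deleted
        }
    private:
        static const std::size_t kMaxEntries = 8;
        std::deque<std::pair<int, Feature*> > cache_;
    };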
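
And a minimal sketch of the pre-sizing point using std::vector (the same idea applies to CArray::SetSize with its grow-by argument); Point3d and PointSource are hypothetical placeholders:

    // Pre-sizing a dynamic array so a large load does a single allocation rather
    // than repeated regrow-and-copy cycles. Point3d and PointSource are hypothetical.
    #include <cstddef>
    #include <vector>

    struct Point3d { double x, y, z; };

    struct PointSource {
        std::size_t Count() const;
        Point3d     Get(std::size_t i) const;
    };

    std::vector<Point3d> LoadPoints(const PointSource& src)
    {
        std::vector<Point3d> pts;
        pts.reserve(src.Count());               // size up front: one allocation, no reallocs
        for (std::size_t i = 0; i < src.Count(); ++i)
            pts.push_back(src.Get(i));
        return pts;                             // NRVO keeps the return cheap
    }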

These are items that seem to work well in my context, but are there any obvious ones that I've left out, or are there any other good resources out there on optimisation?

Edit: Based on many of the comments provided, some further explanation is clearly required. While I fully realise that suggesting specific optimisations for a specific piece of code requires sight of that code, the past couple of days spent analysing profiler output have shown up certain patterns in terms of optimisation candidates. I'm also aware of my own ignorance in respect of what others are doing to good effect in the field, and see value (to me at least) in having an enumerated list of such techniques, regardless of whether they apply to my situation. This question is not about the dangers of optimization, but for any beginners out there I'd recommend first establishing a strong requirement for the need to optimize before considering it at all. My own preference is to carry out most optimization at the design stage, based on the performance requirements going forward, but I'm also a strong advocate of profiling to verify that design assumptions have been met in the implementation. I would ask people to please limit their answers to their own experience of positive optimizations, rather than to their beliefs as to whether we should consider optimization in the first instance.

FWIW, the difference between having the compiler optimise the code or not comes out at 12% across my automation suite, which is borderline observable at end user level.

Second edit: Some related posts I found very beneficial, particularly Mike Dunlavey's comments on becoming too dependent on profiler output.

Performance optimization strategies of last resort?

What can I use to profile C++ code in Linux?

asked Feb 03 '10 by SmacL


4 Answers

"please don't throw up any of the old 'optimisation is the root of all evil' stuff, it's totally irrelevant to this question"

Yes, and then you have things like:

"When sorting, check the data isn't already in sort order"

makes me wonder if you're using an efficient algorithm or not, which is the basic premise that "premature optimization is the root of all evil" states.

"and will be downvoted."

Really? That tone is no good. IMO. YMMV.

"Similarly, I'm not interested in the joys of letting the compiler optimise it for me, or throwing inlines all over the place. FWIW, the difference between having the compiler optimise the code or not comes out at 12% across my automation suite, which is borderline observable at end user level."

Coupled with any optimization you hand-instrument, you'd still want the compiler optimizations on.

Apart from that, since you provide no particular insight into where your bottlenecks are, it is difficult if not impossible to provide any pointers. I'd hazard a guess that you should, at the least:

  • Play around with custom allocators
  • Explore the possibility of using vector (SIMD) instructions if your target machine supports them

Edit: Since you say you were not aware of RVO: try reading up on move semantics and particularly this library: move library from Adobe. I guess Boost will have something similar.

Edit#2: Two more things:

  • Fixed-point arithmetic may be a good thing to look up
  • Use look-up tables as much as you can (see the sketch below)
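
For example, a minimal sketch of the look-up table idea using a precomputed sine table; the table resolution and the wrapping scheme are just placeholders you would tune for your own accuracy requirements:

    // Precomputed sine table: trades a little memory for avoiding repeated std::sin calls.
    // kTableSize is a placeholder resolution; pick it to suit your accuracy needs.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    const std::size_t kTableSize = 4096;
    const double      kTwoPi     = 6.283185307179586;

    const std::vector<double>& SineTable()
    {
        static std::vector<double> table;       // lazily built on first use (not thread-safe as written)
        if (table.empty()) {
            table.resize(kTableSize);
            for (std::size_t i = 0; i < kTableSize; ++i)
                table[i] = std::sin(kTwoPi * i / kTableSize);
        }
        return table;
    }

    inline double FastSin(double radians)
    {
        double turns = radians / kTwoPi;
        turns -= std::floor(turns);             // wrap into [0, 1)
        return SineTable()[static_cast<std::size_t>(turns * kTableSize) % kTableSize];
    }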
answered by dirkgently


CString MyFunc(double x, double y)

is less efficient than

void MyFunc(double x, double y, CString &Result)

If MyFunc is written cleanly they should be about the same. The compiler should be able to take advantage of NRVO. It sounds like you've profiled and found that it's not - I'm just saying it may better fit your criteria, e.g. "minimal obfuscation to the code", to rearrange the function itself a little to allow NRVO to occur.
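
For what it's worth, the shape that gives NRVO its best chance is a single named result object returned on every path. A sketch (whether MFC's CString actually benefits in practice is exactly the kind of thing your profiling would confirm):

    // Written so the compiler can apply NRVO: one named result object, returned on all paths.
    #include <afx.h>                             // MFC CString

    CString DescribePoint(double x, double y)
    {
        CString result;                          // the single named return object
        if (x == 0.0 && y == 0.0)
            result = _T("origin");
        else
            result.Format(_T("(%g, %g)"), x, y);
        return result;                           // the copy can be elided when NRVO applies
    }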

A few more things to try:

  1. Memoization (similar to your caching of repeated searches, but more focused on tree/graph resolution, if you have any).
  2. Using floats instead of doubles, if you don't need the extra precision (or maybe even ints if you can).
  3. Use types to mark assumptions (you mentioned sorted arrays - another common one is lowercased strings). Create a derived or wrapper type (e.g. Sorted<T>) that makes such assumptions explicit. That way, if you have a method that takes a Sorted<vector<T> >, for example, and you give it a sorted vector, it passes straight through - but if you give it a plain vector<T>, it will have to construct a Sorted<vector<T> >, at which point it will sort it. You can manually assert that it is sorted using an alternative constructor, and it becomes much easier to carry your assumptions around and to catch places you might otherwise have missed (see the sketch after this list).
  4. Don't give up on inlines etc. Make sure you're fully aware of when they should help and when they should hinder. In some cases they really can make a difference - but probably not if you just deal with them arbitrarily.
  5. You might benefit from flyweight, or pooled object allocators.
  6. When threading, try to minimise any interactions so you can reduce the amount of code that requires locking. Often taking copies of even fairly expensive objects can be less of an overhead than a mutex. Obviously, take advantage of atomic types where you can.
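
To make point 3 concrete, here is a minimal sketch of the Sorted<T> wrapper; the names and the tag constructor are purely illustrative:

    // A wrapper type that carries the "this container is sorted" assumption in the type.
    // The names (Sorted, assume_sorted_t) are purely illustrative.
    #include <algorithm>
    #include <vector>

    struct assume_sorted_t {};                  // tag: caller guarantees the data is already sorted

    template <typename Container>
    class Sorted {
    public:
        explicit Sorted(const Container& c)     // sorts on construction
            : data_(c) { std::sort(data_.begin(), data_.end()); }

        Sorted(const Container& c, assume_sorted_t)   // caller asserts it is already sorted
            : data_(c) {}

        const Container& get() const { return data_; }
    private:
        Container data_;
    };

    // Functions that rely on sort order take the wrapper, so the assumption can't be dropped.
    double Median(const Sorted<std::vector<double> >& s)
    {
        const std::vector<double>& v = s.get();
        return v[v.size() / 2];                 // assumes a non-empty vector
    }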
answered by philsquared


I think your requirements are pretty much mutually exclusive unless there's an obvious flaw of some kind (which is all profiling is really good for finding).

Things that really change performance require a lot of effort, and your basic data structures are the most important thing. Reducing memory fragmentation, aligned memory management, SIMD, data structures that are as small as possible and allocated in one block as much as possible, multithreading, reducing code size from templates, redeclaring parameters as local variables so the optimizer knows they are safe to optimize (a sketch of that last point follows). None of those can be tacked on at the end without a lot of cost.
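
As a small sketch of that last point: copying a value reached through a pointer parameter into a local means the compiler no longer has to assume it can be aliased by the writes inside the loop, so it can keep it in a register. Params and the function names are hypothetical:

    // Copying a value reached through a pointer parameter into a local: the compiler
    // can then keep it in a register instead of re-reading it every iteration.
    #include <cstddef>

    struct Params { float scale; };

    // Before: params->scale may be re-read on every iteration, since the compiler must
    // assume the store to out[i] could modify *params.
    void ScaleAll(float* out, const float* in, std::size_t n, const Params* params)
    {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = in[i] * params->scale;
    }

    // After: the local copy is provably unaliased, so it can live in a register.
    void ScaleAllFast(float* out, const float* in, std::size_t n, const Params* params)
    {
        const float scale = params->scale;
        for (std::size_t i = 0; i < n; ++i)
            out[i] = in[i] * scale;
    }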

Many times you can't even easily measure the things that really affect performance because they only become costly as the program runs or as its code size grows.

answered by Charles Eli Cheese


I fear large and complex data structures. I like large and simple data structures; in particular, arrays. When I have large and simple data structures I can try to do clever things with memory access to really optimise my use of the cache; in particular, memory-tiling (a sketch follows). Not sure if this is any use to you, but in general, given your set of requirements and existing understanding of your code, I'd be looking for ways to optimise getting data from RAM to the CPU.
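
As a rough sketch of what I mean by memory-tiling, here is a cache-blocked transpose of a row-major grid; the tile size is a tuning parameter you would pick for your cache:

    // Cache-blocked (tiled) transpose of an n x n row-major grid. Without tiling,
    // either the reads or the writes stride through memory and miss the cache on
    // almost every access; with tiling, both source and destination blocks stay hot.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    void TransposeTiled(const std::vector<float>& src, std::vector<float>& dst, std::size_t n)
    {
        const std::size_t B = 32;               // tile size: tune for your cache
        for (std::size_t bi = 0; bi < n; bi += B)
            for (std::size_t bj = 0; bj < n; bj += B)
                for (std::size_t i = bi; i < std::min(bi + B, n); ++i)
                    for (std::size_t j = bj; j < std::min(bj + B, n); ++j)
                        dst[j * n + i] = src[i * n + j];
    }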

And I would, of course, be parallelising the hell out of it all. Not possible unless you have a multicore computer. Oh, memo just in: we've all got those these days!

Good luck, and do let us know how you get on. I read a lot of crap on SO about what the best way to optimise this bit of code or that should be, but very little hard evidence that anyone ever measures anything, as you seem to have done.

Heck, I like your question so much I'm upvoting it.

Regards

Mark

answered by High Performance Mark