
Optimization Techniques for C++

In his recent talk at Facebook (slides, video), Andrei Alexandrescu discusses common intuitions that can prove wrong. One very interesting point for me came up on slide 7, where he states that the assumption "fewer instructions = faster code" is not true, and that more instructions do not necessarily mean slower code.

Here is my problem: the audio quality of the talk (around 6:20 min) is not that good, so I don't understand the explanation very well, but from what I can tell he is comparing retired instructions with the optimality of an algorithm at the performance level.

However, from my understanding this comparison cannot be made, because these are two independent structural levels. The instruction count (especially the count of actually retired instructions) is one very important measure and basically gives you an idea of how much work was done to achieve a goal. If we leave out the latency of individual instructions, we can generalize that fewer retired instructions = faster code. Of course, there are cases where an algorithm that performs complex calculations inside a loop will yield better performance even though the extra work happens inside the loop, because it allows the loop to terminate earlier (think graph traversal). But wouldn't it be more useful to compare two algorithms at the complexity level, rather than saying this loop has more instructions and is therefore better than the other? From my point of view, the better algorithm will have fewer retired instructions in the end.

Can someone please help me understand where he was going with his example, and how there can be a case where (significantly) more retired instructions lead to better performance?

asked Dec 20 '12 by grundprinzip



1 Answer

The audio quality is indeed bad, but I think the point he is leading to is that CPUs are good at calculations, yet suffer when it comes to fetching data from memory (RAM is much slower than the CPU) and to branches (because the CPU works as a pipeline, and mispredicted branches cause the pipeline to stall and be flushed).

Here are some cases where more instructions are faster:

  1. Branch prediction - even if we need to execute more instructions, if they lead to better branch prediction the CPU's pipeline stays full for more of the time and fewer ops get "thrown out" of it, which ultimately leads to better performance. This thread, for example, shows how doing the same thing, but sorting the data first, improves performance (see the sorting sketch after this list).

  2. CPU cache - if your code is cache optimized and follows the principle of locality, it is likely to be faster than code that doesn't, even if the non-optimized code executes half as many instructions. This thread gives an example of a small cache optimization - the same number of instructions can result in much slower code if it is not cache friendly (see the traversal sketch after this list).

  3. It also matters which instructions are executed. Some instructions are slower to perform than others; for example, integer division is much slower than integer addition (see the timing sketch after this list).
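
To make point 1 concrete, here is a minimal sketch (my own, not taken from the linked thread) of the classic sorted-vs-unsorted micro-benchmark: sorting the data is extra work and extra retired instructions, yet it usually makes the branch inside the loop predictable enough that the whole program runs faster. The array size and threshold are arbitrary choices; compile with optimizations (e.g. -O2) and toggle the std::sort line to compare.

```cpp
// Hypothetical micro-benchmark: sum only the elements >= 128.
// Sorting first is *more* work overall, but it makes the branch below
// almost perfectly predictable, which is typically faster in practice.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

int main() {
    std::vector<int> data(1 << 20);
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 255);
    for (int& x : data) x = dist(rng);

    // Toggle this line to compare predictable vs. unpredictable branches.
    std::sort(data.begin(), data.end());

    std::int64_t sum = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (int pass = 0; pass < 100; ++pass)
        for (int x : data)
            if (x >= 128)   // predictable after sorting, ~random before
                sum += x;
    auto t1 = std::chrono::steady_clock::now();

    std::cout << "sum = " << sum << ", took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << " ms\n";
}
```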
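For point 2, a minimal sketch of a locality effect: both loops below retire essentially the same instructions, but the row-major traversal walks memory sequentially while the column-major traversal jumps across cache lines, which on typical hardware makes it considerably slower. The matrix size n = 4096 is an arbitrary choice for illustration.

```cpp
// Hypothetical sketch: same instruction count, very different cache behavior.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    const std::size_t n = 4096;
    std::vector<std::int32_t> m(n * n, 1);   // n x n matrix, row-major storage

    auto sum_rows = [&] {          // touches memory sequentially
        std::int64_t s = 0;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                s += m[i * n + j];
        return s;
    };
    auto sum_cols = [&] {          // jumps n elements between accesses
        std::int64_t s = 0;
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < n; ++i)
                s += m[i * n + j];
        return s;
    };

    auto time_ms = [](auto f) {
        auto t0 = std::chrono::steady_clock::now();
        volatile std::int64_t s = f();   // keep the result "used"
        (void)s;
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    };

    std::cout << "row-major:    " << time_ms(sum_rows) << " ms\n";
    std::cout << "column-major: " << time_ms(sum_cols) << " ms\n";
}
```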
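And for point 3, a rough sketch of per-instruction cost: both loops perform one arithmetic operation per iteration, but the division loop is usually much slower because integer division has a far higher latency than addition on most CPUs. The iteration count is arbitrary, and exact timings depend heavily on your machine and compiler (an optimizing compiler may even collapse the addition loop, so the divide loop's time is the more meaningful number).

```cpp
// Hypothetical sketch: same number of loop iterations, different instruction cost.
#include <chrono>
#include <cstdint>
#include <iostream>

template <class F>
static long long time_ms(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
}

int main() {
    const std::int64_t n = 200'000'000;
    volatile std::int64_t sink = 0;   // discourage the compiler from removing the loops

    long long add_ms = time_ms([&] {
        std::int64_t acc = 1;
        // One addition per iteration; may be reduced to a closed form at -O2.
        for (std::int64_t i = 1; i <= n; ++i) acc += i;
        sink = acc;
    });
    long long div_ms = time_ms([&] {
        std::int64_t acc = 1;
        // One integer division per iteration; cannot be removed as easily.
        for (std::int64_t i = 1; i <= n; ++i) acc += n / i;
        sink = acc;
    });

    std::cout << "add loop:    " << add_ms << " ms\n";
    std::cout << "divide loop: " << div_ms << " ms\n";
}
```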

Note: all of the above are machine dependent, and how/if they actually change the performance may vary from one architecture to another.

answered Sep 23 '22 by amit