How to write instruction-cache-friendly programs in C++?

Tags: c++, caching, c++11, c++14, cachegrind

Recently Herb Sutter gave a great talk, "Modern C++: What You Need to Know". Its main theme was efficiency, in particular how much data locality and memory access patterns matter. He also explained why the CPU favors linear memory access (arrays and vectors), using an example from another classic reference on this topic, "Game performance by Bob Nystrom".

After reading these articles, I understood that there are two types of cache that affect program performance:

Data Cache
Instruction Cache

The Cachegrind tool also reports statistics for both types of cache when it instruments a program. The first one (the data cache, and how to achieve good data locality) has been explained in many articles and blog posts.

However, I could not find much information on the instruction cache, or on what we should take care of in our programs to get better performance from it. As I understand it, we (programmers) do not have much control over which instructions are executed or in what order.

It would be really nice if small C++ programs could show how this counter (i.e. the instruction cache) varies with our style of writing programs. What best practices should programmers follow to achieve better performance in this respect?

I mean, we can understand the data cache topic by looking at what a program does (vector vs. list); is it possible to explain the second point in a similar way? The main intention of this question is to understand this topic as much as possible.

267

asked Apr 07 '14 19:04

Mantosh Kumar

1 Answer

Any code that changes the flow of execution affects the Instruction Cache. This includes function calls and loops as well as dereferencing function pointers.

When a branch or jump instruction is executed, the processor has to continue fetching from the destination of the branch. If that code is already in the instruction cache, execution carries on; if not, the processor has to reload the instruction cache from the destination of the branch.

For example, some processors have an instruction cache large enough to hold the executable code of small loops. Others don't have a large instruction cache and simply reload it. Reloading the instruction cache takes time that could be spent executing instructions.

Search these topics:

Loop unrolling
Conditional instruction execution (available on ARM processors)
Inline functions
Instruction pipeline

Edit 1: Programming techniques for better performance
To improve performance and reduce instruction cache reloading, do the following:

Reduce "if" statements
Design your code to minimize "if" statements. This may involve Boolean algebra, more arithmetic, or simplifying comparisons (are they really needed?). Keep the contents of the "then" and "else" clauses small so that the compiler can use conditional assembly language instructions instead of branches.
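As a rough illustration of replacing an "if" with arithmetic, the sketch below counts the elements above a threshold in two ways. The function names are made up for this example, and an optimizing compiler may well generate the same machine code for both versions, so treat it as a demonstration of the idea and check the generated assembly before relying on it.

```cpp
#include <cstddef>
#include <vector>

// Branchy version: the loop body contains a conditional jump per element.
std::size_t count_above_branchy(const std::vector<int>& values, int threshold) {
    std::size_t count = 0;
    for (int v : values) {
        if (v > threshold) {
            ++count;
        }
    }
    return count;
}

// Arithmetic version: the comparison yields false or true, which converts
// to 0 or 1 and is added directly, so the loop body has no "if" of its own.
std::size_t count_above_arithmetic(const std::vector<int>& values, int threshold) {
    std::size_t count = 0;
    for (int v : values) {
        count += static_cast<std::size_t>(v > threshold);
    }
    return count;
}
```

Both functions return the same result for the same input; only the shape of the loop body differs.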

Define small functions as inline or macros
There is an overhead associated with calling functions, such as storing the return location and reloading the instruction cache. For functions with a small number of statements, try suggesting to the compiler that they be made inline. Inlining means pasting the body of the function at the point of the call, rather than making a function call. Since the function call is avoided, so is the need to reload the instruction cache. This applies to small functions: inlining a large function at many call sites increases code size, which itself puts pressure on the instruction cache.
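A minimal sketch of the inline suggestion follows. The names `squared` and `sum_of_squares` are hypothetical, and `inline` is only a hint: the compiler decides whether the call is actually expanded in place.

```cpp
#include <vector>

// Small helper marked inline. If the compiler follows the hint, the
// multiplication is pasted into the caller's loop and no call is made.
inline int squared(int x) {
    return x * x;
}

// The loop body calls the helper once per element. With inlining, the loop
// stays a single straight run of instructions instead of jumping to a
// separate function and back on every iteration.
int sum_of_squares(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += squared(v);
    }
    return total;
}
```

For example, calling `sum_of_squares` on a vector holding 1, 2 and 3 returns 14.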

Unroll loops
For loops with a small number of iterations, don't loop; repeat the content of the loop instead (some compilers may do this at higher optimization level settings). The more content is repeated, the fewer branches there are back to the top of the loop, and the less need there is to reload the instruction cache.
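The following sketch shows a manual unroll by a factor of four next to the plain loop. The factor and the function names are assumptions chosen for illustration; compilers often unroll on their own at higher optimization levels, so measure before unrolling by hand.

```cpp
#include <cstddef>
#include <vector>

// Plain loop: one branch back to the top of the loop per element.
long long sum_simple(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Unrolled by four: one branch back to the top per four elements.
long long sum_unrolled_by_4(const std::vector<int>& values) {
    long long total = 0;
    const std::size_t n = values.size();
    std::size_t i = 0;
    // Main body: handles four elements per iteration.
    for (; i + 4 <= n; i += 4) {
        total += values[i];
        total += values[i + 1];
        total += values[i + 2];
        total += values[i + 3];
    }
    // Remainder: handles the last 0 to 3 elements.
    for (; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

The remainder loop is what keeps the unrolled version correct when the element count is not a multiple of four.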

Use table lookups, not "if" statements
Some programs use "if-else-if" ladders to map data to values. Each "if" statement is a potential break in the flow of execution through the instruction cache. Sometimes, with a little math, the values can be placed in a table such as an array and the index calculated mathematically. Once the index is known, the processor can retrieve the data without disrupting the instruction cache.
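As a sketch of this idea, the example below maps a score to a letter grade, first with a ladder and then with a table indexed by `score / 10`. The grading scheme and function names are invented for the example, and both functions assume the score is already in the range 0 to 100.

```cpp
// Ladder version: up to four comparisons and branches per call.
// Precondition (assumed): 0 <= score <= 100.
char grade_ladder(int score) {
    if (score >= 90) {
        return 'A';
    } else if (score >= 80) {
        return 'B';
    } else if (score >= 70) {
        return 'C';
    } else if (score >= 60) {
        return 'D';
    }
    return 'F';
}

// Table version: the index is computed with one division, then the result
// is read from an array. Index 0-5 -> 'F', 6 -> 'D', 7 -> 'C', 8 -> 'B',
// 9 and 10 -> 'A'.
// Precondition (assumed): 0 <= score <= 100.
char grade_table(int score) {
    static const char grades[] = "FFFFFFDCBAA";
    return grades[score / 10];
}
```

The table version trades the branches for a data access, so the table itself should stay small enough to remain in the data cache.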

Change data or data structures
If the type of data is constant, a program can be optimized around the data. For example, a program handling message packets could select its operations from the packet IDs (think of an array of function pointers). The functions would then be optimized for packet processing.
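One possible shape for the array of function pointers is sketched below. The `Packet` layout, the handler names and the three-entry table are all hypothetical. Note that the indirect call is still a change in the flow of execution; what the table removes is the chain of comparisons needed to pick the handler.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical packet layout, used only for this illustration.
struct Packet {
    std::uint8_t id;
    std::int32_t payload;
};

using Handler = std::int32_t (*)(const Packet&);

// Hypothetical handlers, one per known packet ID.
std::int32_t handle_ping(const Packet& p) { return p.payload; }
std::int32_t handle_data(const Packet& p) { return p.payload * 2; }
// Fallback for IDs that have no dedicated handler.
std::int32_t handle_unknown(const Packet&) { return -1; }

// Handlers indexed by packet ID; the last slot is the fallback.
const std::array<Handler, 3> kHandlers = {handle_ping, handle_data, handle_unknown};

// Picks the handler by index instead of walking an if-else-if ladder.
std::int32_t dispatch(const Packet& p) {
    const std::size_t last = kHandlers.size() - 1;
    const std::size_t index = p.id < last ? p.id : last;
    return kHandlers[index](p);
}
```

With these definitions, a packet with ID 0 goes to `handle_ping`, ID 1 goes to `handle_data`, and any other ID falls through to `handle_unknown`.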

Change linked lists to arrays or other random-access containers. Elements of an array can be accessed using arithmetic on the index, without interrupting execution. Linked lists must be traversed (with a loop) to find an item.
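The difference can be seen in how the n-th element is reached in each container. This is a minimal sketch with made-up function names; both functions assume `n` is a valid position.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Linked list: reaching element n means following n links, one loop
// iteration (and one branch) per node.
// Precondition (assumed): n < items.size().
int nth_in_list(const std::list<int>& items, std::size_t n) {
    auto it = std::next(items.begin(), static_cast<std::ptrdiff_t>(n));
    return *it;
}

// Vector: the address of element n is computed from the index, so there
// is no traversal loop.
// Precondition (assumed): n < items.size().
int nth_in_vector(const std::vector<int>& items, std::size_t n) {
    return items[n];
}
```

For the same contents both functions return the same value; the list version simply executes a loop to get there.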

113

answered Oct 23 '22 11:10

Thomas Matthews
