Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write instruction cache friendly program in c++?

Recently Herb Sutter gave a great talk on "Modern C++: What You Need to Know". The main theme of this talk was efficiency and how data locality and accessing the memory matters. He has also explained how linear access of memory(array/vector) would be loved by CPU. He has taken one example from another classical reference "Game performance by Bob Nystrom" on this topic.

After reading these articles, I got that there is two type of cache which impact the program performance:

  1. Data Cache
  2. Instruction Cache

Cachegrind tool also measures both cache type instrumentation information of our program. The first points has been explained by many article/blog and how to achieve the good data cache efficiency(data locality).

However I did not get much information on topic Instruction Cache and what sort of thing we should take care in our program to achieve the better performance?. As per my understanding, we(programmer) do not have much control on which instruction or what order would be executing.

It would be really nice if small c++ programs explains how this counter(.i.e instruction cache) would vary with our style of writing program. What are the best practice programmer should follow to achieve better performance with respect to this point?

I mean we can understand about data cache topics if our program does(vector vs list) in similar way does it possible to explain about 2nd point. The main intention of this question is to understand this topic as much as possible.

like image 267
Mantosh Kumar Avatar asked Apr 07 '14 19:04

Mantosh Kumar


People also ask

What is a cache friendly code?

Cache friendly code tries to keep accesses close together in memory so that you minimize cache misses. So an example would be imagine you wanted to copy a giant 2 dimensional table. It is organized with reach row in consecutive in memory, and one row follow the next right after.

Does cache have instruction set?

The instruction cache contains predecode bits to demarcate individual instructions to improve decode speed. One of the challenges with the IA-32 and Intel 64 ISA is that instructions are variable in length. In other words, the size of the instruction is not known until the instruction has been partially decoded.

How do you write cache memory?

If write occurs to a location that is not present in the Cache(Write Miss), we use two options, Write Allocation and Write Around. Write Allocation: In Write Allocation data is loaded from the memory into cache and then updated. Write allocation works with both Write back and Write through.

What is instruction and data cache?

The instruction and data caches have a subtle difference: instructions are only fetched (read) from memory, but data can be read from or written to memory. For the instruction cache, blocks are copied from main memory to the cache.


1 Answers

Any code that changes the flow of execution affects the Instruction Cache. This includes function calls and loops as well as dereferencing function pointers.

When a branch or jump instruction is executed, the processor has to spend extra time deciding if the code is already in the instruction cache or whether it needs to reload the instruction cache (from the destination of the branch).

For example, some processors may have a large enough instruction cache to hold the execution code for small loops. Some processors don't have a large instruction cache and simple reload it. Reloading of the instruction cache takes time that could be spent executing instructions.

Search these topics:

  • Loop unrolling
  • Conditional instruction execution (available on ARM processors)
  • Inline functions
  • Instruction pipeline

Edit 1: Programming techniques for better performance
To improve performance and reduce the instruction cache reloading do the following:

Reduce "if" statements Design your code to minimize "if" statements. This may include Boolean Algebra, using more math or simplifying comparisons (are they really needed?). Prefer to reduce the content of "then" and "else" clauses so that the compiler can use conditional assembly language instructions.

Define small functions as inline or macros
There is an overhead associated with calling functions, such as storing the return location and reloading the instruction cache. For functions with a small amount of statements, try suggesting to the compiler that they be made inline. Inlining means to paste the contents of the code where the execution is, rather than making a function call. Since the function call is avoided, so is the need to reload the instruction cache.

Unroll loops
For small iterations, don't loop, but repeat the content of the loop (some compilers may do this at higher optimization level settings). The more content repeated, the less number of branches to the top of the loop and less need to reload the instruction cache.

Use table lookups, not "if" statements
Some programs use "if-else-if" ladders for mapping data to values. Each "if" statement is a break in the execution in the instruction cache. Sometimes, with a little math, the values can be placed in a table like an array and the index calculated mathematically. Once the index is known, the processor can retrieve the data without disrupting the instruction cache.

Change data or data structures
If the type of data is constant, a program can be optimized around the data. For example, a program handling message packets could base its operations based on the packet IDs (think array of function pointers). Functions would be optimized for packet processing.

Change linked lists to arrays or other random-access container. Elements of an array can be accessed using math and not interrupt execution. Linked lists must be traversed (loop) to find an item.

like image 113
Thomas Matthews Avatar answered Oct 23 '22 11:10

Thomas Matthews