 

How can I cause an instruction cache miss?

I've been tasked with generating a certain number of data-cache misses and instruction-cache misses. I've been able to handle the data-cache portion without issue.

So I'm left with generating the instruction-cache misses. I do not have any idea what causes these. Can someone suggest a method of generating them?

I'm using GCC on Linux.

William the Pleaser asked May 14 '12



2 Answers

As people have explained, an instruction-cache miss is conceptually the same as a data-cache miss: the instructions are not in the cache. This happens because the processor's program counter (PC) has jumped to an address that hasn't been loaded into the cache, or whose cache line was flushed out because the cache filled up and that line was the one chosen for eviction (usually the least recently used).

It is a bit harder to generate enough code by hand to force an instruction miss than it is to force a data cache miss.

One way to get lots of code, for little effort, is to write a program which generates source code.

For example, write a program to generate a function with a huge switch statement (in C):

#include <stdio.h>

int main(void) {
    printf("int bigswitch(int n) {\n    switch (n) {\n");
    for (int i = 1; i < 100000; ++i) {
        /* cases deliberately have no break, so execution falls through */
        printf("        case %d: n += %d;\n", i, i / 2);
    }
    printf("    }\n    return n;\n}\n");
    return 0;
}

Then you can call the generated function from another function, and by choosing the argument you control where in the function (and thus which cache lines) execution lands.

A property of a switch statement is that the code can be forced to execute backwards, or in patterns, by choosing the sequence of parameters. So you can work with the prefetching and branch-prediction mechanisms, or try to work against them.

The same technique could be applied to generate lots of functions too, to ensure the cache can be 'busted' at will. So you may have bigswitch001, bigswitch002, etc. You might call this using a switch which you also generate.

If you can make each function (approximately) some number of i-cache lines in size, and also generate more functions than will fit in cache, then the problem of generating instruction cache-misses becomes easier to control.

You can see exactly how big a function, an entire switch statement, or each leg of a switch statement is by dumping the assembler (using gcc -S), or by running objdump on the .o file. So you can 'tune' the size of a function by adjusting the number of case: statements. You can also choose how many cache lines are hit, by judicious choice of the parameter to bigswitchNNN().

gbulmer answered Nov 09 '22


In addition to all the other ways mentioned here, another very reliable way to force an instruction cache miss is to have self-modifying code.

If you write to a page of code in memory (assuming you configured the OS to permit this), then of course the corresponding line of instruction cache immediately becomes invalid, and the processor is forced to refetch it.

It is not branch prediction that causes an icache miss, by the way, but simply branching. You miss instruction cache whenever the processor tries to run an instruction that has not recently been run. Modern x86 is smart enough to prefetch instructions in sequence, so you are very unlikely to miss icache by just ordinary walking forward from one instruction to the next. But any branch (conditional or otherwise) jumps to a new address out of sequence. If the new instruction address hasn't been run recently, and isn't near the code you were already running, it is likely to be out of cache, and the processor must stop and wait for the instructions to come in from main RAM. This is exactly like data cache.

Some very modern processors (recent i7) are able to look at upcoming branches in code and start the icache prefetching the possible targets, but many cannot (video game consoles). Fetching data from main RAM to icache is totally different from the "instruction fetching" stage of the pipeline, which is what branch prediction is about.

"Instruction fetch" is part of the CPU's execution pipeline, and refers to bringing an opcode from icache into the CPU's execution unit, where it can start decoding and doing work. That is different from "instruction cache" fetching, which must happen many cycles earlier and involves the cache circuitry making a request to the main memory unit to send some bytes across the bus. The first is an interaction between two stages of the CPU pipeline. The second is an interaction between the pipeline and the memory cache and main RAM, which is a much more complicated piece of circuitry. The names are confusingly similar, but they're totally separate operations.

So one other way to cause instruction cache misses would be to write (or generate) lots of really big functions, so that your code segment is huge. Then call wildly from one function to another, so that from the CPU's point of view you are doing crazy GOTOs all over memory.

Crashworks answered Nov 09 '22