 

How can I cause an instruction cache miss?

I've been tasked with generating a certain number of data-cache misses and instruction-cache misses. I've been able to handle the data-cache portion without issue.

So I'm left with generating the instruction-cache misses. I do not have any idea what causes these. Can someone suggest a method of generating them?

I'm using GCC on Linux.

William the Pleaser asked May 14 '12



2 Answers

As people have explained, an instruction-cache miss is conceptually the same as a data-cache miss: the instructions are not in the cache. This happens because the processor's program counter (PC) has jumped to an address that hasn't been loaded into the cache, or whose cache line was flushed out because the cache filled up and that line was the one chosen for eviction (usually the least recently used).

It is a bit harder to generate enough code by hand to force an instruction miss than it is to force a data cache miss.

One way to get lots of code, for little effort, is to write a program which generates source code.

For example, write a program to generate a function with a huge switch statement (in C):

#include <stdio.h>

int main(void) {
    printf("int bigswitch(int n) {\n    switch (n) {\n");
    for (int i = 1; i < 100000; ++i) {
        /* cases deliberately have no break, so execution falls through */
        printf("        case %d: n += %d;\n", i, i / 2);
    }
    printf("    }\n    return n;\n}\n");
    return 0;
}

Then you can call the generated function from another function, and by choosing the argument you control where in the function (and thus which cache lines) execution lands.

A property of a switch statement is that the code can be forced to execute backwards, or in patterns, by choosing the sequence of parameters. So you can work with the prefetching and branch-prediction mechanisms, or try to work against them.

The same technique could be applied to generate lots of functions too, to ensure the cache can be 'busted' at will. So you may have bigswitch001, bigswitch002, etc. You might call this using a switch which you also generate.

If you can make each function (approximately) some number of i-cache lines in size, and also generate more functions than will fit in cache, then the problem of generating instruction cache-misses becomes easier to control.

You can see exactly how big a function, an entire switch statement, or each leg of a switch statement is by dumping the assembler (using gcc -S), or by running objdump on the .o file. So you can 'tune' the size of a function by adjusting the number of case: statements. You can also choose how many cache lines are hit, by judicious choice of the parameter to bigswitchNNN().

gbulmer answered Nov 09 '22


In addition to all the other ways mentioned here, another very reliable way to force an instruction cache miss is to have self-modifying code.

If you write to a page of code in memory (assuming you configured the OS to permit this), then of course the corresponding line of instruction cache immediately becomes invalid, and the processor is forced to refetch it.

It is not branch prediction that causes an icache miss, by the way, but simply branching. You miss instruction cache whenever the processor tries to run an instruction that has not recently been run. Modern x86 is smart enough to prefetch instructions in sequence, so you are very unlikely to miss icache by just ordinary walking forward from one instruction to the next. But any branch (conditional or otherwise) jumps to a new address out of sequence. If the new instruction address hasn't been run recently, and isn't near the code you were already running, it is likely to be out of cache, and the processor must stop and wait for the instructions to come in from main RAM. This is exactly like data cache.

Some very modern processors (recent i7) are able to look at upcoming branches in code and start the icache prefetching the possible targets, but many cannot (video game consoles). Fetching data from main RAM to icache is totally different from the "instruction fetching" stage of the pipeline, which is what branch prediction is about.

"Instruction fetch" is part of the CPU's execution pipeline, and refers to bringing an opcode from icache into the CPU's execution unit, where it can start decoding and doing work. That is different from "instruction cache" fetching, which must happen many cycles earlier and involves the cache circuitry making a request to the main memory unit to send some bytes across the bus. The first is an interaction between two stages of the CPU pipeline. The second is an interaction between the pipeline and the memory cache and main RAM, which is a much more complicated piece of circuitry. The names are confusingly similar, but they're totally separate operations.

So one other way to cause instruction cache misses would be to write (or generate) lots of really big functions, so that your code segment is huge. Then call wildly from one function to another, so that from the CPU's point of view you are doing crazy GOTOs all over memory.

Crashworks answered Nov 09 '22