
How is x86 instruction cache synchronized?


I like examples, so I wrote a bit of self-modifying code in C...

#include <stdio.h>
#include <sys/mman.h> // linux

int main(void) {
    unsigned char *c = mmap(NULL, 7, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|
                            MAP_ANONYMOUS, -1, 0); // get executable memory
    c[0] = 0b11000111; // mov (x86_64), immediate mode, full-sized (32 bits)
    c[1] = 0b11000000; // to register rax (000) which holds the return value
                       // according to linux x86_64 calling convention
    c[6] = 0b11000011; // return
    for (c[2] = 0; c[2] < 30; c[2]++) { // incr immediate data after every run
        // rest of immediate data (c[3:6]) are already set to 0 by MAP_ANONYMOUS
        printf("%d ", ((int (*)(void)) c)()); // cast c to func ptr, call ptr
    }
    putchar('\n');
    return 0;
}

...which works, apparently:

>>> gcc -Wall -Wextra -std=c11 -D_GNU_SOURCE -o test test.c; ./test
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

But honestly, I didn't expect it to work at all. I expected the instruction containing c[2] to be cached upon the first call to c, after which all consecutive calls to c would ignore the repeated changes made to c (unless I somehow explicitly invalidated the cache). Luckily, my CPU appears to be smarter than that.

I guess the CPU compares RAM (assuming c even resides in RAM) with the instruction cache whenever the instruction pointer makes a large-ish jump (as with the call to the mmapped memory above), and invalidates the cache when it doesn't match (all of it?), but I'm hoping to get more precise information on that. In particular, I'd like to know whether this behavior can be considered predictable (barring any differences in hardware and OS) and relied on.

(I probably should refer to the Intel manual, but that thing is thousands of pages long and I tend to get lost in it...)

Will asked Jun 12 '12



1 Answer

What you are doing is usually referred to as self-modifying code. Intel's platforms (and probably AMD's too) do the job of maintaining i-cache/d-cache coherency for you, as the manual points out (Manual 3A, System Programming):

11.6 SELF-MODIFYING CODE

A write to a memory location in a code segment that is currently cached in the processor causes the associated cache line (or lines) to be invalidated.

But this assertion is only valid as long as the same linear address is used for modifying and fetching, which is not the case for debuggers and binary loaders, since they don't run in the same address space:

Applications that include self-modifying code use the same linear address for modifying and fetching the instruction. Systems software, such as a debugger, that might possibly modify an instruction using a different linear address than that used to fetch the instruction, will execute a serializing operation, such as a CPUID instruction, before the modified instruction is executed, which will automatically resynchronize the instruction cache and prefetch queue.
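To make that concrete, here is a minimal sketch (my own illustration, not taken from the manual) of the serializing step such system software would perform after patching code through a different mapping, using the __get_cpuid helper from GCC/Clang's <cpuid.h>; the function name is just for illustration:

#include <cpuid.h> // GCC/Clang wrapper around the CPUID instruction

// Hypothetical helper: run by the patching process (e.g. a debugger) after
// writing the new instruction bytes and before the patched code executes.
// CPUID is a serializing instruction, so it drains the prefetch queue and
// resynchronizes the instruction cache as described in the quote above.
static inline void serialize_after_patch(void) {
    unsigned int eax, ebx, ecx, edx;
    __get_cpuid(0, &eax, &ebx, &ecx, &edx); // leaf 0; the results are discarded
}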

By contrast, serializing operations are always required on many other architectures, such as PowerPC, where the synchronization must be done explicitly (E500 Core Manual):

3.3.1.2.1 Self-Modifying Code

When a processor modifies any memory location that can contain an instruction, software must ensure that the instruction cache is made consistent with data memory and that the modifications are made visible to the instruction fetching mechanism. This must be done even if the cache is disabled or if the page is marked caching-inhibited.

It is interesting to note that PowerPC requires a context-synchronizing instruction to be issued even when caches are disabled; I suspect this forces a flush of deeper data processing units, such as the load/store buffers.
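For illustration, the usual PowerPC sequence looks roughly like this (my sketch in GCC inline assembly, not taken from the E500 manual; a real implementation loops over every cache block in the modified range):

// Make one modified cache block visible to instruction fetch on PowerPC.
static inline void ppc_sync_icache_block(void *addr) {
    __asm__ volatile (
        "dcbst 0,%0\n\t"  // write the modified data cache block back to memory
        "sync\n\t"        // wait until the store has been performed
        "icbi 0,%0\n\t"   // invalidate the matching instruction cache block
        "sync\n\t"        // make sure the icbi has completed
        "isync"           // discard any instructions already prefetched
        : : "r"(addr) : "memory");
}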

The code you proposed is unreliable on architectures without snooping or advanced cache-coherency facilities, and therefore likely to fail.
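If you want your example to be portable to such architectures, one option (my suggestion, assuming GCC or Clang) is to call the compiler builtin __builtin___clear_cache on the modified range before executing it; on x86 it expands to nothing, while on ARM or PowerPC it emits the required cache-maintenance and synchronization instructions. In your loop this would look roughly like:

for (c[2] = 0; c[2] < 30; c[2]++) {
    // make the freshly written bytes visible to instruction fetch
    __builtin___clear_cache((char *)c, (char *)c + 7);
    printf("%d ", ((int (*)(void)) c)());
}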

Hope this helps.

Benoit answered Nov 03 '22