
Reproducing Unexpected Behavior w/Cross-Modifying Code on x86-64 CPUs

Question

What are some ideas for cross-modifying code that could trigger unexpected behavior on x86 or x86-64 systems, where the cross-modifying code does everything correctly except execute a serializing instruction on the executing processor before running the modified code?

As noted below, I have a Core 2 Duo E6600 processor to test on, which is explicitly mentioned as a processor that is prone to issues regarding this. I will test any ideas shared with me on this machine and give updates.

Background

On x86 and x64 systems, the official guidance for writing cross-modifying code is to do the following:

; Action of Modifying Processor
Store modified code (as data) into code segment;
Memory_Flag ← 1; 

; Action of Executing Processor
WHILE (Memory_Flag ≠ 1)
  Wait for code to update;
ELIHW;
Execute serializing instruction; (* For example, CPUID instruction *)
Begin executing modified code;
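For concreteness, here is a minimal C sketch of that handshake, not Intel's reference code. It assumes x86 or x86-64 Linux with GCC or Clang and a kernel that permits writable and executable anonymous mappings. The names xmc_alloc, xmc_publish, xmc_consume, and memory_flag are hypothetical; the patched code is a tiny "mov eax, imm32; ret" stub whose low immediate byte gets rewritten.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* All names here are hypothetical; x86/x86-64 Linux with GCC or Clang assumed. */
typedef int (*code_fn)(void);

static atomic_int memory_flag;   /* plays the role of Memory_Flag */

/* CPUID is a serializing instruction; volatile keeps the compiler from dropping it. */
static inline void serialize(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
}

/* Allocate one RWX page containing: mov eax, 1 ; ret */
static unsigned char *xmc_alloc(void)
{
    static const unsigned char stub[] = { 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3 };
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, stub, sizeof stub);
    return p;
}

/* Modifying processor: store the new code as data, then set the flag. */
static void xmc_publish(unsigned char *code, unsigned char new_imm)
{
    code[1] = new_imm;   /* low byte of the 32-bit immediate */
    atomic_store_explicit(&memory_flag, 1, memory_order_release);
}

/* Executing processor: wait for the flag, serialize, then run the new code. */
static int xmc_consume(unsigned char *code)
{
    assert(code != NULL);
    while (atomic_load_explicit(&memory_flag, memory_order_acquire) != 1)
        ;   /* wait for code to update */
    serialize();
    return ((code_fn)(uintptr_t)code)();
}
```

In a real test, xmc_publish and xmc_consume would run on threads pinned to different cores. Deleting the serialize() call gives the unsynchronized variant that the question is about.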

The serializing instruction is explicitly mentioned as necessary in the errata for some processors. For example, the Intel Core 2 Duo E6000 series has the following erratum (from http://www.mathemainzel.info/files/intelX6800andintelE6000.pdf):

The act of one processor, or system bus master, writing data into a currently executing code segment of a second processor with the intent of having the second processor execute that data as code is called cross-modifying code (XMC). XMC that does not force the second processor to execute a synchronizing instruction, prior to execution of the new code, is called unsynchronized XMC.

Software using unsynchronized XMC to modify the instruction byte stream of a processor can see unexpected or unpredictable execution behavior from the processor that is executing the modified code.

There is some speculation at http://linux.kernel.narkive.com/FDc9TB0d/patch-linux-kernel-markers as to why unexpected execution behavior could occur if a serializing instruction is not used:

When the i-fetch has been done and the micro-ops are in the trace cache then there's no longer a direct correlation between the original machine instruction boundaries and the micro ops. This is due to optimization. For example (artificial one for illustrative purposes):

mov eax,ebx

mov memory,eax

mov eax,1

(using intel notation not ATT - force of habit)

In the trace cache there would be no micro ops to update eax with ebx.

Altering the "mov eax,ebx" to "mov ecx,ebx" on the fly invalidates the optimized trace cache, hence the only recourse is a GPF. If the modification doesn't invalidate the trace cache then no GPF. The question is: "can we predict the circumstances when the trace cache has not been invalidated", and the answer in general is no since the microarchitecture is not public. But one can guess that modifying the single byte opcode with an interrupting instruction - int3 - doesn't cause an inconsistency that can't be handled. And that's what Intel confirmed. Go ahead and store int3 without the need to synchronise (i.e. force the trace cache to be flushed).

There's also a bit more information at https://sourceware.org/ml/systemtap/2005-q3/msg00208.html:

When we became aware of this I had a long discussion with Intel's microarchitecture guys. It turns out that the reason for this erratum (which incidentally Intel does not intend to fix) is because the trace cache - the stream of micro-ops resulting from instruction interpretation - cannot be guaranteed to be valid. Reading between the lines I assume this issue arises because of optimization done in the trace cache, where it is no longer possible to identify the original instruction boundaries. If the CPU discovers that the trace cache has been invalidated because of unsynchronized cross-modification then instruction execution will be aborted with a GPF. Further discussion with Intel revealed that replacing the first opcode byte with an int3 would not be subject to this erratum.

Beyond what I've posted here, there's not too much I've seen on the internet regarding this issue. Additionally, I haven't found any public examples of people getting bitten by failing to execute the serializing instruction when using cross-modifying code on x86 and x86-64 systems.

I have a computer running an Intel Core 2 Duo E6600 Processor, which is explicitly documented as being prone to this problem, and I have not been able to write code that triggers this issue.

Writing code to do this is a personal curiosity for me. In production code, I'd just follow the rules, but I figure there's probably something for me to learn in reproducing this.

Jason asked Jan 26 '15 04:01



1 Answer

Think of a processor that has a very long instruction pipeline where registers and memory are only modified in the last pipeline stage. When you write self-modifying code for this processor and modify an instruction in memory that is already present in the pipeline, the modification will have no effect. In this case the behaviour of the program depends on how long the pipeline of the processor is.

To make new processors with longer pipelines behave exactly as older models, Intel processors include a mechanism that flushes (empties) the pipeline if this case is detected. After the flush, the modified code is fetched into the pipeline, so the new processor behaves exactly as old ones.

A serializing instruction is another way to flush the pipeline. When it reaches the end of the pipeline, the pipeline is flushed and starts fetching again after the serializing instruction.

So what the erratum is essentially saying is that some processor models do not check whether writes from other processors overwrite instructions that are already executing in their pipeline. The check works only for local writes, not for external writes. But if you insert a serializing instruction, you force the processor to flush the pipeline and everything will behave as expected.

To reproduce the behaviour described in the erratum, you need to make sure that the code you are modifying from one processor is inside the pipeline of the other processor. Take a look at branch prediction (which decides which code path is inside the pipeline) and synchronization primitives.
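One way to act on this is a stress harness in which one thread calls the patched code in a tight loop, so it is likely to be in that core's pipeline, while a second thread keeps rewriting it with no serializing instruction anywhere. The sketch below is an untested idea, not a known reproducer, and nothing here guarantees it triggers the erratum. It assumes x86 or x86-64 Linux, GCC or Clang, pthreads, and permission to map RWX memory. The names xmc_stress and stress_modifier are hypothetical, and pinning the two threads to different cores is left out.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical harness; a sketch under the assumptions stated above. */
typedef int (*code_fn)(void);

static unsigned char *stress_code;   /* RWX page: mov eax, imm32 ; ret */
static atomic_int stress_stop;

/* Modifying thread: flip the low immediate byte between 1 and 2 forever. */
static void *stress_modifier(void *arg)
{
    volatile unsigned char *imm = stress_code + 1;
    (void)arg;
    while (!atomic_load_explicit(&stress_stop, memory_order_relaxed))
        *imm = (unsigned char)(*imm ^ 3);   /* 1 <-> 2 */
    return NULL;
}

/*
 * Executing side: call the code repeatedly WITHOUT serializing.
 * Returns how many calls produced a value that is neither 1 nor 2,
 * or -1 if setup failed.
 */
static long xmc_stress(long iterations)
{
    static const unsigned char stub[] = { 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3 };
    pthread_t tid;
    long unexpected = 0;

    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, stub, sizeof stub);
    stress_code = p;
    atomic_store(&stress_stop, 0);

    if (pthread_create(&tid, NULL, stress_modifier, NULL) != 0) {
        munmap(p, 4096);
        return -1;
    }

    code_fn fn = (code_fn)(uintptr_t)p;
    for (long i = 0; i < iterations; i++) {
        int r = fn();
        if (r != 1 && r != 2)
            unexpected++;
    }

    atomic_store(&stress_stop, 1);
    pthread_join(tid, NULL);
    munmap(p, 4096);
    return unexpected;
}
```

According to the discussions quoted in the question, a failure may surface as a GPF (a crash of the process) rather than as a wrong return value, so a nonzero count is only one of the outcomes to watch for.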

Mackie Messer answered Oct 10 '22 06:10
