 

CPU cache behaviour/policy for file-backed memory mappings?

Does anyone know which type of CPU cache behaviour or policy (e.g. uncacheable write-combining) is assigned to memory-mapped, file-backed regions on modern x86 systems?

Is there any way to detect which is the case, and possibly override the default behaviour?

Windows and Linux are the main operating systems of interest.

(Editor's note: the question was previously phrased as memory-mapped I/O, but that phrase has a different, specific technical meaning, especially when talking about CPU caches: actual I/O devices like NICs or video cards that you talk to with loads/stores.

This question is actually about what kind of memory you get from mmap(some_fd, ...) when you don't use MAP_ANONYMOUS and the mapping is backed by a regular file on disk.)

awdz9nld asked Apr 06 '13 16:04




1 Answer

TL;DR: Memory-mapped files use the normal write-back (WB) policy for the pagecache pages that they map into your process's address space. You have to do something special and OS-specific if you ever want pages that aren't WB.


The caching policy applied to a region of the address space is largely operating-system independent and depends mainly on the type of device behind the page. In principle the operating system is free to apply any caching policy to any memory region, but an incorrectly assigned policy can reduce system performance or break the system's logic entirely.

There are at least four caching policies:

  1. Full caching (write-back, aka WB). Applied to the physical address space mapped to main memory (RAM) to improve the performance of the memory subsystem. The key property of such memory is that its state can be changed only by software and affects only software.

    Memory-mapped files use full caching because they are implemented entirely in software: the operating system reads a chunk of the file from disk into memory and later writes that chunk (possibly modified) back to disk. Hardware updates a "dirty" bit in the page tables so the OS can figure out which pages need to be synced to disk.

  2. Write-through caching (WT). The key property of such devices is that their state can be changed only by software, but each change must take effect on the device immediately. Under this policy, data written to a memory-mapped I/O device register is placed in two places at once: in the cache and in the device. Reads, however, are served from the cache without an expensive access to the device.

    This policy could be useful for an MMIO device that never writes its own memory and only reads what the CPU wrote. In practice it's rarely used for anything. GPUs aren't like that: they do write video memory, so WT isn't used for video RAM. (There's no mechanism for the GPU to invalidate CPU caches for the region, because the GPU isn't part of the CPU's cache-coherency domain.)

  3. Uncacheable write-combining (WC, aka USWC): weakly ordered memory typically used for mapping video RAM. It behaves like UC, except that stores are collected in write-combining buffers, so a run of stores covering a whole cache line (for example NT stores) can be written out in one burst. movntdqa loads let you efficiently read whole cache lines, which you can't do any other way from WC regions; normal loads fetch data separately for each load, even within the same line, because the memory is uncacheable.
  4. Disabled caching (UC). Applied to almost all I/O devices, because a write to a memory-mapped I/O device register must take effect immediately, and a read from such a register must return the device's current data. If caching were applied to a memory-mapped I/O device, it would cause two problems:
    1. A write to a memory-mapped I/O device register would be delayed until the cache controller decides to flush the cache line holding the written data. As a result, the driver couldn't know when a command written to the device takes effect.
    2. Reads from a memory-mapped I/O device register could be cached, so a later read of the same register might return outdated data from the cache instead of the device's current data. This would make it hard for the driver to observe the device's actual state.

Because the way software specifies the caching policy depends only on the processor, the same algorithm can be applied under any operating system. The simplest way is to read the CR3 register, use it to walk the page tables to the Page Table Entry for the address whose caching policy you want to know, and check its PWT, PCD and PAT flags, which together select one of the eight entries programmed in the IA32_PAT MSR. Reading CR3 and the MSR requires ring 0, so in practice this has to be done from a kernel driver. Even this isn't complete, because other features also affect caching: for example, caching can be disabled globally via CR0.CD, and the MTRRs assign memory types to physical address ranges that are combined with the PAT type.

ZarathustrA answered Oct 30 '22 14:10