CPU cache behaviour/policy for file-backed memory mappings?

Does anyone know which type of CPU cache behaviour or policy (e.g. uncacheable write-combining) is assigned to memory mapped file-backed regions on modern x86 systems?

Is there any way to detect which is the case, and possibly override the default behaviour?

Windows and Linux are the main operating systems of interest.

(Editor's note: the question was previously phrased as memory mapped I/O, but that phrase has a different, specific technical meaning, especially when talking about CPU caches: it refers to actual I/O devices, like NICs or video cards, that you talk to with loads and stores.

This question is actually about what kind of memory you get from mmap(some_fd, ...), when you don't use MAP_ANONYMOUS and it's backed by a regular file on disk.)

asked Apr 06 '13 16:04

awdz9nld

1 Answer

TL;DR: Memory-mapped files use the normal write-back (WB) policy for the pagecache pages that they map into the address space of your process. You have to do something special and OS-specific if you ever want pages that aren't WB.
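As a minimal sketch of what such a mapping looks like on a POSIX system (Linux here; Windows would use CreateFileMapping / MapViewOfFile instead): the helper below maps a temporary file with MAP_SHARED, modifies it with ordinary stores, and reads the result back through the file descriptor. The function name `roundtrip_through_mapping` and the `/tmp` path template are made up for illustration. Nothing in the code selects a cache policy; the stores are plain cacheable stores into pagecache pages.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: writes `text` into a fresh temporary file through a
// shared file-backed mapping, then reads it back with pread(2).
// Returns the bytes read back, or an empty string on any failure.
std::string roundtrip_through_mapping(const std::string& text) {
    assert(!text.empty() && "mmap rejects zero-length mappings");

    char path[] = "/tmp/wb_mapping_XXXXXX";  // illustrative location
    int fd = mkstemp(path);
    if (fd < 0) return {};
    unlink(path);  // the file goes away once fd is closed

    const size_t len = text.size();
    std::string result;
    if (ftruncate(fd, static_cast<off_t>(len)) == 0) {
        void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            // Ordinary stores into the mapped pagecache page.
            std::memcpy(p, text.data(), len);
            // Ask the kernel to write the dirty page back to the file.
            if (msync(p, len, MS_SYNC) == 0) {
                std::string buf(len, '\0');
                ssize_t n = pread(fd, &buf[0], len, 0);
                if (n == static_cast<ssize_t>(len)) result = buf;
            }
            munmap(p, len);
        }
    }
    close(fd);
    return result;
}
```

Calling `roundtrip_through_mapping("hello")` should hand back the same string if every step succeeds. The msync call is about getting data to the file, not about CPU caches; the hardware keeps the caches coherent for WB memory without software help.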

The caching policy applied to a region of the address space is generally operating-system independent and depends only on the type of device behind the page. In principle the operating system is free to apply any caching policy to any memory region, but an incorrectly assigned policy can reduce system performance or break system logic altogether.

There are at least four caching policies:

Full caching (write-back, aka WB). Applied to the physical address space that maps to main memory (RAM), to improve the performance of the memory subsystem. The main property of such memory is that its state can be changed only by software and affects only software.

Memory-mapped file implementations use full caching because they are implemented entirely in software (the operating system): it reads a chunk of the file from disk, places it in memory, and later writes that chunk (possibly modified) back to disk. Hardware updates a "dirty" bit in the page tables to let the OS figure out what needs to be synced to disk.

Write-through caching (WT). The main property of such devices is that their state can be changed only by software, but a change must take effect on the device immediately. Under this policy, data written to a memory-mapped I/O device register is placed in two places at once: in the cache and in the device. When the data is read back, it is served from the cache without an expensive access to the device.

This cache policy could be useful for an MMIO device that doesn't write its own memory and only reads what the CPU wrote. In practice it's rarely used for anything. GPUs aren't like that: they do write video memory, so WT is not used for video RAM. (There's no mechanism for the GPU to invalidate CPU caches of the region, because the GPU isn't part of the CPU's cache-coherency domain.)

Uncacheable, write-combining (WC, aka USWC). Weakly ordered memory, typically used for mapping video RAM. Like uncacheable, except that NT stores let you efficiently write a whole cache line at once. movntdqa loads let you efficiently read whole cache lines, which you can't do any other way from WC regions. Normal loads fetch data separately for each load, even within the same line, because the region is uncacheable.

Disabled caching (UC). Applied to almost all I/O devices, because a write to a memory-mapped I/O device register must take effect immediately, and a read from such a register must return the device's actual current data. If caching were applied to a memory-mapped I/O device, it would have two negative effects:
1. A write to the memory-mapped I/O device register would be delayed until the cache controller decided to flush the cache line holding the written data. As a result, the driver could not know when a command written to the device takes effect.
2. Data read from the memory-mapped I/O device register could be cached, so a subsequent read from the same register could return outdated data from the cache rather than the actual data from the device. This would make it hard for the driver to capture the actual state of the device.
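To illustrate the NT stores mentioned under write-combining, here is a rough sketch that fills one 64-byte line with four 16-byte non-temporal stores using SSE2 intrinsics. The helper name `fill_line_nt` is hypothetical, and the 64-byte line size is an assumption that holds on current x86 parts. Getting an actual WC mapping needs a driver or an OS-specific API, so in a normal user-space program this would run against ordinary WB memory, where the stores are still valid but simply bypass the cache.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_set1_epi8, _mm_stream_si128, _mm_sfence

// Hypothetical helper: fills one 64-byte line at `dst` with `value` using
// four 16-byte non-temporal (NT) stores. `dst` must be 64-byte aligned.
void fill_line_nt(void* dst, std::uint8_t value) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 64 == 0);

    const __m128i v = _mm_set1_epi8(static_cast<char>(value));
    __m128i* p = static_cast<__m128i*>(dst);
    for (int i = 0; i < 4; ++i) {
        _mm_stream_si128(p + i, v);  // NT store of 16 bytes
    }
    _mm_sfence();  // order the weakly ordered NT stores before later stores
}
```

Writing all four quarters of the line back to back gives the write-combining buffer the chance to send the line out as a single full-line write.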

Because the mechanism by which software specifies a caching policy depends only on the processor, the same approach can be applied on any operating system. The simplest way is to read the CR3 register (which requires kernel-mode code), use it to locate the Page Table Entry for the address whose caching policy you want to know, and check its PCD and PWT flags. This is not the whole picture, though, because a few other features can also affect caching: for example, caching can be disabled completely via CR0, and the MTRRs and the PAT also play a role.
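The flag check can be sketched as follows, assuming you have already obtained the raw 64-bit entry of a 4 KiB page (not possible from plain user space) and the value of the IA32_PAT MSR. The function and type names are made up for illustration. The bit positions (PWT bit 3, PCD bit 4, PAT bit 7) apply to 4 KiB entries only, since large-page entries keep the PAT bit elsewhere. The default used here is the architectural power-on value of IA32_PAT; operating systems commonly reprogram it, so treat it as a placeholder. The result is only the page-table side of the story: MTRRs and CR0 can still change the effective type.

```cpp
#include <cassert>
#include <cstdint>

// Memory type encodings used in IA32_PAT entries.
enum class MemType : std::uint8_t { UC = 0, WC = 1, WT = 4, WP = 5, WB = 6, UCMinus = 7 };

// Power-on default of IA32_PAT; an OS may have programmed something else.
constexpr std::uint64_t kPatPowerOnDefault = 0x0007040600070406ULL;

// Hypothetical helper: builds the 3-bit PAT index from a 4 KiB page table entry.
unsigned pat_index(std::uint64_t pte) {
    const unsigned pwt = (pte >> 3) & 1;  // page-level write-through
    const unsigned pcd = (pte >> 4) & 1;  // page-level cache disable
    const unsigned pat = (pte >> 7) & 1;  // PAT bit (4 KiB entries only)
    return (pat << 2) | (pcd << 1) | pwt;
}

// Hypothetical helper: looks up entry `index` (0..7) in an IA32_PAT value.
MemType pat_entry(std::uint64_t pat_msr, unsigned index) {
    assert(index < 8);
    return static_cast<MemType>((pat_msr >> (index * 8)) & 0x7);
}

// Memory type selected by the page table entry alone.
MemType pte_memory_type(std::uint64_t pte,
                        std::uint64_t pat_msr = kPatPowerOnDefault) {
    return pat_entry(pat_msr, pat_index(pte));
}
```

With the power-on PAT value, an entry with PCD, PWT and PAT all clear selects WB, which is what you would expect to find for a page of an ordinary file-backed mapping.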

answered Oct 30 '22 14:10

ZarathustrA

Tags: c++, cpu-architecture, operating-system, x86, cpu-cache