 

Is mmap atomic?

Are mmap calls atomic in their effect?

That is, does a mapping change made by mmap appear atomically to other threads accessing the affected region?

As a litmus test, consider the case you do a mmap in a file of all zeros (from thread T1 which is at this point the only thread), then start a second thread T2 reading from the region. Then, again on T1 (the original thread) do a second mmap call for the same region, replacing the mapping with a new one against a file of all ones.

Is it possible for the reader thread to read a one from some page (i.e., see the second mmap in effect) and then subsequently read a zero from some page (i.e., see the first mapping in effect)?

You may assume that the reads on the reader thread are properly fenced, i.e., that the effect above does not occur solely due to CPU/coherency level memory access reordering.

BeeOnRope asked Jan 21 '20 17:01




1 Answer

mmap(2) is atomic with respect to the mappings across all threads; in part, at least, because munmap(2) also is. To break it down, the scenario described looks something like:

MapRegion(from, to, obj) {
     Lock(&CurProc->map)
     while MapIntersect(&CurProc->map, from, to, &range) {
            MapUnMap(&CurProc->map, range.from, range.to)
            MapObjectRemove(&CurProc->map, range.from, range.to)
     }
     MapInsert(&CurProc->map, from, to, obj)
     UnLock(&CurProc->map)
}

Following this, MapUnMap has to ensure that while it is removing the mappings, no thread can access them. Notice the Lock(&CurProc->map) above.

MapUnMap(map, from, to) {
    foreach page in map.mmu[from .. to] {
         update page structure to invalidate mapping
    }
    foreach cpu in map.HasUsed {
         cause cpu to invoke tlb cache invalidation for (map, from, to)
    }
}

The first phase is to re-write the processor specific page tables to invalidate the area(s).

The second phase is to force every cpu that has ever loaded this map into its translation cache to invalidate that cache. This part is highly architecture dependent. On an older x86, rewriting cr3 is typically enough, so HasUsed is really CurrentlyUsing; a newer amd64 may be able to cache multiple address space identifiers, so HasUsed applies literally. On ARM, local tlb invalidation is broadcast to the local cluster, so HasUsed would refer to cluster ids rather than cpu ids. For more detail, search for tlb shootdown, as this procedure is colloquially known.
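In the same pseudocode style as above, one round of such a shootdown might look like the sketch below; the names (TlbShootdown, SendIPI, AllAcked) are illustrative and not taken from any particular kernel:

TlbShootdown(map, from, to) {
    pending = {}
    foreach cpu in map.HasUsed {
        if cpu == CurCpu {
            InvalidateLocalTlb(from, to)
        } else {
            pending += cpu
            SendIPI(cpu, TLB_INVALIDATE, from, to)  // interrupt the remote cpu
        }
    }
    while not AllAcked(pending) {                   // wait for every cpu to ack
        Pause()
    }
}

The wait for acknowledgements is what makes the shootdown expensive: the unmapping cpu cannot proceed until every other cpu has confirmed its invalidation.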

Once these two phases are complete, no thread can access this address range. Any attempt to do so will cause a fault, which will cause the faulting thread to Lock its mapping structure, which is already locked by the mapping thread, so it will wait until the mapping is complete. When the mapping is complete, all of the old mappings have been removed and replaced by new mappings, so there is no way to retrieve a previous mapping after this point.

What if another thread references the address range during the update? It will either continue with stale data or fault. In this respect, stale data isn't an inconsistency: it is as if the access had happened just before the mapping thread entered mmap(2). The faulting case is the same as for the faulting thread above.

In summary, updates to the mappings are implemented as a series of transactions that ensure a consistent view of the address space. The cost of these transactions is architecture specific. The code implementing this can be quite intricate, as it needs to guard against implicit operations, such as speculative fetches, as well as explicit ones.

mevets answered Oct 17 '22 11:10