How safe are memory-mapped files for reading input files?

Tags: c++, c, posix, windows, memory-mapped-files

Mapping an input file into memory and then directly parsing data from the mapped memory pages can be a convenient and efficient way to read data from files.
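For reference, here is a rough sketch of the convenient pattern on POSIX systems. It assumes a regular file. The function name `count_newlines_mmap` is hypothetical, and counting newlines stands in for real parsing.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Counts '\n' bytes in the file at `path` by scanning a read-only private
   mapping directly. Returns -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap with length 0 fails with EINVAL */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            lines++;

    munmap((void *)data, len);
    return lines;
}
```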

However, this practice also seems fundamentally unsafe unless you can ensure that no other process writes to a mapped file. Even the data in private read-only mappings may change if another process writes to the underlying file. (POSIX, for example, doesn't specify "whether modifications to the underlying object done after the MAP_PRIVATE mapping is established are visible through the MAP_PRIVATE mapping".)

If you wanted to make your code safe in the presence of external changes to the mapped file, you'd have to access the mapped memory only through volatile pointers and then be extremely careful about how you read and validate the input, which seems impractical for many use cases.
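Here is a minimal sketch of what "extremely careful" might mean in practice. The record format is hypothetical: a 4-byte little-endian length followed by the payload. The approach is to read every byte of the mapping through a volatile pointer exactly once, validate the local copies, and never re-read a field after checking it. The name `read_record` is illustrative only. This does not protect against a SIGBUS if the file is truncated, and the copied payload can still mix old and new data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format: 4-byte little-endian length, then that many payload
   bytes. `map`/`map_len` describe a mapped region whose contents may change
   underneath us. Returns the payload length, or -1 if the record is
   malformed or does not fit in `out`. */
long read_record(const volatile unsigned char *map, size_t map_len,
                 unsigned char *out, size_t out_cap)
{
    if (map_len < 4)
        return -1;

    /* Fetch the length field once into a local; never re-read it from map. */
    uint32_t n = 0;
    for (int i = 0; i < 4; i++)
        n |= (uint32_t)map[i] << (8 * i);

    /* Validate the local copy, not the (mutable) mapped bytes. */
    if (n > map_len - 4 || n > out_cap)
        return -1;

    /* Copy the payload out byte by byte (memcpy does not accept volatile
       sources). Further parsing should work only on `out`. */
    for (uint32_t i = 0; i < n; i++)
        out[i] = map[4 + i];

    return (long)n;
}
```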

Is this analysis correct? The documentation for memory mapping APIs generally mentions this issue only in passing, if at all, so I wonder whether I'm missing something.

470

asked Jan 22 '14 15:01

Stephan Tolksdorf

2 Answers

It is not really a problem.

Yes, another process may modify the file while you have it mapped, and yes, you may see the modifications. It is even likely. Almost all operating systems have a unified virtual memory system, so unless the writer explicitly requests unbuffered writes, every write goes through the buffer cache, and anyone holding a mapping of those pages sees the change.
That isn't even a bad thing. In fact, it would be more disturbing if you couldn't see the changes. Since the file effectively becomes part of your address space when you map it, it makes perfect sense that you see changes to it.
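The following sketch shows this visibility in action. It assumes Linux or another system with a unified page cache. A `MAP_SHARED` mapping observes a `pwrite` made through a separate descriptor, which stands in for the other process. The helper name is hypothetical, and the file is assumed to be at least one byte long.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps the first byte of `path` read-only (MAP_SHARED), then overwrites that
   byte with `c` through an independent descriptor. Returns the byte seen
   through the mapping afterwards, or -1 on error. */
int byte_after_external_write(const char *path, char c)
{
    int rfd = open(path, O_RDONLY);
    if (rfd < 0)
        return -1;
    const volatile char *map = mmap(NULL, 1, PROT_READ, MAP_SHARED, rfd, 0);
    close(rfd);
    if (map == MAP_FAILED)
        return -1;

    int wfd = open(path, O_WRONLY);   /* stands in for "another process" */
    if (wfd < 0 || pwrite(wfd, &c, 1, 0) != 1) {
        if (wfd >= 0)
            close(wfd);
        munmap((void *)map, 1);
        return -1;
    }
    close(wfd);

    int seen = map[0];                /* volatile: really re-read the page */
    munmap((void *)map, 1);
    return seen;
}
```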

If you use conventional I/O (such as read), someone can still modify the file while you are reading it. In other words, copying file content into a memory buffer is not automatically safe in the presence of modifications either. It is "safe" insofar as read will not crash, but it does not guarantee that your data is consistent.
Unless you use readv, you have no guarantees about atomicity whatsoever (and even with readv you have no guarantee that what you have in memory is consistent with what is on disk, or that it doesn't change between two calls to readv). Someone might modify the file between two read operations, or even while you are in the middle of one.
This isn't merely something that isn't formally guaranteed but "probably still works". On the contrary: under Linux, for example, writes are demonstrably not atomic, not even by accident.
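For comparison, here is a typical `read()` loop. It assumes the caller only needs the first `cap` bytes of the file, and the name `read_whole` is hypothetical. The loop handles short reads and `EINTR`, but nothing in it can detect a concurrent writer.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Reads up to `cap` bytes from the start of `path` into `buf` with plain
   read() calls, retrying on short reads and EINTR. Returns the number of
   bytes read, or -1 on error. Another process can still write to the file
   between (or during) these calls, so `buf` may mix old and new data. */
long read_whole(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        if (n == 0)                   /* end of file */
            break;
        total += (size_t)n;
    }
    close(fd);
    return (long)total;
}
```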

The good news:
Usually, processes don't just open an arbitrary random file and start writing to it. When such a thing happens, it is usually either a well-known file that belongs to the process (e.g. log file), or a file that you explicitly told the process to write to (e.g. saving in a text editor), or the process creates a new file (e.g. compiler creating an object file), or the process merely appends to an existing file (e.g. db journals, and of course, log files). Or, a process might atomically replace a file with another one (or unlink it).
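The "atomically replace" case is usually implemented by writing a temporary file and `rename()`-ing it over the original. POSIX guarantees that the name always refers to either the old file or the new one. The sketch below makes several simplifying assumptions: the helper name and the `.tmp` suffix are arbitrary, a short `write` is treated as an error, and a durable version would also `fsync` the containing directory.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Replaces `path` atomically: writes the new contents to "<path>.tmp",
   flushes it, then renames it over the original. Readers that already have
   the old file open or mapped keep seeing the old inode.
   Returns 0 on success, -1 on error. */
int replace_file_atomically(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;                    /* path too long for this sketch */

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* A short write is treated as an error to keep the sketch small. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```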

In every case, the whole scary problem boils down to "no issue" because either you are well aware of what will happen (so it's your responsibility), or it works seamlessly without interfering.

If you really don't like the possibility that another process could write to your file while you have it mapped, you can simply omit FILE_SHARE_WRITE under Windows when you create the file handle. POSIX makes this somewhat more complicated, since you need to fcntl the descriptor for a mandatory lock, which isn't necessarily supported or 100% reliable on every system (for example, under Linux).
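On the POSIX side, the portable part is an advisory `fcntl` lock, sketched below with a hypothetical helper name. It only keeps out writers that also request `fcntl` locks. Mandatory locking additionally required a special mount option and file mode bits on Linux, and Linux 5.15 removed it entirely. On Windows, this advice amounts to passing only `FILE_SHARE_READ` as the share mode to `CreateFile`.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

/* Opens `path` read-only and takes a whole-file shared (read) lock with
   fcntl(F_SETLK). The lock is advisory: it only blocks processes that also
   request conflicting fcntl locks. Returns the descriptor, or -1 if the file
   cannot be opened or a conflicting lock is held. Note that closing ANY
   descriptor for this file in the process releases the lock. */
int open_with_read_lock(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_RDLCK;      /* shared lock; needs an fd open for reading */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "through end of file", however large */

    if (fcntl(fd, F_SETLK, &fl) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```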

answered Sep 29 '22 01:09

Damon

In theory, you're probably in real trouble if someone does modify the file while you're reading it. In practice, you're reading characters and nothing else: no pointers, nothing that could get you into trouble. Formally, I think it's still undefined behavior, but not a kind I think you have to worry about. Unless the modifications are very minor, you'll get a lot of compiler errors, but that's about the end of it.

The one case which might cause problems is if the file was shortened. I'm not sure what happens then, when you're reading beyond the end.
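On POSIX systems, touching a page of the mapping that lies entirely past the new end of file raises `SIGBUS`. On Windows, shrinking a file with `SetEndOfFile` fails while a view of it is still mapped. The sketch below demonstrates the POSIX behavior. It truncates the file itself, standing in for another process, and catches the signal with `sigsetjmp`/`siglongjmp`. The helper name is hypothetical, and the expected result assumes Linux.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp;

static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_jmp, 1);           /* abandon the faulting access */
}

/* Maps `len` bytes of `path`, truncates the file to 0 bytes, then reads the
   first mapped byte. Returns 1 if that read raised SIGBUS, 0 if it did not,
   -1 on setup error. */
int truncation_raises_sigbus(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    const volatile char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    struct sigaction sa = {0};
    struct sigaction old;
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int result = -1;
    if (ftruncate(fd, 0) == 0) {
        if (sigsetjmp(bus_jmp, 1) == 0) {   /* 1: restore signal mask on jump */
            (void)map[0];             /* page now lies wholly past end of file */
            result = 0;
        } else {
            result = 1;               /* reached via siglongjmp from handler */
        }
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)map, len);
    close(fd);
    return result;
}
```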

And finally: the system isn't going to open and modify the file arbitrarily. It's a source file; it will be some idiot programmer who does it, and he deserves what he gets. In no case will your undefined behavior corrupt the system or other people's files.

Note too that most editors work on a private copy; when they write it back, they do so by renaming the original and creating a new file. Under Unix, once you've opened the file to mmap it, all that counts is the inode. When the editor renames or deletes the file, you still keep your copy, and the modified file gets a new inode. The only thing you have to worry about is someone opening the file for update and then modifying it in place. Not many programs do this with text files, except to append data to the end.
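Here is a sketch of the inode point, assuming POSIX. Record the device and inode number when mapping, then compare them later against `stat()` of the path to see whether the file has been renamed over or deleted. The struct and function names are hypothetical. The old mapping stays readable either way because it keeps the original inode alive.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* A read-only mapping plus the identity (device, inode) of its file. */
struct mapped_file {
    const char *data;
    size_t len;
    dev_t dev;
    ino_t ino;
};

/* Maps a non-empty file read-only. Returns 0 on success, -1 on error. */
int map_file(const char *path, struct mapped_file *mf)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping keeps the inode alive */
    if (p == MAP_FAILED)
        return -1;
    mf->data = p;
    mf->len = (size_t)st.st_size;
    mf->dev = st.st_dev;
    mf->ino = st.st_ino;
    return 0;
}

/* Returns 1 if `path` no longer names the mapped inode (renamed over,
   deleted, or no longer stat-able), 0 if it still does. */
int file_was_replaced(const char *path, const struct mapped_file *mf)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return 1;
    return st.st_dev != mf->dev || st.st_ino != mf->ino;
}
```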

So while there's formally some risk, I don't think you have to worry about it. (If you're really paranoid, you could turn off write permission while the file is mmapped. And if there really is an enemy agent out to get you, he can turn it right back on.)
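The "turn off write permission" idea could look like the sketch below, with a hypothetical function name. As the answer says, it is only a speed bump. It affects new `open()` calls by non-privileged users, not descriptors that are already open for writing, and the owner or root can simply change the mode back.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens `path` read-only (e.g. for mmap), saves its current mode in
   *old_mode, and clears all write-permission bits. Returns the descriptor,
   or -1 on error. Restore later with fchmod(fd, *old_mode). */
int open_and_drop_write_permission(const char *path, mode_t *old_mode)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    *old_mode = st.st_mode & 07777;   /* permission bits plus setuid/setgid/sticky */
    mode_t read_only = *old_mode & ~(mode_t)(S_IWUSR | S_IWGRP | S_IWOTH);
    if (fchmod(fd, read_only) < 0) {  /* only the owner (or root) may do this */
        close(fd);
        return -1;
    }
    return fd;
}
```

When you are done with the mapping, call `fchmod(fd, old_mode)` before closing the descriptor.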

answered Sep 29 '22 01:09

James Kanze

Related questions
                            
                                Variable of a template class with a template class template parameter set to a base template of the derived template with the variable
                            
                                Convex Decomposition of a Complex Polygon?
                            
                                Handle arbitrary length integers in C++
                            
                                C++ library for integer trigonometry, speed optimized with optional approximations?
                            
                                C++11: Abstracting over const, volatile, lvalue reference, and rvalue reference qualified member function pointers?
                            
                                How to determine the type of a function parameter given the type of argument passed to it?
                            
                                overloading operator<< for arrays
                            
                                How to show private inheritance relationship in a UML class diagram
                            
                                Why is Xcode 4.3.1 putting red strikethrough through this protected variable?
                            
                                How can I detect whether a type is a visible base of another type?
                            
                                Org-mode failed to highlight C++ source code when exporting html
                            
                                Why Pointer Type Cast Does not Work on Template Non-type Parameters
                            
                                How to define custom float-type numpy dtypes (C-API)
                            
                                How install 64-bit Qt on Windows with C++11 support?
                            
                                How do I find out where the compiler spends its time?
                            
                                STL-compatible iterators for custom containers [closed]
                            
                                How can C++ and C variadic arguments be used together?
                            
                                Java deserialization in C++
                            
                                Can I reliably turn a string literal into a symbol name using templates (or fancy macros)?
                            
                                XAML apps using C++/CX for Desktop Windows

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How safe are memory-mapped files for reading input files?

Tags:

c++

c

posix

windows

memory-mapped-files

Stephan Tolksdorf

People also ask

2 Answers

Damon

James Kanze

Recent Activity

Donate For Us