
How safe are memory-mapped files for reading input files?

Mapping an input file into memory and then directly parsing data from the mapped memory pages can be a convenient and efficient way to read data from files.
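For reference, here is a minimal POSIX sketch of that approach: map the whole file read-only and parse straight out of the mapped pages. The function names `map_input_file` and `count_lines` are hypothetical, and counting newlines only stands in for real parsing.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file at `path` read-only. On success, returns the start of
   the mapping and stores its length in *len. Returns NULL on failure and for
   an empty file (mmap rejects a zero length). */
static const char *map_input_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* "Parse" directly from the mapped pages: here, just count newline bytes. */
static size_t count_lines(const char *data, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            n++;
    return n;
}
```

The caller releases the mapping with `munmap` when done.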

However, this practice also seems fundamentally unsafe unless you can ensure that no other process writes to a mapped file, because even the data in private read-only mappings may change if another process writes to the underlying file. (POSIX, for example, doesn't specify "whether modifications to the underlying object done after the MAP_PRIVATE mapping is established are visible through the MAP_PRIVATE mapping".)
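The following sketch (POSIX C, helper name `private_mapping_sees_write` hypothetical) probes the effect in question. It maps a file with `MAP_PRIVATE` and `PROT_READ`, overwrites one byte through a second descriptor that stands in for "another process", and reports whether the mapping changed. Because POSIX leaves this unspecified, the code only reports what the platform does. On Linux one would typically expect `1`, since pages of a private mapping that have never been written to are shared with the page cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write made through another descriptor after the mapping was
   created is visible through a MAP_PRIVATE, PROT_READ mapping, 0 if it is
   not, and -1 on error. The file must be at least 1 byte long. */
static int private_mapping_sees_write(const char *path)
{
    int rfd = open(path, O_RDONLY);
    if (rfd < 0)
        return -1;
    void *p = mmap(NULL, 1, PROT_READ, MAP_PRIVATE, rfd, 0);
    close(rfd);
    if (p == MAP_FAILED)
        return -1;
    /* volatile: force a real load from the mapping on every access */
    const volatile char *map = p;

    char before = map[0];
    char replacement = (char)(before ^ 1); /* any byte different from before */

    /* The "other process": an independent descriptor opened for writing. */
    int wfd = open(path, O_WRONLY);
    int ok = wfd >= 0 && pwrite(wfd, &replacement, 1, 0) == 1;
    if (wfd >= 0)
        close(wfd);

    int result = ok ? (map[0] == replacement) : -1;
    munmap(p, 1);
    return result;
}
```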

If you wanted to make your code safe in the presence of external changes to the mapped file, you'd have to access the mapped memory only through volatile pointers and then be extremely careful about how you read and validate the input, which seems impractical for many use cases.
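As a rough illustration of what "access only through volatile pointers and validate carefully" might look like, here is a hypothetical parser for a decimal field (`parse_u32_volatile` is an illustrative name). The assumption it relies on is that each byte is loaded exactly once into a local variable, and only that local copy is checked and used, so a concurrent change cannot slip in between the check and the use. Note what it does not do: `volatile` stops the compiler from re-reading or eliding loads, but it does not make multi-byte reads atomic, and it offers no protection if the file shrinks underneath the mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse an unsigned decimal number of exactly `len` bytes from memory that
   another process might be modifying. Returns 0 and stores the value in *out
   on success; returns -1 for an empty field, a non-digit, or overflow. */
static int parse_u32_volatile(const volatile unsigned char *p, size_t len,
                              uint32_t *out)
{
    if (len == 0)
        return -1;
    uint32_t value = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = p[i];            /* the only read of this byte */
        if (c < '0' || c > '9')
            return -1;                     /* validate the local copy... */
        uint32_t digit = (uint32_t)(c - '0');
        if (value > (UINT32_MAX - digit) / 10)
            return -1;                     /* value * 10 + digit would overflow */
        value = value * 10 + digit;        /* ...and use that same copy */
    }
    *out = value;
    return 0;
}
```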

Is this analysis correct? The documentation for memory mapping APIs generally mentions this issue only in passing, if at all, so I wonder whether I'm missing something.

Stephan Tolksdorf asked Jan 22 '14 at 15:01





2 Answers

It is not really a problem.

Yes, another process may modify the file while you have it mapped, and yes, you may well see the modifications. It is even likely: almost all operating systems have a unified virtual memory system, so unless the writer requests unbuffered writes, there is no way to write without going through the buffer cache, and therefore no way to write without anyone holding a mapping seeing the change.
That isn't even a bad thing. In fact, it would be more disturbing if you couldn't see the changes. Since the file effectively becomes part of your address space when you map it, it makes perfect sense that you see changes to it.

If you use conventional I/O (such as read), someone can still modify the file while you are reading it. Put differently, copying the file's contents into a memory buffer is not safe in the presence of modifications either. It is "safe" insofar as read will not crash, but it does not guarantee that your data is consistent.
Unless you use readv, you have no guarantees about atomicity whatsoever (and even with readv, you have no guarantee that what you have in memory is consistent with what is on disk, or that the file doesn't change between two calls to readv). Someone might modify the file between two read operations, or even while one is in progress.
This isn't merely something that is not formally guaranteed but "probably still works" in practice. On the contrary: under Linux, for example, writes are demonstrably not atomic, not even by accident.
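To make the read/readv distinction concrete, here is a small sketch (function name `read_header_and_body` hypothetical) that reads a fixed-size header and the body following it with a single `readv` call instead of two `read` calls. This removes the gap between two separate system calls in which a writer could slip in; whether the single call is itself atomic with respect to a concurrent `write` still depends on the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read a header of header_len bytes and the body_len bytes that follow it
   with one readv() call. Returns the total number of bytes read, or -1 on
   error. A short count (e.g. near end of file) is still possible and must
   be checked by the caller. */
static ssize_t read_header_and_body(int fd, void *header, size_t header_len,
                                    void *body, size_t body_len)
{
    struct iovec iov[2];
    iov[0].iov_base = header;   /* filled first... */
    iov[0].iov_len = header_len;
    iov[1].iov_base = body;     /* ...then this one, in file order */
    iov[1].iov_len = body_len;
    return readv(fd, iov, 2);
}
```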

The good news:
Usually, processes don't just open an arbitrary file and start writing to it. When that does happen, it is usually a well-known file that belongs to the process (e.g. a log file), a file that you explicitly told the process to write to (e.g. saving in a text editor), a new file that the process creates (e.g. a compiler creating an object file), or an existing file that the process merely appends to (e.g. database journals and, of course, log files). Alternatively, a process might atomically replace a file with another one (or unlink it).

In every case, the whole scary problem boils down to "no issue" because either you are well aware of what will happen (so it's your responsibility), or it works seamlessly without interfering.

If you really don't like the possibility that another process could write to your file while you have it mapped, you can simply omit FILE_SHARE_WRITE under Windows when you create the file handle. POSIX makes it somewhat more complicated, since you need to use fcntl on the descriptor to take a mandatory lock, which isn't necessarily supported or 100% reliable on every system (for example, under Linux).
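On the POSIX side, a shared lock over the whole file can be requested with `fcntl` and `F_SETLK`, roughly as below (helper name `lock_whole_file_for_reading` hypothetical). Keep in mind that such locks are advisory unless the system actually enforces mandatory locking: they only keep out writers that also ask for an `fcntl` lock. The Windows counterpart is to pass `FILE_SHARE_READ` without `FILE_SHARE_WRITE` as the share mode to `CreateFile`, after which other attempts to open the file for writing fail while your handle is open.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Ask for a shared (read) lock covering the whole file behind `fd`, which
   must be open for reading. Fails immediately instead of waiting if another
   process holds a conflicting write lock. Returns 0 on success, -1 otherwise
   (errno is set by fcntl). */
static int lock_whole_file_for_reading(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_RDLCK;     /* shared: conflicts only with write locks */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 = through end of file, even if it grows */
    return fcntl(fd, F_SETLK, &fl) == -1 ? -1 : 0;
}
```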

Damon answered Sep 29 '22 at 01:09



In theory, you're probably in real trouble if someone modifies the file while you're reading it. In practice, you're reading characters and nothing else: no pointers, nor anything else that could get you into trouble. Formally, I think it's still undefined behavior, but it's not one I think you have to worry about. Unless the modifications are very minor, you'll get a lot of compiler errors, but that's about the end of it.

The one case that might cause problems is if the file is shortened. I'm not sure what happens then, when you read beyond the new end.
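For POSIX systems, the mmap specification addresses this: references to whole pages that lie past the end of the underlying object deliver `SIGBUS`. The sketch below (helper and handler names hypothetical) truncates a mapped file to zero bytes and then touches the mapping, catching the signal with `sigsetjmp`/`siglongjmp` so it can report what happened instead of crashing.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jump;

/* SIGBUS handler: jump back out of the faulting memory access. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_jump, 1);
}

/* Map the first byte of `path` (the file must be non-empty), shrink the file
   to zero bytes (here through the same descriptor, standing in for another
   process), then read the mapped byte.
   Returns 1 if the read raised SIGBUS, 0 if it did not, -1 on error. */
static int read_after_truncate_raises_sigbus(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    void *p = mmap(NULL, 1, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    const volatile char *map = p;

    struct sigaction sa = {0}, old;
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) {
        munmap(p, 1);
        close(fd);
        return -1;
    }

    int result = -1;
    if (ftruncate(fd, 0) == 0) {
        if (sigsetjmp(bus_jump, 1) == 0) {
            (void)map[0];   /* the page now lies entirely past end of file */
            result = 0;
        } else {
            result = 1;     /* reached via siglongjmp from on_sigbus */
        }
    }

    sigaction(SIGBUS, &old, NULL);  /* restore the previous handler */
    munmap(p, 1);
    close(fd);
    return result;
}
```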

And finally: the system isn't going to open and modify the file arbitrarily. It's a source file; if anyone does it, it will be some idiot programmer, and he deserves what he gets. In no case will your undefined behavior corrupt the system or other people's files.

Note too that most editors work on a private copy; when they write it back, they do so by renaming the original and creating a new file. Under Unix, once you've opened the file to mmap it, all that counts is the inode. So when the editor renames or deletes the file, you still keep your copy, and the modified file gets a new inode. The only thing you have to worry about is someone opening the file for update and then modifying it in place. Not many programs do this with text files, except to append data to the end.
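The save pattern described above can be sketched as follows (POSIX C; the function name `save_like_editor` and the backup path are hypothetical). The original inode is only renamed, never written to, so a mapping created before the save keeps showing the old contents, while the path now leads to a new inode holding the new contents.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>      /* rename */
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Save `data` to `path` by renaming the existing file to `backup_path` and
   then creating a brand-new file under the original name. The old inode is
   renamed, never modified, so existing mappings of it keep the old contents.
   Returns 0 on success, -1 on error. */
static int save_like_editor(const char *path, const char *backup_path,
                            const char *data, size_t len)
{
    if (rename(path, backup_path) != 0)
        return -1;
    /* O_EXCL: make sure this really is a new file, i.e. a new inode */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;
    int ok = write(fd, data, len) == (ssize_t)len;
    if (close(fd) != 0 || !ok)
        return -1;
    return 0;
}
```

Many editors use a variant that writes a temporary file and `rename`s it over the original; the effect on existing mappings is the same, because the old inode is never modified.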

So while there is formally some risk, I don't think you have to worry about it. (If you're really paranoid, you could turn off write permission while you have the file mmapped. And if there really is an enemy agent out to get you, he can turn it right back on.)
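Turning off write permission might look like this (helper name `drop_write_permission` hypothetical). Removing the write bits makes new attempts by unprivileged processes to open the file for writing fail with `EACCES`. Descriptors that are already open for writing are unaffected, and the owner (or root) can restore the bits at any time, which is exactly the loophole mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Clear the user, group and other write bits of the file behind `fd` and
   save the previous mode bits in *saved, so that fchmod(fd, *saved) can
   restore them once the mapping is no longer needed.
   Returns 0 on success, -1 on error (e.g. EPERM if we don't own the file). */
static int drop_write_permission(int fd, mode_t *saved)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    *saved = st.st_mode & 07777;   /* permission bits plus setuid/setgid/sticky */
    mode_t without_write = *saved & ~(mode_t)(S_IWUSR | S_IWGRP | S_IWOTH);
    return fchmod(fd, without_write);
}
```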

James Kanze answered Sep 29 '22 at 01:09
