How to portably extend a file accessed using mmap()

We're experimenting with changing SQLite, an embedded database system, to use mmap() instead of the usual read() and write() calls to access the database file on disk, using a single large mapping for the entire file. Assume that the file is small enough that we have no trouble finding space for it in virtual memory.

So far so good. In many cases using mmap() seems to be a little faster than read() and write(), and in some cases much faster.

However, resizing the mapping in order to commit a write transaction that extends the database file seems to be a problem. To extend the database file, the code could do something like this:

  ftruncate();    // extend the database file on disk
  munmap();       // unmap the current mapping (it's now too small)
  mmap();         // create a new, larger, mapping

then copy the new data into the end of the new memory mapping. However, the munmap/mmap is undesirable as it means the next time each page of the database file is accessed a minor page fault occurs and the system has to search the OS page cache for the correct frame to associate with the virtual memory address. In other words, it slows down subsequent database reads.
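A minimal sketch of that sequence in C might look like the following. The helper name `grow_by_remapping` and its parameters are hypothetical, and the sketch assumes a shared, read-write mapping of the whole file starting at offset 0. As a small variation on the pseudocode above, it creates the new mapping before dropping the old one, so a failure leaves the old mapping usable.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: extend the file behind fd to new_size bytes and
 * replace the old mapping with a larger one. Returns the new base address,
 * or MAP_FAILED on error; on failure the old mapping remains valid. */
static void *grow_by_remapping(int fd, void *old_map, size_t old_size,
                               size_t new_size)
{
    assert(new_size > old_size);

    if (ftruncate(fd, (off_t)new_size) != 0)   /* extend the file on disk */
        return MAP_FAILED;

    /* Create the new, larger mapping first. */
    void *new_map = mmap(NULL, new_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    if (new_map == MAP_FAILED)
        return MAP_FAILED;

    munmap(old_map, old_size);                 /* drop the too-small mapping */
    return new_map;
}
```

Every pointer into the old mapping is invalid after this call, and each page of the new mapping has to be faulted in again on first access, which is the cost described above.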

On Linux, we can use the non-standard mremap() system call instead of munmap()/mmap() to resize the mapping. This seems to avoid the minor page faults.
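For comparison, a Linux-only sketch of the mremap() variant could look like this. The helper name `grow_with_mremap` is hypothetical; `_GNU_SOURCE` must be defined before any header is included for glibc to declare mremap().

```c
#define _GNU_SOURCE            /* mremap() is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: extend the file to new_size bytes and resize the
 * existing mapping. Returns the (possibly moved) base address, or
 * MAP_FAILED on error. */
static void *grow_with_mremap(int fd, void *old_map, size_t old_size,
                              size_t new_size)
{
    assert(new_size > old_size);

    if (ftruncate(fd, (off_t)new_size) != 0)   /* extend the file on disk */
        return MAP_FAILED;

    /* MREMAP_MAYMOVE lets the kernel relocate the mapping if it cannot
     * be grown in place, so callers must use the returned address. */
    return mremap(old_map, old_size, new_size, MREMAP_MAYMOVE);
}
```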

QUESTION: How should this be dealt with on other systems, like OSX, that do not have mremap()?

We have two ideas at present, and a question regarding each:

1) Create mappings larger than the database file. Then, when extending the database file, simply call ftruncate() to extend the file on disk and continue using the same mapping.

This would be ideal, and seems to work in practice. However, we're worried about this warning in the man page:

"The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified."

QUESTION: Is this something we should be worried about? Or an anachronism at this point?
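To make idea 1 concrete, here is a rough sketch. The helper names and the `max_size` parameter are hypothetical. It relies on exactly the behaviour the man page calls unspecified, so it illustrates the idea rather than showing that it is safe.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map max_size bytes of the file, even though the
 * file is currently shorter. Pages lying wholly beyond the end of the
 * file must not be touched until the file has grown to cover them;
 * accessing them typically raises SIGBUS. */
static void *map_with_headroom(int fd, size_t max_size)
{
    return mmap(NULL, max_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Hypothetical helper: extend the file without touching the mapping.
 * Returns 0 on success, -1 if the headroom is exhausted or ftruncate()
 * fails (the caller would then have to remap). */
static int grow_file_only(int fd, size_t new_size, size_t max_size)
{
    if (new_size > max_size)
        return -1;
    return ftruncate(fd, (off_t)new_size);
}
```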

2) When extending the database file, use the first argument to mmap() to request a mapping of the new pages of the database file located immediately after the current mapping in virtual memory, effectively extending the initial mapping. If the system can't honour the request to place the new mapping immediately after the first, fall back to munmap/mmap.

In practice, we've found that OSX is pretty good about positioning mappings in this way, so this trick works there.

QUESTION: If the system does allocate the second mapping immediately following the first in virtual memory, is it then safe to eventually unmap them both using a single big call to munmap()?
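Idea 2 could be sketched roughly as follows. The helper name `extend_in_place` is hypothetical, and the sketch assumes that `old_size` is a multiple of the page size (mmap() requires a page-aligned file offset) and that the file has already been extended with ftruncate().

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to map file bytes [old_size, new_size)
 * directly after the existing mapping at base. Returns 0 if the new
 * mapping landed where requested, or -1 if the caller should fall back
 * to munmap/mmap. */
static int extend_in_place(int fd, char *base, size_t old_size,
                           size_t new_size)
{
    assert(new_size > old_size);

    void *want = base + old_size;   /* address just past the old mapping */
    void *got = mmap(want, new_size - old_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, (off_t)old_size);
    if (got == MAP_FAILED)
        return -1;

    if (got != want) {              /* the address hint was not honoured */
        munmap(got, new_size - old_size);
        return -1;
    }
    return 0;
}
```

Without MAP_FIXED the address is only a hint, so the result has to be compared against the requested address each time.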


asked Mar 28 '13 14:03

Dan Kennedy

1 Answer

Option 2 will work, but you don't have to rely on the OS happening to have space available: you can reserve your address space beforehand so that your fixed mappings will always succeed.

For instance, to reserve one gigabyte of address space, do:

mmap(NULL, 1U << 30, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

This reserves one gigabyte of contiguous address space without actually allocating any memory or resources. You can then perform future mappings over this space and they will succeed. So mmap the file into the beginning of the space returned, then mmap further sections of the file as needed using the MAP_FIXED flag. The mmaps will succeed because your address space is already allocated and reserved by you.

Note: Linux also has the MAP_NORESERVE flag, which is the behavior you would want for the initial mapping if you were allocating RAM, but in my testing it is ignored, as PROT_NONE is sufficient to say you don't want any resources allocated yet.


answered Oct 09 '22 04:10

John Meacham

Tags: linux, macos, mmap
