 

What are the advantages of memory-mapped files?

I've been researching memory mapped files for a project and would appreciate any thoughts from people who have either used them before, or decided against using them, and why?

In particular, I am concerned about the following, in order of importance:

  • concurrency
  • random access
  • performance
  • ease of use
  • portability
asked Oct 10 '08 at 18:10 by robottobor




2 Answers

I think the advantage is really that you reduce the amount of data copying required over traditional methods of reading a file.

If your application can use the data "in place" in a memory-mapped file, it can access it without any copying. If you use a system call instead (e.g. Linux's pread()), the kernel typically has to copy the data from its own buffers into user space. This extra copy not only takes time but also makes the CPU's caches less effective, because the duplicated data has to pass through them as well.
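A minimal sketch of the two access paths, assuming a Unix system (os.pread is not available on Windows). The helper names `checksum_pread` and `checksum_mmap` are hypothetical. Both sum the same byte range: the first gets a fresh copy from the kernel, and the second reads the bytes in place through a memoryview over the mapping.

```python
import mmap
import os


def checksum_pread(path: str, offset: int, length: int) -> int:
    """Sum `length` bytes at `offset` using a read system call."""
    with open(path, "rb") as f:
        # os.pread (Unix only) makes the kernel copy the bytes from its
        # buffers into a brand-new bytes object in user space.
        data = os.pread(f.fileno(), length, offset)
    return sum(data)


def checksum_mmap(path: str, offset: int, length: int) -> int:
    """Sum the same bytes by reading them in place through a mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            with memoryview(mm) as view:
                # Slicing a memoryview does not copy: `window` refers
                # directly to the mapped pages.
                window = view[offset:offset + length]
                total = sum(window)
                # Release the slice before the mapping is closed; an mmap
                # cannot be closed while views of it are still exported.
                window.release()
    return total
```

In Python the `sum` loop dominates either way, so this only illustrates the mechanism (copy vs. no copy), not the speed-up you would measure in C.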

If the data actually have to be read from the disc (physical I/O), the OS still has to read them in, so a page fault probably isn't any better performance-wise than a system call. But if they don't (i.e. they are already in the OS cache), performance should in theory be much better.

On the downside, there's no asynchronous interface to memory-mapped files: if you access a page that isn't currently resident in memory, it generates a page fault, and the thread then waits until the I/O completes.
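A partial mitigation, sketched under the assumption of a Unix system and Python 3.8+ (where `mmap.madvise` exists), is to tell the kernel up front that the pages will be needed so it can start reading them in. `MADV_WILLNEED` is only a hint, not an asynchronous API: a later access can still fault and block. The helper name `map_with_prefetch_hint` is hypothetical.

```python
import mmap


def map_with_prefetch_hint(fileno: int) -> mmap.mmap:
    """Map a whole file read-only and ask the kernel to prefetch it."""
    mm = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
    # madvise and the MADV_* constants only exist on platforms that
    # support them, so check before calling.
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        # A hint only: the kernel may start readahead in the background,
        # but accessing a page that is not resident yet can still block.
        mm.madvise(mmap.MADV_WILLNEED)
    return mm
```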


The obvious disadvantage of memory-mapped files is on a 32-bit OS: you can easily run out of address space.
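A 32-bit process has at most 2^32 bytes (4 GiB) of virtual address space, and usually less for user code (by default about 2 GiB on 32-bit Windows and about 3 GiB on typical 32-bit Linux), so a large file may not fit into one mapping. A common workaround is to map only a window of the file at a time. The sketch below assumes a hypothetical helper `read_window`; the one real constraint it encodes is that the mapping offset must be a multiple of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os


def read_window(path: str, offset: int, length: int) -> bytes:
    """Return up to `length` bytes at `offset` by mapping only that region.

    Mapping a small window instead of the whole file keeps the address
    space used per lookup small, which matters in a 32-bit process.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if length <= 0 or offset >= size:
            return b""
        end = min(offset + length, size)
        # The mapping offset must be a multiple of ALLOCATIONGRANULARITY
        # (the page size on Linux, 64 KiB on Windows), so round down.
        gran = mmap.ALLOCATIONGRANULARITY
        aligned = (offset // gran) * gran
        delta = offset - aligned
        with mmap.mmap(f.fileno(), end - aligned,
                       access=mmap.ACCESS_READ, offset=aligned) as mm:
            # Slicing an mmap returns a bytes copy, so the result stays
            # valid after the mapping is closed.
            return mm[delta:delta + (end - offset)]
```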

answered Oct 02 '22 at 17:10 by MarkR


I have used a memory mapped file to implement an 'auto complete' feature while the user is typing. I have well over 1 million product part numbers stored in a single index file. The file has some typical header information but the bulk of the file is a giant array of fixed size records sorted on the key field.

At runtime the file is memory mapped, cast to a C-style struct array, and we do a binary search to find matching part numbers as the user types. Only a few memory pages of the file are actually read from disk -- whichever pages are hit during the binary search.
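The lookup described above can be sketched roughly as follows. The file layout here is an assumption for illustration, not the answer's actual format: an 8-byte little-endian record count as the header, followed by sorted 16-byte NUL-padded ASCII keys. `write_index` and `complete` are hypothetical names, and Python reads records with slicing rather than the C-style struct cast the answer mentions. Only the pages touched by the binary search and the short scan that follows are actually read from disk.

```python
import mmap
import struct

HEADER = struct.Struct("<Q")  # hypothetical header: record count
KEY_SIZE = 16                 # hypothetical fixed-size key field, NUL-padded


def write_index(path: str, keys: list) -> None:
    """Build a test index file: header, then sorted fixed-size records."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(keys)))
        for key in sorted(keys):
            raw = key.encode("ascii")
            if len(raw) > KEY_SIZE or b"\0" in raw:
                raise ValueError(f"bad key: {key!r}")
            f.write(raw.ljust(KEY_SIZE, b"\0"))


def record_key(mm: mmap.mmap, i: int) -> bytes:
    start = HEADER.size + i * KEY_SIZE
    return mm[start:start + KEY_SIZE].rstrip(b"\0")


def lower_bound(mm: mmap.mmap, count: int, prefix: bytes) -> int:
    """Binary search for the first record whose key is >= prefix."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if record_key(mm, mid) < prefix:
            lo = mid + 1
        else:
            hi = mid
    return lo


def complete(mm: mmap.mmap, prefix: bytes, limit: int = 10) -> list:
    """Return up to `limit` keys starting with `prefix`, in sorted order."""
    (count,) = HEADER.unpack_from(mm, 0)
    results = []
    i = lower_bound(mm, count, prefix)
    # Keys sharing the prefix are contiguous, so scan forward from there.
    while i < count and len(results) < limit:
        key = record_key(mm, i)
        if not key.startswith(prefix):
            break
        results.append(key.decode("ascii"))
        i += 1
    return results
```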

  • Concurrency - I had an implementation problem where the file would sometimes be memory mapped multiple times in the same process space. As I recall, this was a problem because sometimes the system couldn't find a large enough free block of virtual memory to map the file into. The solution was to map the file only once and route all calls through that single mapping. In retrospect, using a full-blown Windows service would have been cool.
  • Random Access - The binary search is certainly random access and lightning fast.
  • Performance - The lookup is extremely fast. As users type a popup window displays a list of matching product part numbers, the list shrinks as they continue to type. There is no noticeable lag while typing.
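The "map once" fix from the concurrency point could look roughly like this: a per-process cache, guarded by a lock, that hands every caller the same read-only mapping. This is a hedged sketch in Python, not the answerer's Windows code; `get_mapping` and `_mappings` are hypothetical names. Closing the file after creating the mapping is fine because the mapping stays valid on its own.

```python
import mmap
import os
import threading

_lock = threading.Lock()
_mappings = {}  # hypothetical cache: real path -> mmap.mmap


def get_mapping(path: str) -> mmap.mmap:
    """Return the single shared read-only mapping of `path` for this process."""
    key = os.path.realpath(path)
    with _lock:
        mm = _mappings.get(key)
        # Map only if there is no live mapping yet, so concurrent callers
        # never create a second copy in the address space.
        if mm is None or mm.closed:
            with open(key, "rb") as f:
                mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            _mappings[key] = mm
        return mm
```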
answered Oct 02 '22 at 17:10 by Brian Ensink