
How does a memory mapped file work for files larger than memory?

I'm trying to work with a data file that is larger than my memory.

My understanding so far is that a memory-mapped file maps every byte in the file to an address in virtual memory. The data is only read into physical memory when you actually need it (for example, when accessing a specific entry), and it is read in chunks called pages.

But if I'm eventually going to process everything in that data file, doesn't that mean everything needs to be read into physical memory at some point? Does the OS automatically decide which parts of the data already in memory should be freed to make room for new data?

For this specific project I'm working with Python (numpy.memmap) on Linux, if that makes any difference.

cactus asked Dec 22 '16 17:12



1 Answer

It depends.

Memory-mapped files work in almost exactly the same way as traditional paging, except that instead of moving data between memory and the pagefile (swap space on Linux), the operating system moves data between memory and whatever file you specify.
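To make the "moving data between memory and a file" idea concrete, here is a minimal sketch using Python's standard `mmap` module. The scratch file and the `demo_mapping` helper are made up for illustration; the point is only that an ordinary slice assignment on the mapping ends up in the file.

```python
import mmap
import os
import tempfile


def demo_mapping() -> bytes:
    # Hypothetical scratch file, created only for this illustration.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(b"hello world")
            f.flush()
            # Length 0 means "map the whole file at its current size".
            with mmap.mmap(f.fileno(), 0) as mm:
                mm[0:5] = b"HELLO"  # a plain memory write, no write() call
                mm.flush()          # ask the OS to write modified pages back
        # Re-read the file through the normal file API.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


result = demo_mapping()
print(result)
```

The file is tiny here, but the same calls apply to a large file: the mapping itself reads nothing, and pages are brought in only when the slice is touched.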

So if you run out of physical memory (that is, the actual RAM chips on your motherboard), that's fine. The operating system will just evict whichever pages of the file it thinks you're not going to use, writing them back to the file first if you modified them. If it guesses wrong, you'll have poor performance, but you won't crash or anything.
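For the `numpy.memmap` case in the question, processing the whole file might look roughly like the sketch below. It assumes the file is a flat array of `float64` values; the `chunked_sum` helper, the chunk size, and the small temporary file are illustrative choices, not anything NumPy prescribes.

```python
import os
import tempfile

import numpy as np


def chunked_sum(path: str, dtype=np.float64, chunk_items: int = 1_000_000) -> float:
    # Mapping only reserves address space; no file data is read yet.
    data = np.memmap(path, dtype=dtype, mode="r")
    total = 0.0
    for start in range(0, data.shape[0], chunk_items):
        # Touching this slice faults in just its pages. Pages from earlier
        # chunks stay cached only while the OS has RAM to spare.
        total += float(data[start:start + chunk_items].sum())
    del data  # drop the reference so the mapping can be released
    return total


# Small hypothetical file so the sketch runs quickly.
fd, path = tempfile.mkstemp()
os.close(fd)
np.arange(10_000, dtype=np.float64).tofile(path)
total = chunked_sum(path, chunk_items=1_000)
os.remove(path)
print(total)
```

Every byte does pass through physical memory at some point, just not all at once. The explicit chunking mainly keeps NumPy's own temporaries small; the eviction of old pages is handled by the OS either way.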

But if you run out of virtual memory, or address space, that's not fine. In this case, your program runs out of memory addresses and will no longer be able to allocate memory. You will also be unable to grow the memory-mapped region of the file. For a 32-bit program, the limit is somewhat smaller than 4 GB (the precise limit varies by operating system and programming environment, and depends on the overhead of those systems). For a 64-bit program, the limit is normally huge, though exactly how huge will depend on your architecture and operating system.
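A quick way to see which case applies to your interpreter is to check its pointer size. This is only a rough sketch: `pointer_bits` and `fits_in_address_space` are hypothetical helpers, and the real usable limit is lower than the theoretical bound because of OS and runtime overhead.

```python
import struct


def pointer_bits() -> int:
    # Size of a C pointer in this Python build, in bits (32 or 64).
    return struct.calcsize("P") * 8


def fits_in_address_space(file_size: int) -> bool:
    # Theoretical upper bound only; the practical limit is smaller.
    return file_size < 2 ** pointer_bits()


bits = pointer_bits()
print(bits, fits_in_address_space(8 * 2**30))  # is an 8 GiB file mappable?
```

On a 64-bit Python the check passes for any realistic file size; on a 32-bit build it fails for files of 4 GiB or more, and in practice well before that.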

Kevin answered Nov 14 '22 22:11