How does mmap work?

I am working on programs on Linux that need to mmap a file from the hard drive, and I have a question: what can make it fail? For example, if memory is fragmented so that each free region is only 200M, but I want to mmap a file into a 1000M region, will it succeed?

Another question: are there any tools on Linux for recollecting memory, like some tools on Windows, e.g. the built-in tool for XP?

Thanks.

Shaobo Wang asked May 04 '11 02:05


2 Answers

mmap() uses addresses outside your program's heap area, so heap fragmentation isn't a problem, except to the extent that it can make the heap take up more space and so reduce the space available for mappings.

If you have lots of mapped files, you could potentially run into problems with fragmentation on a 32-bit system where the address space is relatively constrained. On a 64-bit system, fragmentation is unlikely to be a problem because even if you have only small regions available between existing mappings, there's still lots and lots of available contiguous address space, adjacent to the existing mappings.

The more common problem on a 32-bit system is that the address space is just too small to map large files at all. Of the 4GB address space, only part is available to userspace: typically 3GB with the default 32-bit Linux split (1GB reserved by the kernel), or 2GB on systems that use a 2GB/2GB split. Within that userspace portion, your mappings have to share space with the program's code and stacks (typically small) and heap (potentially large).

In short, mmap() can often fail on 32-bit systems if the file is too large, but you're unlikely to ever have a file large enough to cause that problem on a 64-bit system.
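
To see which of the two cases a given process falls into, one option is to check its pointer width. A minimal sketch in Python (the helper name `pointer_bits` is made up for this illustration, and the figure printed is only an upper bound, not the space actually free for mappings):

```python
import struct

def pointer_bits() -> int:
    """Return the width of a native pointer in bits (32 or 64 on common platforms)."""
    return struct.calcsize("P") * 8

bits = pointer_bits()
# Upper bound on the virtual address range a pointer of this width can name.
# The usable part is smaller: the kernel reserves a share, and code, stacks,
# heap and existing mappings take up some of the rest.
addressable = 2 ** bits
print(f"{bits}-bit process, at most {addressable / 2**30:.0f} GiB addressable")
```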

If you're creating a writable private (copy-on-write) mapping, it can also fail due to lack of swap space. Depending on its overcommit settings, the kernel may have to ensure that the sum of available RAM and swap is large enough to hold your whole mapping, in case you modify all the pages so that the kernel is forced to make private copies of them all. A shared mapping shouldn't have this problem, since changes can be flushed to the file on disk, and then the pages can be discarded if memory is scarce and reloaded from disk later.
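
The difference between the two mapping types can be observed from Python's `mmap` module, where `ACCESS_COPY` gives a private copy-on-write mapping and `ACCESS_WRITE` a shared one. This is only a sketch: the helper names and the file name `data.bin` are invented for the example, and it shows where the writes end up, not the swap accounting itself.

```python
import mmap
import os
import tempfile

def write_through_mapping(path: str, access: int) -> bytes:
    """Overwrite the first byte via a mapping, then return the file's contents on disk."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            m[0:1] = b"X"
            if access == mmap.ACCESS_WRITE:
                m.flush()  # push the shared mapping's dirty page back to the file
    with open(path, "rb") as f:
        return f.read()

def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"hello")
        # Private mapping: the write lands in a private copy of the page.
        private = write_through_mapping(path, mmap.ACCESS_COPY)
        # Shared mapping: the write is carried through to the file.
        shared = write_through_mapping(path, mmap.ACCESS_WRITE)
        return private, shared

private_result, shared_result = demo()
print(private_result, shared_result)
```

The private write never reaches the file, which is why the kernel has to be able to keep those modified pages in RAM or swap; the shared write does reach the file, so the page can always be dropped and reread.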

Of course, a mapping can also fail if you don't have permission to access the file, or if it's not a type of file that can be mapped (such as a directory or a socket).
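
Both kinds of failure show up as an error return from mmap(), which Python surfaces as `OSError`. A small sketch, assuming Linux (the helper `try_map` and the file name are hypothetical; the second case uses a write-only descriptor as a stand-in for "no read access", and the exact errno values may vary by platform, so the code only reports them):

```python
import errno
import mmap
import os
import tempfile

def try_map(fd: int, length: int):
    """Attempt a read-only mapping; return None on success or the errno name on failure."""
    try:
        with mmap.mmap(fd, length, access=mmap.ACCESS_READ):
            return None
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))

def demo():
    results = {}
    with tempfile.TemporaryDirectory() as d:
        # Case 1: a directory is not a mappable file type.
        fd = os.open(d, os.O_RDONLY)
        try:
            results["directory"] = try_map(fd, 4096)
        finally:
            os.close(fd)

        # Case 2: the descriptor is write-only, so a readable mapping is refused.
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"x" * 4096)
        fd = os.open(path, os.O_WRONLY)
        try:
            results["write_only_fd"] = try_map(fd, 4096)
        finally:
            os.close(fd)
    return results

failures = demo()
print(failures)
```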

It's not clear what you mean about recollecting memory. Remember that the scarce resource that mmap() consumes isn't memory, it's address space. You can potentially map a 1GB file even if the machine actually only has 128MB of RAM, but on a 32-bit system you can't map a 4GB file even if the machine has 16GB of RAM.
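
A rough way to see this for yourself is to map a large sparse file and touch only a couple of bytes. The sketch below assumes a filesystem that supports sparse files; the 256 MiB size and the name `sparse.bin` are arbitrary choices for the demonstration.

```python
import mmap
import os
import tempfile

SIZE = 256 * 1024 * 1024  # 256 MiB; arbitrary size chosen for the demonstration

def map_sparse_file(size: int = SIZE):
    """Map a sparse file of `size` bytes and touch only its first and last bytes."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.bin")
        with open(path, "wb") as f:
            f.truncate(size)  # sets the length without writing data blocks
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                mapped_len = len(m)  # address space taken by the mapping
                # Only the pages around these two bytes need to be brought in.
                first, last = m[0], m[size - 1]
    return mapped_len, first, last

mapped_len, first, last = map_sparse_file()
print(mapped_len, first, last)
```

The mapping occupies the full 256 MiB of address space, but physical memory is only needed for the pages that are actually touched.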

The concept of virtual memory is essential to understanding what mmap() does, so read about that if you're not familiar with it already.

Wyzard answered Oct 13 '22 01:10


mmap works by manipulating your process's page table, a data structure your CPU uses to map address spaces. The CPU will translate "virtual" addresses to "physical" ones, and does so according to the page table set up by your kernel.

When you access the mapped memory for the first time, your CPU generates a page fault. The OS kernel can then jump in, "fix up" the invalid memory access by allocating memory and reading the file's contents into that newly allocated page, then continue your program's execution as if nothing happened.
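
The faults can be counted from user space with `getrusage`. The following is a sketch only: the function name is made up, the page count of 256 is arbitrary, and the number reported is usually lower than the number of pages touched because the kernel may map several neighbouring pages per fault (some environments may not report fault counts at all).

```python
import mmap
import os
import resource
import tempfile

def count_faults_while_touching(num_pages: int = 256):
    """Map a file, read one byte per page, and return (sum of bytes, page faults taken)."""
    page = mmap.PAGESIZE
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"\x01" * (page * num_pages))
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                before = resource.getrusage(resource.RUSAGE_SELF)
                # First touch of each page of the mapping.
                total = sum(m[i * page] for i in range(num_pages))
                after = resource.getrusage(resource.RUSAGE_SELF)
    faults = (after.ru_minflt - before.ru_minflt) + (after.ru_majflt - before.ru_majflt)
    return total, faults

total, faults = count_faults_while_touching()
print("bytes summed:", total, "page faults during access:", faults)
```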

mmap can fail if your process is out of address space. This is something to watch out for with 32-bit code, where all usable address space can be used up pretty quickly by large data sets. It can also fail for any of the reasons listed in the "Errors" section of the man page.

Accessing memory inside a mapped region can also fail if the kernel has issues allocating memory or doing I/O. In that case your process will get a SIGBUS signal.
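
I/O and allocation failures are hard to provoke on demand, but SIGBUS is also delivered when a mapped page is no longer backed by the file, which is easy to reproduce by truncating the file after mapping it. A sketch of that related case, assuming Linux (the access runs in a child process so the signal does not kill the caller; `run_child` is a name invented for the example):

```python
import signal
import subprocess
import sys

# Code run in a child process so the fatal signal does not kill the caller.
CHILD = r"""
import mmap, os, tempfile
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * mmap.PAGESIZE)
m = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_READ)
os.unlink(path)
os.ftruncate(fd, 0)   # the file no longer backs the mapped page
m[0]                  # touching the page now raises SIGBUS
"""

def run_child() -> int:
    """Run the child and return its exit status (negative means killed by a signal)."""
    proc = subprocess.run([sys.executable, "-c", CHILD], stderr=subprocess.DEVNULL)
    return proc.returncode

status = run_child()
print("child killed by SIGBUS:", status == -signal.SIGBUS)
```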

asveikau answered Oct 13 '22 00:10