I have a problem that was described in multiple threads concerning memory mapping and a growing memory consumption under Linux.
When I open a 1GB file under Linux or MacOS X and map it into memory using
me.data_begin = mmap(NULL, capacity(me), prot, MAP_SHARED, me.file.handle, 0);
and sequentially read the mapped memory, my program uses more and more physical memory although I used posix_madvise (even called it multiple times during the read process):
posix_madvise(me.data_begin, capacity(me), POSIX_MADV_SEQUENTIAL);
without success. :-(
What I tried:
It works under Mac OS X when I combine
posix_madvise(.. POSIX_MADV_SEQUENTIAL)
and
msync(me.data_begin, capacity(me), MS_INVALIDATE).
The resident memory stays below 16 MB (I called msync periodically, every 16 million steps).
But under Linux nothing works. Does anyone have an idea or a success story for my problem under Linux?
Cheers, David
Memory mapped files are loaded into memory one entire page at a time. The page size is selected by the operating system for maximum performance.
To map virtual memory addresses to physical memory addresses, page tables are used. A page table consists of numerous page table entries (PTEs). Each PTE maps one virtual page to a physical page frame and carries status bits such as present, accessed, and dirty.
The principal benefits of memory-mapping are efficiency, faster file access, the ability to share memory between applications, and more efficient coding.
Linux memory management is different from other systems. The key principle is that memory that is not being used is memory being wasted. In many ways, Linux tries to maximize memory usage, resulting (most of the time) in better performance.
It is not that "nothing works" in Linux, but that its behavior is a little different from what you expect.
When memory pages are pulled in from the mmapped file, the operating system has to decide which physical memory pages it will release (or swap out) in order to reuse them. It will look for pages which are easier to evict (they don't require an immediate disk write) and are less likely to be used again.
The madvise() POSIX call serves to tell the system how your application will use the pages. But as the name says, it is advice, so that the operating system is better informed when making paging and swapping decisions. It is neither a policy nor an order.
To demonstrate the effects of madvise() on Linux, I modified one of the exercises I give to my students. See the complete source code here. My system is 64-bit and has 2 GB of RAM, of which about 50% is currently in use. The program mmaps a 2 GB file, reads it sequentially, and discards everything. It reports RSS usage after every 200 MB read. The results without madvise():
<juliano@home> ~% ./madvtest file.dat n
0 : 3 MB
200 : 202 MB
400 : 402 MB
600 : 602 MB
800 : 802 MB
1000 : 1002 MB
1200 : 1066 MB
1400 : 1068 MB
1600 : 1078 MB
1800 : 1113 MB
2000 : 1113 MB
Linux kept pushing other things out of memory until around 1 GB had been read. After that, it started putting pressure on the process itself (since the other 50% of memory was actively used by other processes), and RSS stabilized until the end of the file.
Now, with madvise():
<juliano@home> ~% ./madvtest file.dat y
0 : 3 MB
200 : 202 MB
400 : 402 MB
600 : 494 MB
800 : 501 MB
1000 : 518 MB
1200 : 530 MB
1400 : 530 MB
1600 : 530 MB
1800 : 595 MB
2000 : 788 MB
Note that Linux decided to allocate pages to the process only until it reached around 500 MB, much sooner than without madvise(). This is because after that point, the pages already in memory seemed much more valuable than the pages that this process had marked as sequential access. There is a threshold in the VMM that defines when to start dropping old pages from the process.
You may ask why Linux kept allocating pages up to around 500 MB and didn't stop much sooner, since they were marked as sequential access. Either the system had enough free memory pages anyway, or the other resident pages were too old to keep around. Between keeping ancient pages in memory that no longer seem useful and bringing in more pages to serve a program that is running now, Linux chooses the second option.
Even though the pages were marked as sequential access, that was just advice. The application may still want to go back to those pages and read them again, or another application in the system may want them. The madvise() call says only what the application itself is doing; Linux takes the bigger picture into consideration.