Does madvise(___, ___, MADV_DONTNEED) instruct the OS to lazily write to disk?

Question

Hypothetically, suppose I want to perform sequential writing to a potentially very large file.

If I mmap() a gigantic region and madvise(MADV_SEQUENTIAL) on that entire region, then I can write to the memory in a relatively efficient manner. This I have gotten to work just fine.

Now, in order to free up various OS resources as I am writing, I occasionally perform a munmap() on small chunks of memory that have already been written to. My concern is that munmap() and msync() will block my thread, waiting for the data to be physically committed to disk. I cannot slow down my writer at all, so I need to find another way.

Would it be better to use madvise(MADV_DONTNEED) on the small, already-written chunk of memory? I want to tell the OS to write that memory to disk lazily, and not to block my calling thread.

The manpage on madvise() has this to say, which is rather ambiguous:

MADV_DONTNEED
Do not expect access in the near future. (For the time being, the
application is finished with the given range, so the kernel can free
resources associated with it.) Subsequent accesses of pages in this
range will succeed, but will result either in re-loading of the memory
contents from the underlying mapped file (see mmap(2)) or
zero-fill-on-demand pages for mappings without an underlying file.

Damon · Accepted Answer

No!

For your own good, stay away from MADV_DONTNEED. Linux will not take this as a hint to throw pages away after writing them back, but to throw them away immediately. This is not considered a bug, but a deliberate decision.

Ironically, the reasoning is that the functionality of a non-destructive MADV_DONTNEED is already given by msync(MS_INVALIDATE|MS_ASYNC). MS_ASYNC, on the other hand, does not start I/O (in fact, it does nothing at all, following the reasoning that dirty page writeback works fine anyway). fsync always blocks, and sync_file_range may block if you exceed some obscure limit and is considered "extremely dangerous" by the documentation, whatever that means.

Either way, you must msync(MS_SYNC), or fsync (both blocking), or sync_file_range (possibly blocking) followed by fsync, or you will lose data with MADV_DONTNEED. If you cannot afford to possibly block, you have no choice, sadly, but to do this in another thread.
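To make the "do this in another thread" suggestion concrete, here is a minimal sketch assuming POSIX threads. The names flush_job, flush_worker and flush_chunk_async are hypothetical helpers invented for illustration, not part of any API. The helper thread takes the blocking msync(MS_SYNC) and drops the pages only after write-back has succeeded.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical job descriptor: one already-written, page-aligned chunk. */
struct flush_job {
    void  *addr;    /* page-aligned start of the chunk */
    size_t len;     /* chunk length in bytes */
    int    result;  /* 0 on success, -1 on failure (set by the worker) */
};

/* Runs on the helper thread, so the blocking msync() never stalls the writer. */
static void *flush_worker(void *arg)
{
    struct flush_job *job = arg;

    /* Block until the dirty pages of this chunk have been written back. */
    job->result = msync(job->addr, job->len, MS_SYNC);

    /* Only after a successful write-back do we tell the kernel to drop them. */
    if (job->result == 0)
        job->result = madvise(job->addr, job->len, MADV_DONTNEED);
    return NULL;
}

/* Start the flush in the background. Returns 0 or a pthread error number.
 * The job must stay valid until the thread has been joined. */
static int flush_chunk_async(struct flush_job *job, pthread_t *tid)
{
    return pthread_create(tid, NULL, flush_worker, job);
}
```

The writer calls flush_chunk_async() and carries on; some later point in the program has to pthread_join() the thread and check job.result. A real writer would more likely keep one long-lived flusher thread fed from a queue than spawn a thread per chunk. Depending on the toolchain, this may need to be built with -pthread.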

cyfdecyf · Answer

For recent Linux kernels (just tested on Linux 5.4), MADV_DONTNEED works as expected when the mapping is NOT private, i.e. when mmap is called without the MAP_PRIVATE flag. I'm not sure how earlier versions of the Linux kernel behave.

From release 4.15 of the Linux man-pages project's madvise manpage:

After a successful MADV_DONTNEED operation, the semantics of memory access in the specified region are changed: subsequent accesses of pages in the range will succeed, but will result in either repopulating the memory contents from the up-to-date contents of the underlying mapped file (for shared file mappings, shared anonymous mappings, and shmem-based techniques such as System V shared memory segments) or zero-fill-on-demand pages for anonymous private mappings.
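If you want to check the shared-mapping behavior on your own kernel, a small experiment along these lines should do. This is only a sketch: the function name and the scratch file under /tmp are assumptions made for illustration. Note that it only shows that the data is still visible through the mapping after MADV_DONTNEED; it says nothing about when the data actually reaches the disk.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if data written through a MAP_SHARED file mapping is still
 * visible after MADV_DONTNEED, 0 if it is not, -1 on a setup failure. */
int shared_mapping_survives_dontneed(void)
{
    char path[] = "/tmp/dontneed_demo_XXXXXX";  /* hypothetical scratch file */
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file goes away once fd is closed */

    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(p, 'A', len);  /* dirty the page; deliberately no msync() */

    int survived = -1;
    if (madvise(p, len, MADV_DONTNEED) == 0)
        survived = (p[0] == 'A' && p[len - 1] == 'A');  /* re-access the range */

    munmap(p, len);
    close(fd);
    return survived;
}
```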

In Linux 4.5, a new flag, MADV_FREE, was added with the same behavior as on BSD systems. It just marks pages as available to be freed if needed but does not free them immediately, which makes it possible to reuse the memory range without incurring the cost of faulting the pages in again.

For why MADV_DONTNEED on a private mapping may result in zero-filled pages upon future access, watch Bryan Cantrill's rant, mentioned in the comments on @Damon's answer. Spoiler: it comes from Tru64 UNIX.
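The zero-fill behavior for private anonymous mappings is easy to see for yourself. Below is a minimal sketch for Linux; the function name is made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a private anonymous page reads back as zero after
 * MADV_DONTNEED, 0 if the old contents survived, -1 on failure. */
int private_anon_zeroed_after_dontneed(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);  /* fill the page with a non-zero pattern */

    int zeroed = -1;
    if (madvise(p, len, MADV_DONTNEED) == 0)
        zeroed = (p[0] == 0 && p[len - 1] == 0);  /* touching it faults in a fresh page */

    munmap(p, len);
    return zeroed;
}
```

This is the destructive case: whatever was written to the range is gone, because there is no backing file to reload it from.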

mbloms · Answer

As already mentioned, MADV_DONTNEED is not your friend. Since Linux 5.4, you can use MADV_COLD to tell the kernel it should page out that memory when there is memory pressure. This seems to be exactly what is wanted in this situation.

Read more here: https://lwn.net/Articles/793462/
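A hedged sketch of how the MADV_COLD hint might be issued for an already-written chunk. The helper name mark_chunk_cold and its return convention are assumptions for illustration. Since older headers and kernels do not know the flag, the sketch treats that case as "unsupported" rather than as a hard error.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper. Returns 0 if the hint was accepted, 1 if MADV_COLD
 * is not supported here, -1 on any other error (errno is set). */
int mark_chunk_cold(void *addr, size_t len)
{
#ifdef MADV_COLD
    if (madvise(addr, len, MADV_COLD) == 0)
        return 0;
    /* Kernels that do not know the advice value reject it with EINVAL.
     * A misaligned addr also yields EINVAL, so pass page-aligned chunks. */
    if (errno == EINVAL || errno == ENOSYS)
        return 1;
    return -1;
#else
    /* The C library headers predate MADV_COLD. */
    (void)addr;
    (void)len;
    return 1;
#endif
}
```

The writer could call mark_chunk_cold(chunk, chunk_len) on each finished, page-aligned chunk and simply ignore a return value of 1. Unlike MADV_DONTNEED, this is only a hint about reclaim order and does not discard data.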

Tags:

linux

linux-kernel

kernel

mmap

shared-memory

Badmanchild
