I have the following problem:
I allocate a large chunk of memory (multiple GiB) via mmap with MAP_ANONYMOUS. That chunk holds a large hash map which needs to be zeroed every now and then. Only part of the mapping may be used in each round (not every page is faulted in), so memset is not a good idea: it touches every page and takes too long.
What is the best strategy to do this quickly?
Will
madvise(ptr, length, MADV_DONTNEED);
guarantee that any subsequent access sees fresh, zero-filled pages?
From the Linux madvise(2) man page:
This call does not influence the semantics of the application (except in the case of MADV_DONTNEED), but may influence its performance. The kernel is free to ignore the advice.
...
MADV_DONTNEED
Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents from the underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
...
The current Linux implementation (2.4.0) views this system call more as a command than as advice ...
Or do I have to munmap and remap the region anew?
It has to work on Linux and ideally have the same behaviour on OS X.
There is a much easier solution to your problem that is fairly portable: map fresh anonymous memory over the existing range with MAP_FIXED:
mmap(ptr, length, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
Since MAP_FIXED is permitted to fail for fairly arbitrary implementation-specific reasons, falling back to memset() if mmap() returns MAP_FAILED would be advisable.
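A minimal sketch of the remap-with-fallback idea, assuming ptr is page-aligned and the whole range lies inside a mapping the program already owns. The helper name rezero_region and its return convention are made up for illustration; they are not part of any API.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD / OS X spelling */
#endif

/*
 * Hypothetical helper: make [ptr, ptr + len) read as zero again.
 * Assumes ptr is page-aligned and the range belongs to an existing
 * private anonymous mapping.
 * Returns 0 if fresh pages were mapped over the range,
 *         1 if mmap failed and the range was cleared with memset instead.
 */
static int rezero_region(void *ptr, size_t len)
{
    assert(ptr != NULL && len > 0);

    /* MAP_FIXED replaces whatever is currently mapped at ptr. */
    void *p = mmap(ptr, len, PROT_READ | PROT_WRITE,
                   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        /* Slow path: the old mapping is still in place, so clear it by hand. */
        memset(ptr, 0, len);
        return 1;
    }
    return 0;
}
```

Calling rezero_region(table, table_len) between rounds would then stand in for the memset, with the slow path only taken if the remap is refused.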
On Linux, you can rely on MADV_DONTNEED on an anonymous mapping zeroing the mapping. This isn't portable, though: madvise() itself isn't standardised. posix_madvise() is standardised, but POSIX_MADV_DONTNEED does not have the same behaviour as the Linux MADV_DONTNEED flag. posix_madvise() is always advisory and does not affect the semantics of the application.
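For the Linux-only route, a sketch along these lines should work, assuming a private anonymous mapping and a page-aligned ptr. The helper name discard_region_linux is hypothetical, and the zero-fill guarantee it relies on is Linux behaviour, not something to expect on OS X or other systems.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes madvise() and MADV_DONTNEED on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, Linux-specific: drop the pages backing
 * [ptr, ptr + len) so that the next access to a private anonymous
 * mapping faults in zero-filled pages.
 * Assumes ptr is page-aligned.
 * Returns 0 on success, -1 on failure (errno is set by madvise).
 */
static int discard_region_linux(void *ptr, size_t len)
{
    assert(ptr != NULL && len > 0);
    return madvise(ptr, len, MADV_DONTNEED);
}
```

If the call returns -1, the contents are unchanged, so a caller could still fall back to the remap or memset approach.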