I'm trying to figure out whether mmap'ing a file and then calling madvise() or posix_madvise() with MADV_WILLNEED/POSIX_MADV_WILLNEED actually triggers background async I/O for read-ahead. The man pages for madvise don't say whether this is the case; the actual behavior of madvise is left mostly unspecified, in order to allow for flexibility of the implementation.
But does any mainstream POSIX implementation (like Linux) actually perform async file I/O when madvise() is called with MADV_WILLNEED? I can't seem to find any reliable information about this. This question suggests it does, on Linux at least, even if it is not ideal since there is no callback mechanism.
This book excerpt claims that posix_fadvise with POSIX_FADV_WILLNEED will do asynchronous read-ahead, but doesn't mention if madvise() does async read-ahead.
Furthermore, it would seem that the whole concept of "read-ahead" I/O doesn't really make sense unless it's asynchronous. If it were synchronous, it would simply make the user application block during the read-ahead instead of later, when actually reading the file, which doesn't seem like a particularly powerful optimization.
So, does madvise() with MADV_WILLNEED actually do async read-ahead on any mainstream platform (like Linux)?
Memory mapping often results in improved I/O performance because it avoids many costly system calls and reduces expensive data buffer transfers. The mmap API is similar to the regular file I/O API, so it’s fairly straightforward to test out.
Python’s mmap provides memory-mapped file input and output (I/O). It allows you to take advantage of lower-level operating system functionality to read files as if they were one large string or array.
It’s worth pointing out that Python’s mmap is incompatible with higher-level, more full-featured APIs like the built-in multiprocessing module. The multiprocessing module requires data passed between processes to support the pickle protocol, which mmap does not.
In most situations there is no need for asynchronous I/O, since its effects can be achieved with the use of threads, with each thread doing synchronous I/O. However, in a few situations, threads cannot achieve what asynchronous I/O can.
With Linux you can always check the source code.
See fadvise.c:
    case POSIX_FADV_WILLNEED:
        ...
        force_page_cache_readahead(mapping, f.file, start_index,
                                   nrpages);
        break;
So posix_fadvise calls force_page_cache_readahead to perform readahead.
Now let's look at madvise.c:
    static long madvise_willneed(...)
    {
        ...
        force_page_cache_readahead(file->f_mapping, file, start, end-start);
        return 0;
    }
So MADV_WILLNEED and POSIX_FADV_WILLNEED are equivalent on Linux.
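From user space, the two hints can be issued roughly like this. This is only a minimal sketch, assuming Linux with glibc; the helper names (hint_with_fadvise, hint_with_madvise, prefetch_file) are made up for illustration and are not part of any API. Note the differing error conventions: posix_fadvise returns an error number directly, while madvise returns -1 and sets errno.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hint through the file descriptor.
 * posix_fadvise() returns 0 on success or an error number (not -1). */
static int hint_with_fadvise(int fd, off_t len)
{
    return posix_fadvise(fd, 0, len, POSIX_FADV_WILLNEED);
}

/* Hint through an existing mapping.
 * madvise() returns 0 on success, or -1 with errno set. */
static int hint_with_madvise(void *addr, size_t len)
{
    return madvise(addr, len, MADV_WILLNEED);
}

/* Hypothetical helper: map a file read-only and issue both hints.
 * Returns 0 if both hints were accepted, -1 otherwise. */
int prefetch_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int rc = 0;
    if (hint_with_fadvise(fd, st.st_size) != 0)
        rc = -1;
    if (hint_with_madvise(addr, len) != 0)
        rc = -1;

    /* Real code would go on to read from addr here; the hints only
     * ask the kernel to start reading, they do not wait for it. */
    munmap(addr, len);
    close(fd);
    return rc;
}
```

A successful return only means the kernel accepted the hint. It says nothing about whether the pages have arrived yet.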
Can this be called asynchronous I/O? I don't think so. Async I/O usually implies some notification that tells you when the data can be retrieved. Advice is just advice: not only do you not know when the data is ready to be read, but if you read too late, the data might already have been thrown away.
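Since there is no completion notification, about the closest you can get is polling page residency with mincore(). The following is a rough sketch, assuming Linux; count_resident_pages is a hypothetical helper name, and the answer is only a snapshot that may be stale by the time you act on it. Newer Linux kernels also restrict what mincore() reveals for files the caller does not own or cannot write, so treat the result as a hint.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages of [addr, addr + len) are
 * currently resident in memory. addr must be page-aligned.
 * Returns the number of resident pages, or -1 on error. */
long count_resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return -1;

    /* mincore() fills one byte per page, rounding len up to whole pages. */
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;

    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }

    long resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;   /* the low bit marks a resident page */

    free(vec);
    return resident;
}
```

Calling this on a mapping after madvise(..., MADV_WILLNEED) and comparing the result against the total page count gives a rough idea of how far the read-ahead has progressed, but pages can still be evicted between the check and the actual access.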