
Is mmap + madvise really a form of async I/O?

I'm trying to figure out if mmap'ing a file, and then using madvise() or posix_madvise() with MADV_WILLNEED/POSIX_MADV_WILLNEED actually triggers background async I/O for read-ahead. The man pages for madvise don't specify whether this is the case - the actual behavior of madvise is left mostly unclear, in order to allow for flexibility of the implementation.

But does any actual mainstream POSIX implementation (like Linux) actually perform async file I/O when madvise() with MADV_WILLNEED is called? I can't seem to get any reliable information about this. This question suggests it does, on Linux at least, even if it is not ideal since there is no callback mechanism.

This book excerpt claims that posix_fadvise with POSIX_FADV_WILLNEED will do asynchronous read ahead, but doesn't mention if madvise() does async read ahead.

Furthermore, it would seem that the whole concept of "read-ahead" I/O doesn't make sense unless it's asynchronous. If it were synchronous, the application would simply block during the read-ahead instead of blocking later when it actually reads the file, which doesn't seem like a particularly powerful optimization.

So, does madvise() with MADV_WILLNEED actually do async read-ahead on any mainstream platform (like Linux)?

Asked by Siler, Jul 03 '15 23:07




1 Answer

On Linux you can always check the source code.

See fadvise.c:

case POSIX_FADV_WILLNEED:
    ...
    force_page_cache_readahead(mapping, f.file, start_index,
                   nrpages);
    break;

So posix_fadvise calls force_page_cache_readahead to perform readahead.

Now let's look at madvise.c:

static long madvise_willneed(...)
{
    ...
    force_page_cache_readahead(file->f_mapping, file, start, end-start);
    return 0;
}

So MADV_WILLNEED and POSIX_FADV_WILLNEED are equivalent on Linux.
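For reference, here is a minimal user-space sketch of issuing both hints. The helper name map_with_willneed is hypothetical, and the code assumes Linux with glibc. Both calls are purely advisory, so their return values are deliberately ignored here.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes madvise() and MADV_WILLNEED on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and hint that the whole file
 * will be needed soon. Returns the mapping and stores its length in
 * *len_out, or returns NULL on failure (including an empty file). */
static void *map_with_willneed(const char *path, size_t *len_out)
{
    assert(len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;

    /* File-descriptor hint: a length of 0 means "through end of file". */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    /* Mapping hint: asks the kernel to start reading the range in. */
    (void)madvise(p, len, MADV_WILLNEED);

    *len_out = len;
    return p;
}
```

In practice you would issue only one of the two hints; both are shown here just to put the calls side by side. The caller releases the mapping with munmap(p, len).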

Can this be called asynchronous I/O? I don't think so. Async I/O usually implies some notification that tells you when the data can be retrieved. Advice is just advice: not only do you not know when the data is ready to be read, but if you come back too late, it might already have been thrown away.
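If you want at least a rough idea of whether the hinted pages have arrived, the closest thing to a completion check is polling with mincore(). The sketch below assumes Linux with glibc; count_resident_pages is a hypothetical helper name. The result is only a snapshot: pages reported as resident can be evicted again before you touch them.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes mincore() on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages of [addr, addr + len) are
 * resident in memory right now. addr must be page-aligned, which is
 * always true for a pointer returned by mmap(). Returns -1 on error. */
static long count_resident_pages(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* mincore() fills one status byte per page of the range. */
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;

    long resident = -1;
    if (mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set: page is resident */
    }
    free(vec);
    return resident;
}
```

Calling this after madvise(..., MADV_WILLNEED) and comparing the result with the total page count shows how far the read-ahead has progressed, but it is polling, not notification.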

Answered by StaceyGirl, Oct 02 '22 08:10