
What is the status of POSIX asynchronous I/O (AIO)?

There are pages scattered around the web that describe POSIX AIO facilities in varying amounts of detail. None of them are terribly recent. It's not clear what, exactly, they're describing. For example, the "official" (?) web site for Linux kernel asynchronous I/O support here says that sockets don't work, but the "aio.h" manual pages on my Ubuntu 8.04.1 workstation all seem to imply that it works for arbitrary file descriptors. Then there's another project that seems to work at the library layer with even less documentation.

I'd like to know:

  • What is the purpose of POSIX AIO? Given that the most obvious example of an implementation I can find says it doesn't support sockets, the whole thing seems weird to me. Is it just for async disk I/O? If so, why the hyper-general API? If not, why is disk I/O the first thing that got attacked?
  • Where are there example complete POSIX AIO programs that I can look at?
  • Does anyone actually use it, for real?
  • What platforms support POSIX AIO? What parts of it do they support? Does anyone really support the implied "Any I/O to any FD" that <aio.h> seems to promise?

The other multiplexing mechanisms available to me are perfectly good, but the random fragments of information floating around out there have made me curious.

Glyph asked Sep 17 '08 21:09



3 Answers

Doing socket I/O efficiently has been solved with kqueue, epoll, I/O completion ports and the like. Asynchronous file I/O is something of a latecomer (apart from Windows' overlapped I/O and Solaris' early support for POSIX AIO).

If you're looking for doing socket I/O, you're probably better off using one of the above mechanisms.

The main purpose of AIO is hence to solve the problem of asynchronous disk I/O. This is most likely why Mac OS X only supports AIO for regular files, and not sockets (since kqueue does that so much better anyway).

Write operations are typically cached by the kernel and flushed out at a later time, for instance when the head of the drive happens to pass by the location where the block is to be written.

However, for read operations, if you want the kernel to prioritize and order your reads, AIO is really the only option. Here's why the kernel can (theoretically) do that better than any user-level application:

  • The kernel sees all disk I/O, not just your application's disk jobs, and can order them at a global level
  • The kernel (may) know where the disk read head is, and can pick the read jobs you pass on to it in optimal order, to move the head the shortest distance
  • The kernel can take advantage of native command queuing to optimize your read operations further
  • You may be able to issue more read operations per system call using lio_listio() than with readv(), especially if your reads are not (logically) contiguous, saving a tiny bit of system call overhead.
  • Your program might be slightly simpler with AIO since you don't need an extra thread to block in a read or write call.
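As a rough illustration of the lio_listio() point above, here is a minimal sketch that submits two non-contiguous reads in one call. The helper name read_batch and the BATCH_MAX constant are made up for this example and are not part of any library. It uses LIO_WAIT to keep the sketch short; a real asynchronous user would pass LIO_NOWAIT and collect completions later. On older glibc versions you may need to link with -lrt.

```c
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Illustration only: 2 is the batch size POSIX guarantees
 * (_POSIX_AIO_LISTIO_MAX); real systems usually allow more. */
#define BATCH_MAX 2

/* Hypothetical helper: read n chunks of len bytes from fd, one chunk per
 * entry in offsets[], using a single lio_listio() call. bufs[i] receives
 * the data and got[i] the byte count (or -1 if that request failed).
 * Returns the lio_listio() result: 0 only if every request succeeded. */
int read_batch(int fd, const off_t *offsets, char **bufs, size_t len,
               int n, ssize_t *got)
{
    struct aiocb cbs[BATCH_MAX];
    struct aiocb *list[BATCH_MAX];
    int i, rc;

    if (n < 1 || n > BATCH_MAX) {
        errno = EINVAL;
        return -1;
    }

    for (i = 0; i < n; i++) {
        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = len;
        cbs[i].aio_offset = offsets[i];
        cbs[i].aio_lio_opcode = LIO_READ;
        list[i] = &cbs[i];
    }

    /* LIO_WAIT blocks until every request in the list has completed. */
    rc = lio_listio(LIO_WAIT, list, n, NULL);

    for (i = 0; i < n; i++) {
        int err;
        /* If lio_listio() was interrupted, a request may still be running;
         * wait for it so the stack-allocated control block stays valid. */
        while ((err = aio_error(&cbs[i])) == EINPROGRESS) {
            const struct aiocb *const one[1] = { &cbs[i] };
            aio_suspend(one, 1, NULL);
        }
        got[i] = (err == 0) ? aio_return(&cbs[i]) : -1;
    }
    return rc;
}
```

A caller would open a regular file, fill in the offsets and buffers, and check both the return value and each got[i] before trusting the data.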

That said, POSIX AIO has quite an awkward interface, for instance:

  • The only efficient and well-supported means of event callbacks is via signals, which makes it hard to use in a library, since it means using signal numbers from the process-global signal namespace. If your OS doesn't support realtime signals, it also means you have to loop through all your outstanding requests to figure out which one actually finished (this is the case for Mac OS X for instance, not Linux). Catching signals in a multi-threaded environment also comes with some tricky restrictions. You typically cannot react to the event inside the signal handler; instead you have to raise a signal, write to a pipe or use signalfd() (on Linux).
  • aio_suspend() has the same issue as select() does: it doesn't scale very well with the number of jobs.
  • lio_listio(), as implemented, accepts a fairly limited number of jobs, and it's not trivial to find this limit in a portable way. You have to call sysconf(_SC_AIO_LISTIO_MAX), which may fail, in which case you can use the AIO_LISTIO_MAX define, which is not necessarily defined, in which case you can fall back to 2 (_POSIX_AIO_LISTIO_MAX), which is guaranteed to be supported.
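The fallback chain for the lio_listio() limit could look roughly like the sketch below. The function name listio_max is hypothetical; the sysconf() name and the two macros are the standard ones. Note that sysconf() also returns -1 when the limit is simply indeterminate, which this sketch treats the same as a failure.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: best-effort lookup of how many requests a single
 * lio_listio() call is known to accept on this system. */
long listio_max(void)
{
#ifdef _SC_AIO_LISTIO_MAX
    /* Runtime query; returns -1 on error or if the limit is indeterminate. */
    long n = sysconf(_SC_AIO_LISTIO_MAX);
    if (n > 0)
        return n;
#endif
#ifdef AIO_LISTIO_MAX
    /* Compile-time limit, if the platform chooses to define one. */
    return AIO_LISTIO_MAX;
#else
    /* _POSIX_AIO_LISTIO_MAX: the minimum every conforming system supports. */
    return 2;
#endif
}
```

Larger job lists would then be split into chunks of at most listio_max() entries before being handed to lio_listio().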

As for real-world applications using POSIX AIO, you could take a look at lighttpd (lighty), which also posted a performance measurement when introducing support.

Most POSIX platforms support POSIX AIO by now (Linux, BSD, Solaris, AIX, Tru64). Windows offers equivalent functionality through its overlapped file I/O. My understanding is that only Solaris, Windows and Linux truly support asynchronous file I/O all the way down to the driver, whereas the other OSes emulate it with kernel threads. Linux is a special case: its POSIX AIO implementation in glibc emulates asynchronous operations with user-level threads, whereas its native asynchronous I/O interface (io_submit() etc.) is truly asynchronous all the way down to the driver, assuming the driver supports it.

I believe it's fairly common among OSes not to support POSIX AIO for every fd, but to restrict it to regular files.

Arvid answered Nov 01 '22 01:11


Network I/O is not a priority for AIO because everyone writing POSIX network servers uses an event based, non-blocking approach. The old-style Java "billions of blocking threads" approach sucks horribly.

Disk write I/O is already buffered, and disk read I/O can be prefetched into the buffer cache using functions like posix_fadvise(). That leaves direct, unbuffered disk I/O as the only useful purpose for AIO.

Direct, unbuffered I/O is only really useful for transactional databases, and those tend to write their own threads or processes to manage their disk I/O.

So, in the end, that leaves POSIX AIO in the position of not serving any useful purpose. Don't use it.

Zan Lynx answered Nov 01 '22 02:11


A libtorrent developer provides a report on this: http://blog.libtorrent.org/2012/10/asynchronous-disk-io/

Allen answered Nov 01 '22 00:11