
What is the difference between io_submit and a file opened with O_ASYNC?

I am reading this tutorial on asynchronous disk file I/O, but it doesn't make things clear; it actually leaves me more confused.

There are two different asynchronous I/O models according to the tutorial:

  1. Asynchronous blocking I/O where you open a file with O_ASYNC, then use epoll/poll/select.

  2. Asynchronous IO using glibc's AIO

Since glibc implements AIO with a thread pool, what I mean by "AIO" in this question is kernel AIO, i.e. io_submit.

At least from a conceptual point of view, there seems to be no big difference -- io_submit lets you issue multiple I/O requests at once, whereas with read on a file opened with O_ASYNC you can issue just one request at a given file position.

This guide also mentions using epoll as an alternative to Linux AIO:

epoll. Linux has limited support for using epoll as a mechanism for asynchronous I/O. For reads to a file opened in buffered mode (that is, without O_DIRECT), if the file is opened as O_NONBLOCK, then a read will return EAGAIN until the relevant part is in memory. Writes to a buffered file are usually immediate, as they are written out with another writeback thread. However, these mechanisms don’t give the level of control over I/O that direct I/O gives.

What is the problem with using epoll as an AIO alternative? Or, in other words, what problem does [the new interface] io_submit solve?

Chang asked May 05 '13 23:05




1 Answer

In my opinion, the critical point of the io_* API is the ability to achieve higher I/O throughput through two main measures:

  1. Minimizing the number of system calls in the application's I/O loop. Several batches of requests can be submitted, and at some later time the application can examine the outcomes of the individual requests in one go using io_getevents(). Importantly, io_getevents() returns information about each individual I/O transaction, rather than the vague "fd x has pending changes" notification that epoll_wait() returns on each invocation.

  2. The kernel I/O scheduler can reorder requests to make better use of the hardware. The application may even pass down hints on how to reorder the requests through the aio_reqprio field in struct iocb. Necessarily, if we allow I/O requests to be reordered, the application needs an API to query whether particular high-priority requests have already completed (hence io_getevents()).

It could be said that io_getevents() is the really important piece of functionality, and io_submit() is a handy companion that makes efficient use of it.
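
To make the batching point concrete, here is a minimal sketch of one io_submit() call carrying several reads, followed by io_getevents() reporting a result per request. The names batch_read, NREQ and CHUNK are made up for illustration. The sketch assumes Linux and calls the kernel interface through syscall(2), since glibc does not provide wrappers for these calls. It also assumes a regular file opened in buffered mode, in which case the kernel may well complete the reads during submission; O_DIRECT is where submission is more likely to be genuinely asynchronous.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, struct io_event */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

enum { NREQ = 4, CHUNK = 4096 };   /* illustrative sizes, not from the answer */

/* Hypothetical helper: reads NREQ consecutive CHUNK-sized blocks of fd with a
 * single io_submit() and collects the completions with io_getevents().
 * done[i] receives the byte count (or negative errno) of request i.
 * Returns 0 on success, -1 with errno set on failure. */
int batch_read(int fd, char bufs[NREQ][CHUNK], long long done[NREQ])
{
    aio_context_t ctx = 0;          /* must be zero before io_setup() */
    struct iocb cbs[NREQ];
    struct iocb *ptrs[NREQ];
    struct io_event events[NREQ];
    int saved;

    if (syscall(SYS_io_setup, NREQ, &ctx) < 0)
        return -1;

    memset(cbs, 0, sizeof cbs);
    for (int i = 0; i < NREQ; i++) {
        cbs[i].aio_data = (uint64_t)i;            /* tag echoed back in io_event.data */
        cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
        cbs[i].aio_fildes = (uint32_t)fd;
        cbs[i].aio_buf = (uint64_t)(uintptr_t)bufs[i];
        cbs[i].aio_nbytes = CHUNK;
        cbs[i].aio_offset = (int64_t)i * CHUNK;   /* each request has its own offset */
        ptrs[i] = &cbs[i];
    }

    /* One system call submits the whole batch. */
    long submitted = syscall(SYS_io_submit, ctx, (long)NREQ, ptrs);
    if (submitted != NREQ) {
        /* For brevity, a partial submission is treated as a failure here. */
        saved = (submitted < 0) ? errno : EAGAIN;
        syscall(SYS_io_destroy, ctx);
        errno = saved;
        return -1;
    }

    /* Reap completions; each io_event identifies exactly one request. */
    int got = 0;
    while (got < NREQ) {
        long n = syscall(SYS_io_getevents, ctx, 1L, (long)(NREQ - got),
                         events, NULL);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            saved = errno;
            syscall(SYS_io_destroy, ctx);
            errno = saved;
            return -1;
        }
        for (long k = 0; k < n; k++)
            if (events[k].data < NREQ)
                done[events[k].data] = events[k].res;
        got += (int)n;
    }

    syscall(SYS_io_destroy, ctx);
    return 0;
}
```

A caller would open a file of at least NREQ * CHUNK bytes, pass the descriptor together with a char[NREQ][CHUNK] buffer, and then inspect done[i] for each request. The aio_reqprio field is left at zero in this sketch.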

oakad answered Oct 05 '22 11:10