Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What posix_fadvise() args for sequential file write?

Tags:

I am working on an application which does sequentially write a large file (and does not read at all), and I would like to use posix_fadvise() to optimize the filesystem behavior.

The function description in the manpage suggests that the most appropriate strategy would be POSIX_FADV_SEQUENTIAL. However, the Linux implementation description doubts that:

Under Linux, POSIX_FADV_NORMAL sets the readahead window to the default size for the backing device; POSIX_FADV_SEQUENTIAL doubles this size, and POSIX_FADV_RANDOM disables file readahead entirely.

As I'm only writing data (overwriting files possibly too), I don't expect any readahead. Should I then stick with my POSIX_FADV_SEQUENTIAL or rather use POSIX_FADV_RANDOM to disable it?

How about other options, such as POSIX_FADV_NOREUSE? Or maybe do not use posix_fadvise() for writing at all?

like image 471
Michał Górny Avatar asked Sep 20 '10 21:09

Michał Górny


People also ask

What is posix_fadvise?

The posix_fadvise() function shall advise the implementation on the expected behavior of the application with respect to the data in the file associated with the open file descriptor, fd, starting at offset and continuing for len bytes.

What is fadvise in Linux?

Programs can use posix_fadvise() to announce an intention to access file data in a specific pattern in the future, thus allowing the kernel to perform appropriate optimizations.


2 Answers

Most of the posix_fadvise() flags (eg POSIX_FADV_SEQUENTIAL and POSIX_FADV_RANDOM) are hints about readahead rather than writing.

There's some advice from Linus here and here about getting good sequential write performance. The idea is to break the file into large-ish (8MB) windows, then loop around doing:

  • Write out window N with write();
  • Request asynchronous write-out of window N with sync_file_range(..., SYNC_FILE_RANGE_WRITE)
  • Wait for the write-out of window N-1 to complete with sync_file_range(..., SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER)
  • Drop window N-1 from the pagecache with posix_fadvise(..., POSIX_FADV_DONTNEED)

This way you never have more than two windows worth of data in the page cache, but you still get the kernel writing out part of the pagecache to disk while you fill the next part.

like image 186
caf Avatar answered Oct 03 '22 11:10

caf


It all depends on the temporal locality of your data. If your application won't need the data soon after it was written, then you can go with POSIX_FADV_NOREUSE to avoid writing to the buffer cache (in a similar way as the O_DIRECT flag from open()).

like image 38
Trixl Avatar answered Oct 03 '22 10:10

Trixl