
Read file without evicting from OS page cache

(This is intended primarily for Linux, or ideally any POSIX system.)

I'm looking for a way to read a large number of files (any one of which might be up to 1 GB by itself) so that, as the pages are read in, the following holds:

  • If the relevant disk page is already in the file system cache, that one is used.
  • If the relevant page isn't in the disk cache, it's fetched from disk but any existing cached disk pages are not evicted.

The idea is to be able to read all of these files without polluting the disk cache or evicting the current working set.

Any guidance?

Christophe asked Feb 21 '23 03:02



1 Answer

On Linux you can experiment with the O_DIRECT flag to open(). From the open(2) man page:

   O_DIRECT (Since Linux 2.4.10)
          Try to minimize cache effects of the I/O to and from this file.
          In general this will degrade performance, but it is useful in
          special situations, such as when applications do their own
          caching.  File I/O is done directly to/from user space buffers.
          The O_DIRECT flag on its own makes an effort to transfer data
          synchronously, but does not give the guarantees of the O_SYNC
          that data and necessary metadata are transferred.  To guarantee
          synchronous I/O the O_SYNC must be used in addition to O_DIRECT.
          See NOTES below for further discussion.
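
As a rough illustration of what such an experiment might look like, here is a minimal sketch in C. The helper name read_file_direct, the 4096-byte alignment, and the 1 MiB chunk size are all assumptions made for the example; the alignment a given device or filesystem actually requires may differ. The fallback to ordinary buffered reads when O_DIRECT is rejected is also just one possible policy. Keep in mind that O_DIRECT reads generally bypass the page cache rather than consult it, so this addresses the "don't evict" half of the question more than the "use what is already cached" half.

```c
#define _GNU_SOURCE              /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define ALIGNMENT  4096          /* assumed alignment; the real requirement varies */
#define CHUNK_SIZE (1024 * 1024) /* bytes per read; must be a multiple of ALIGNMENT */

/* Hypothetical helper: reads the whole file at `path`, trying O_DIRECT first.
 * Returns the number of bytes read, or -1 on error. */
long long read_file_direct(const char *path)
{
    assert(CHUNK_SIZE % ALIGNMENT == 0);

    int direct = 1;
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL) {
        /* Filesystem rejects O_DIRECT: fall back to ordinary buffered I/O. */
        direct = 0;
        fd = open(path, O_RDONLY);
    }
    if (fd < 0)
        return -1;

    struct stat st;
    void *buf = NULL;
    /* O_DIRECT needs an aligned user buffer, hence posix_memalign(). */
    if (fstat(fd, &st) != 0 || posix_memalign(&buf, ALIGNMENT, CHUNK_SIZE) != 0) {
        close(fd);
        return -1;
    }

    long long total = 0;
    off_t off = 0;
    while (off < st.st_size) {
        ssize_t n = pread(fd, buf, CHUNK_SIZE, off);
        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted: retry */
        if (n < 0 && errno == EINVAL && direct) {
            /* The read was rejected under O_DIRECT: drop the flag and retry. */
            int fl = fcntl(fd, F_GETFL);
            if (fl < 0 || fcntl(fd, F_SETFL, fl & ~O_DIRECT) != 0) {
                total = -1;
                break;
            }
            direct = 0;
            continue;
        }
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break;                          /* file shrank while reading */
        /* ... process buf[0 .. n) here ... */
        total += n;
        off += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

Calling read_file_direct("/path/to/file") returns the file's size in bytes on success. Reading in fixed, aligned chunks at aligned offsets keeps every request within the usual O_DIRECT constraints; only the final chunk may come back short.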
Maxim Egorushkin answered Mar 05 '23 16:03
