 

Understanding Ruby and OS I/O buffering

How does IO buffering work in Ruby? How often is data flushed to the underlying stream when using the IO and File classes? How does this compare to OS buffering? What needs to be done to guarantee that given data has been written to disk, before confidently reading it back for processing?

jrdioko asked Jul 14 '11 23:07



2 Answers

The Ruby IO documentation is not 100% clear on how this buffering works, but this is what you can extract from the documentation:

  • Ruby IO has its own internal buffer.
  • In addition, the underlying operating system may buffer the data further.
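To see the Ruby-level buffer in action, here is a minimal sketch, assuming MRI on Linux or macOS and a regular file (the method name `buffered_sizes` is made up for illustration). A short write stays inside the Ruby IO object, so `File.size`, which asks the OS, does not see it until `flush` is called.

```ruby
require "tmpdir"

# Hypothetical helper: returns the file size as the OS sees it
# before and after flushing Ruby's internal write buffer.
def buffered_sizes
  Dir.mktmpdir do |dir|
    path = File.join(dir, "demo.txt")
    File.open(path, "w") do |f|
      f.write("hello")          # 5 bytes, far below Ruby's buffer size
      before = File.size(path)  # OS view of the file: Ruby has not written yet
      f.flush                   # hand the buffered bytes to the OS
      [before, File.size(path)]
    end
  end
end

p buffered_sizes # on MRI: [0, 5]
```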

The relevant methods to look at:

  • IO#flush: Flushes Ruby's internal buffer, handing the pending data to the OS. I also looked at the Ruby source: in Ruby 1.8, which is built on C stdio, this is done with the C library's fflush(). This should be enough to get the data into the OS cache, but it does not guarantee that the data is physically on disk.
  • IO#sync=: If set to true, Ruby does no internal buffering of writes. Every write is sent to the OS immediately, as if flush were called after each one.
  • IO#sync: Returns the current sync setting (true or false).
  • IO#fsync: Flushes Ruby's buffer and then calls fsync() on the OS (if the OS supports it). This is the call that asks the OS to push the data all the way to the physical disk.
  • IO#close: Closes the Ruby IO and writes any pending data to the OS. Note that this does not imply fsync(): the POSIX documentation on close() says it does NOT guarantee that the data has been physically written to disk, so you need an explicit fsync() call for that.
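The sync and fsync behaviour described above can be checked with another small sketch, again assuming MRI on a POSIX system (`sync_demo` is a hypothetical name). With `sync = true` the write reaches the OS at once, and `fsync` returns 0 when the underlying call succeeds.

```ruby
require "tmpdir"

# Hypothetical helper: shows that sync = true skips Ruby's write buffer
# and that fsync reports success with 0.
def sync_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, "sync.txt")
    File.open(path, "w") do |f|
      f.sync = true
      f.write("abc")
      visible = File.size(path) # already 3: nothing held back by Ruby
      status = f.fsync          # 0 on success; may raise NotImplementedError where fsync(2) is missing
      [f.sync, visible, status]
    end
  end
end

p sync_demo # on Linux/macOS: [true, 3, 0]
```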

Conclusion: flush and/or close should be enough to get the data into the OS cache, so that another process or operation can read the file in full. To get the data all the way to the physical media, as far as the OS can promise, you need to call IO#fsync.
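Turning that conclusion into code, a write-then-read-back helper might look like the sketch below. `write_durably` is a hypothetical name, not part of Ruby. Since `fsync` already flushes Ruby's buffer first, the explicit `flush` is only there to make each step visible.

```ruby
require "tmpdir"

# Hypothetical helper: write data, persist it as far as the OS allows,
# then read it back for processing.
def write_durably(path, data)
  File.open(path, "w") do |f|
    f.write(data)
    f.flush # Ruby buffer -> OS cache: other readers can now see the data
    f.fsync # OS cache -> storage device (fsync also flushes Ruby's buffer)
  end       # block form closes the file even if an exception is raised
  File.read(path)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "report.txt")
  puts write_durably(path, "line 1\nline 2\n")
end
```

Even after fsync, disk or controller caches may still hold the data briefly, so this is the strongest guarantee available from Ruby rather than an absolute one.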

Other related methods:

  • IO#syswrite: Bypasses Ruby's internal buffers and performs a direct OS write. If you use it, do not mix it with buffered methods such as IO#read or IO#write on the same IO.
  • IO#sysread: Same as above, but for reading.
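A short sketch of the unbuffered pair, assuming MRI (the method name `unbuffered_roundtrip` is made up). Because `syswrite` skips Ruby's buffer, the OS sees the bytes immediately; each IO object below is used only with the sys* methods, as advised above.

```ruby
require "tmpdir"

# Hypothetical helper: write with syswrite, check the OS view, read with sysread.
def unbuffered_roundtrip
  Dir.mktmpdir do |dir|
    path = File.join(dir, "raw.bin")
    written, visible = File.open(path, "w") do |f|
      n = f.syswrite("raw data") # returns the number of bytes written
      [n, File.size(path)]       # the OS already has the bytes
    end
    data = File.open(path, "r") { |f| f.sysread(100) } # read up to 100 bytes
    [written, visible, data]
  end
end

p unbuffered_roundtrip # on MRI: [8, 8, "raw data"]
```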
Casper answered Oct 05 '22 20:10


Ruby does its own internal buffering on top of the OS. When you call file.flush, Ruby flushes its internal buffer to the OS. To ensure the file is written to disk, you need to call file.fsync. But in the end you cannot be certain the file is on disk anyway: that depends on the OS, the disk controller and the hard disk itself.

Björn Nilsson answered Oct 05 '22 18:10