In the Python HDF5 library h5py, do I need to flush() a file before I close() it?
Or does closing the file already make sure that any data that might still be in the buffers will be written to disk?
What exactly is the point of flushing? When would flushing be necessary?
Closing files: if you call File.close(), or leave a with h5py.File(...) block, the file will be closed and any objects (such as groups or datasets) you have from that file will become unusable.
In Python, files are automatically flushed when they are closed. However, a programmer can flush a file earlier, before closing it, by calling the flush() method.
No, you do not need to flush the file before closing. Flushing is done automatically by the underlying HDF5 C library when you close the file.
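A minimal sketch of this, assuming h5py and NumPy are installed. The file name example.h5 and the dataset name values are made up for illustration. The file is written without any explicit flush(), closed by leaving the with block, and then reopened to check that the data arrived.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("values", data=np.arange(10))
    # No f.flush() here: leaving the with block closes the file,
    # and closing flushes the HDF5 library buffers.

# Reopen the file to confirm the data was written.
with h5py.File(path, "r") as f:
    restored = f["values"][:]

print(restored.tolist())
```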
As to the point of flushing: file I/O is slow compared to things like memory or cache access. If programs had to wait until data was actually on the disk each time a write was performed, that would slow things down a lot. So the actual writing to disk is buffered, at least by the OS and in many cases also by the I/O library being used (e.g., the C standard I/O library). When you ask to write data to a file, it usually just means that your data has been copied into one of these buffers, and it will actually be put on the disk when it's convenient to do so.
Flushing overrides this buffering, at whatever level the call is made. So calling h5py.File.flush() will flush the HDF5 library buffers, but not necessarily the OS buffers. The point of this is to give the program some control over when data actually leaves a buffer.
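The same layering can be seen with Python's built-in file objects, which is easier to observe than HDF5's internal buffers. This is only an analogy, not h5py itself: flush() empties the library-level buffer into the OS, and os.fsync() separately asks the OS to push its own buffers to the device. The file name buffered.bin is made up.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "buffered.bin")

f = open(path, "wb")           # buffered binary file object
f.write(b"hello")              # small write: stays in Python's buffer

size_before = os.path.getsize(path)   # nothing handed to the OS yet

f.flush()                      # library buffer -> OS
size_after = os.path.getsize(path)    # the OS now knows about the bytes

os.fsync(f.fileno())           # ask the OS to write its buffers to disk
f.close()

print(size_before, size_after)
```

Note that flush() alone makes the data visible to other processes reading the file, while fsync() is the step that addresses the OS-level buffering.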
For example, writing to the standard output is usually line-buffered when it goes to a terminal, and fully buffered when it goes to a pipe or a file. If you really want the output to appear before a newline is written or the buffer fills up, you can call fflush(stdout). This might make sense if you are piping the standard output of one process into another: that downstream process can start consuming the input right away, without waiting for the buffer to fill.
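In Python, the counterpart of fflush(stdout) is sys.stdout.flush(), or the flush=True argument of print(). A small sketch, with a made-up function name show_progress, that pushes each dot out immediately instead of waiting for the newline:

```python
import time


def show_progress(steps, delay=0.0):
    """Print one dot per step, flushing so each dot appears right away."""
    for _ in range(steps):
        # end="" suppresses the newline; flush=True empties the buffer now.
        print(".", end="", flush=True)
        time.sleep(delay)
    print(" done")


show_progress(3)
```

Without flush=True, the dots would typically appear all at once when the final newline is printed (on a terminal) or even later (when piped).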
Another good example is making a call to fork(2). This usually copies the entire address space of a process, which means the I/O buffers as well. That may result in duplicated output, unnecessary copying, etc. Flushing a stream guarantees that the buffer is empty before forking.
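The duplicated output can be reproduced from Python on a POSIX system (os.fork() is not available on Windows). The sketch below is illustrative only: the helper run and the placeholder FLUSH are made up. It launches a small script with its standard output connected to a pipe, so the output is block-buffered, and counts how many times the line written before the fork shows up.

```python
import subprocess
import sys
import textwrap

# Script run in a separate interpreter. FLUSH is a placeholder that is
# replaced with True or False before the script is executed.
CHILD_SCRIPT = textwrap.dedent("""
    import os, sys
    sys.stdout.write("before fork\\n")   # sits in the buffer when piped
    if FLUSH:
        sys.stdout.flush()               # empty the buffer before forking
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)               # parent waits for the child
    # Both processes exit normally, so each flushes its own buffer copy.
""")


def run(flush_first):
    """Return how many times the pre-fork line appears in the output."""
    script = CHILD_SCRIPT.replace("FLUSH", str(flush_first))
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
    )
    return result.stdout.count("before fork")


copies_without_flush = run(False)
copies_with_flush = run(True)
print(copies_without_flush, copies_with_flush)
```

When the buffer is not flushed first, the parent and the child each hold a copy of the pending line and each writes it out on exit. Flushing before the fork leaves nothing in the buffer to duplicate.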