
In h5py, do I need to call flush() before I close a file?

Tags:

python

h5py

In the Python HDF5 library h5py, do I need to flush() a file before I close() it?

Or does closing the file already make sure that any data that might still be in the buffers will be written to disk?

What exactly is the point of flushing? When would flushing be necessary?

Alex asked Feb 09 '18 18:02




1 Answer

No, you do not need to flush the file before closing. Flushing is done automatically by the underlying HDF5 C library when you close the file.
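As a minimal sketch of this behavior, the snippet below writes a dataset, closes the file without ever calling flush(), and then reopens it to read the data back. The file name `demo.h5`, the dataset name `values`, and the helper functions are hypothetical names chosen for illustration; it assumes h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np


def write_without_flush(path):
    """Write a dataset and close the file without calling flush()."""
    f = h5py.File(path, "w")
    f.create_dataset("values", data=np.arange(10))
    f.close()  # closing the file flushes the HDF5 library buffers


def read_back(path):
    """Reopen the file read-only and return the dataset as an array."""
    with h5py.File(path, "r") as f:
        return f["values"][:]


# Hypothetical file location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
write_without_flush(path)
data = read_back(path)
print(data.tolist())
```

In everyday code, a `with h5py.File(...)` block is usually the simpler choice, since leaving the block closes the file for you.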


As for the point of flushing: file I/O is slow compared to things like memory or cache access. If programs had to wait until data was actually on the disk each time a write was performed, that would slow things down a lot. So the actual writing to disk is buffered, at least by the OS and in many cases also by the I/O library being used (e.g., the C standard I/O library). When you ask to write data to a file, it usually just means that the OS has copied your data to its own internal buffer and will actually put it on the disk when it's convenient to do so.

Flushing overrides this buffering, at whatever level the call is made. So calling h5py.File.flush() will flush the HDF5 library buffers, but not necessarily the OS buffers. The point of this is to give the program some control over when data actually leaves a buffer.
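The two buffering levels can be illustrated with Python's built-in file objects, used here as an analogy rather than as h5py itself. In this sketch, flush() empties the library-level buffer so the bytes reach the OS, and os.fsync() then asks the OS to push them to the storage device. The file name `buffer_demo.bin` and the 8192-byte buffer size are arbitrary choices for illustration.

```python
import os
import tempfile

# Hypothetical file location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "buffer_demo.bin")

f = open(path, "wb", buffering=8192)  # library-level buffer of 8192 bytes
f.write(b"hello")                     # small write: stays in the library buffer

size_before_flush = os.path.getsize(path)  # the OS has not seen the bytes yet

f.flush()                                  # hand the bytes over to the OS
size_after_flush = os.path.getsize(path)   # now the OS reports them

os.fsync(f.fileno())  # additionally ask the OS to write its own buffers out
f.close()

print(size_before_flush, size_after_flush)
```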

For example, standard output is usually line-buffered when it is connected to a terminal. If you really want to see the output before a newline is written, you can call fflush(stdout). Flushing matters even more if you are piping the standard output of one process into another, because output to a pipe is typically fully buffered: after a flush, the downstream process can start consuming the input right away, without waiting for the buffer to fill up.
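The piping case can be sketched in Python, where sys.stdout.flush() plays the role of fflush(stdout). In this hypothetical setup, a child process writes `ready` without a newline, flushes, and then waits for a line on its standard input. The parent can only read those five bytes early because the child flushed; without the flush, the text would most likely sit in the child's buffer and the parent's read would block.

```python
import subprocess
import sys

# Source of a small child program (hypothetical, for illustration only).
child_code = (
    "import sys\n"
    "sys.stdout.write('ready')\n"  # no newline, so nothing forces it out
    "sys.stdout.flush()\n"         # push the text into the pipe right away
    "sys.stdin.readline()\n"       # then wait until the parent answers
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# The child is still running and waiting for input at this point.
early_output = proc.stdout.read(5)

proc.stdin.write(b"go\n")  # let the child finish
proc.stdin.close()
proc.wait()
proc.stdout.close()

print(early_output)
```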

Another good example is making a call to fork(2). This usually copies the entire address space of a process, which means the I/O buffers as well. That may result in duplicated output, unnecessary copying, etc. Flushing a stream guarantees that the buffer is empty before forking.
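The duplicated output can be reproduced with a short sketch, assuming a POSIX system where os.fork() is available. The helper `run_fork_demo` is a hypothetical name: it runs a small script whose standard output is a pipe (and therefore block-buffered), writes one line, optionally flushes, and then forks. Without the flush, both the parent and the forked child hold a copy of the buffered line and each writes it out on exit.

```python
import subprocess
import sys


def run_fork_demo(flush_first):
    """Run a script that writes one line and then forks; return its output."""
    script = (
        "import os, sys\n"
        "sys.stdout.write('hello\\n')\n"
        + ("sys.stdout.flush()\n" if flush_first else "")
        + "pid = os.fork()\n"        # the buffer contents are copied too
        "if pid > 0:\n"
        "    os.waitpid(pid, 0)\n"   # parent waits for the forked child
    )
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
    )
    return result.stdout


unflushed = run_fork_demo(flush_first=False)
flushed = run_fork_demo(flush_first=True)
print(repr(unflushed), repr(flushed))
```

Comparing the two results shows how many times the line was written in each case.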

bnaecker answered Oct 07 '22 02:10