In the Python HDF5 library h5py, do I need to flush() a file before I close() it? Or does closing the file already make sure that any data that might still be in the buffers will be written to disk? What exactly is the point of flushing? When would flushing be necessary?

In h5py, do I need to call flush() before I close a file?

No, you do not need to flush the file before closing. Flushing is done automatically by the underlying HDF5 C library when you close the file.
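A minimal sketch of this, assuming h5py and NumPy are installed. The dataset name "values" and the helper write_and_reopen are made up for illustration. The file is written inside a with block, flush() is never called, and the file is then reopened read-only to check that the data arrived:

```python
import os
import tempfile

import h5py
import numpy as np


def write_and_reopen() -> np.ndarray:
    """Hypothetical helper: write a dataset without flush(), then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.h5")
        # Leaving the `with` block calls File.close(); closing the file is
        # what makes the HDF5 library write out anything still buffered.
        with h5py.File(path, "w") as f:
            f.create_dataset("values", data=np.arange(10))
        # Reopen read-only: the data is there although flush() was never called.
        with h5py.File(path, "r") as f:
            return f["values"][()]
```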

As for the point of flushing: file I/O is slow compared to things like memory or cache access. If a program had to wait until its data was actually on the disk every time it performed a write, it would slow down a lot. So writes are buffered, at least by the OS and often also by the I/O library in use (e.g., the C standard I/O library). When you ask to write data to a file, that usually just means your data has been copied into a buffer (the library's, the OS's, or both), and it will actually be put on the disk later, when it is convenient to do so.
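Python's own file objects show this library-level buffering. In the sketch below (the helper name is illustrative), a few bytes are written through an ordinary buffered binary file; the file size reported by the OS stays at zero until flush() hands the bytes over:

```python
import os
import tempfile


def sizes_around_flush() -> tuple[int, int]:
    """Return the on-disk file size just before and just after flush()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")
        with open(path, "wb") as f:   # buffered binary writer, default buffer size
            f.write(b"hello")         # copied into Python's user-space buffer only
            before = os.path.getsize(path)
            f.flush()                 # hand the buffered bytes to the OS
            after = os.path.getsize(path)
        return before, after
```

Calling sizes_around_flush() should give (0, 5): the five bytes are invisible to the OS until the flush.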

Flushing overrides this buffering at whatever level the call is made. So calling h5py.File.flush() will flush the HDF5 library's buffers, handing that data to the OS, but not necessarily the OS's own buffers. The point of flushing is to give the program some control over when data actually leaves a buffer.
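The two levels can be illustrated with an ordinary Python file, used here as an analogy rather than h5py itself: flush() moves data from Python's buffer to the OS, and a separate os.fsync() call asks the OS to push its cached copy to the storage device. In these terms, h5py.File.flush() is the analogue of the first step only. The helper name is hypothetical:

```python
import os
import tempfile


def write_durably(data: bytes) -> bytes:
    """Write `data`, push it through both buffering levels, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()             # level 1: Python's user-space buffer -> OS
            os.fsync(f.fileno())  # level 2: OS cache -> storage device
        with open(path, "rb") as f:
            return f.read()
```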

For example, writing to standard output is usually line-buffered when it goes to a terminal, and fully buffered when it is redirected to a pipe or a file. If you want the output to appear before a newline (or before the buffer fills up), you can call fflush(stdout). This makes sense if you are piping the standard output of one process into another: the downstream process can start consuming the input right away, instead of waiting until the buffer happens to be written out.
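A Python version of the pipe scenario, as a sketch: a child process writes "ready" without a newline and then waits for its standard input to be closed. Here print(..., flush=True) plays the role of fflush(stdout). Because the child's stdout is a pipe, Python normally block-buffers it, so without the flush the parent's read would not see the text and the two processes would deadlock. The child script and helper name are made up for illustration:

```python
import subprocess
import sys

# Child program: write a partial line, flush it, then block until stdin closes.
CHILD = (
    "import sys\n"
    "print('ready', end='', flush=True)\n"  # the equivalent of fflush(stdout)
    "sys.stdin.read()\n"
)


def read_before_child_exits() -> str:
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # Read 5 bytes while the child is still running (it is blocked on stdin).
    data = proc.stdout.read(5)
    proc.stdin.close()  # lets the child's sys.stdin.read() return
    proc.wait()
    proc.stdout.close()
    return data.decode()
```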

Another good example is calling fork(2). The child process gets a copy of the parent's entire address space, which includes any I/O buffers. If those buffers still hold unwritten data, both processes may later write it out, which results in duplicated output (and unnecessary copying of buffered data). Flushing a stream before forking guarantees that its buffer is empty, so there is nothing to duplicate.
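A POSIX-only Python sketch of the fork problem (os.fork() is not available on Windows). A line is left in a file object's user-space buffer, the process forks, and each process then writes its own copy of that buffer. The child flushes explicitly before os._exit(), mimicking how C's exit() flushes stdio buffers. With flush_first=True the buffer is already empty at fork time, so the line appears once. The helper name is hypothetical:

```python
import os
import tempfile


def write_then_fork(flush_first: bool) -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        with open(path, "wb") as f:
            f.write(b"hello\n")  # still sitting in the user-space buffer
            if flush_first:
                f.flush()        # empty the buffer before forking
            pid = os.fork()      # the child inherits a copy of the buffer
            if pid == 0:
                # Child: write out its copy of the buffer, then exit at once.
                # os._exit() skips the `with` cleanup, so the child neither
                # closes the file again nor deletes the temporary directory.
                f.flush()
                os._exit(0)
            os.waitpid(pid, 0)
        # Leaving the `with` block flushed the parent's copy of the buffer.
        with open(path, "rb") as f:
            return f.read()
```

Without the early flush the result should be b"hello\nhello\n"; with it, b"hello\n".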

answered Oct 07 '22 02:10

bnaecker

Tags:

python

h5py

Alex
