Can we disable h5py file locking for python file-like object?

Tags: python, hdf5, python-3.6, h5py

When opening an HDF5 file with h5py, you can pass in a Python file-like object. I have done so, using a file-like object that is a custom implementation on top of my own network-based transport layer.

This works great: I can slice large HDF5 files over a high-latency transport layer. However, HDF5 appears to provide its own file locking functionality, so if you open multiple files read-only within the same process (threading model), the operations are still effectively run in series.

There are drivers in HDF5 that support parallel operations, such as h5py.File(f, driver='mpio'), but this doesn't appear to apply to Python file-like objects, which use h5py.File(f, driver='fileobj').

The only solution I see is to use multiprocessing. However, the scalability is very limited: you can realistically open only tens of processes because of the overhead. My transport layer uses asyncio and is capable of parallel operations on the scale of thousands or tens of thousands, which lets me build a longer queue of slow file-read operations and boosts my total throughput.
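To illustrate the kind of queueing described above, here is a minimal sketch, not the actual transport layer. `fake_read`, `read_many`, and `max_in_flight` are hypothetical names, and the network latency is simulated with `asyncio.sleep`. Note that `asyncio.run` requires Python 3.7 or later.

```python
import asyncio


async def fake_read(offset: int, size: int) -> bytes:
    """Hypothetical stand-in for one slow, high-latency network read."""
    await asyncio.sleep(0.01)  # simulated round-trip latency
    return bytes(size)  # placeholder payload of the requested size


async def read_many(requests, max_in_flight: int = 1000):
    """Run all reads concurrently, with at most max_in_flight active at once."""
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded(offset, size):
        async with limit:
            return await fake_read(offset, size)

    # gather returns results in the same order as the requests
    return await asyncio.gather(*(bounded(o, s) for o, s in requests))


# 2,000 simulated 4 KiB reads at different offsets
requests = [(i * 4096, 4096) for i in range(2000)]
chunks = asyncio.run(read_many(requests))
```

Because the reads overlap while waiting on the network, total wall time is governed by the latency and the in-flight limit, not by the number of requests multiplied by the latency.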

I can achieve 1.5 GB/sec of large-file, random-seek, binary reads with my transport layer against a local S3 interface when I queue 10k IO ops in parallel (requiring 50GB of RAM to service the requests, an acceptable trade-off for the throughput).

Is there any way I can disable the h5py file locking when using driver='fileobj'?


asked Aug 01 '19 13:08

David Parks

1 Answer

Set the environment variable HDF5_USE_FILE_LOCKING to FALSE before starting your Python process.

For example:

On Linux or macOS, in a terminal: export HDF5_USE_FILE_LOCKING=FALSE

On Windows, in Command Prompt (CMD): set HDF5_USE_FILE_LOCKING=FALSE
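The same variable can also be set from inside the script, as a minimal sketch: assign it through `os.environ` before `h5py` is imported, so the HDF5 library sees it when it is loaded. The helper `open_readonly` is a hypothetical name, and the sketch assumes an h5py version that accepts file-like objects. Whether this removes the serialization described in the question for `driver='fileobj'` is not confirmed here.

```python
import os

# Set this before importing h5py so the HDF5 library sees it when loaded.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

try:
    import h5py  # assumption: h5py is installed in this environment
except ImportError:
    h5py = None


def open_readonly(fileobj):
    """Hypothetical helper: open a Python file-like object read-only."""
    if h5py is None:
        raise RuntimeError("h5py is not installed")
    # h5py selects the 'fileobj' driver when given a file-like object
    return h5py.File(fileobj, "r")
```

Setting the variable in the shell, as shown above, has the same effect and avoids depending on import order.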


answered Oct 16 '22 13:10

Abdullah Khawer
