 

Can we disable h5py file locking for a Python file-like object?

When opening an HDF5 file with h5py, you can pass in a Python file-like object. I have done so, using a file-like object that is a custom implementation of my own network-based transport layer.
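As a rough illustration of what such an object involves, here is a minimal read-only sketch. It assumes h5py only needs binary, seekable reads (`seek()`, `tell()`, `read()`/`readinto()`) when a file is opened with mode `"r"`. `RangeReader` and `fetch(offset, length)` are hypothetical names: `fetch` stands in for the network transport (for example a ranged GET against S3) and is not part of h5py.

```python
import io
from typing import Callable


class RangeReader(io.RawIOBase):
    """Read-only, seekable file-like object backed by a range-fetch callable.

    `fetch(offset, length) -> bytes` is a hypothetical transport function;
    `size` is the total object size in bytes, assumed known up front.
    """

    def __init__(self, fetch: Callable[[int, int], bytes], size: int):
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Translate (offset, whence) into an absolute position.
        if whence == io.SEEK_SET:
            new = offset
        elif whence == io.SEEK_CUR:
            new = self._pos + offset
        elif whence == io.SEEK_END:
            new = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new < 0:
            raise ValueError("negative seek position")
        self._pos = new
        return self._pos

    def readinto(self, buf) -> int:
        # Fetch at most len(buf) bytes, never past the end of the object.
        n = min(len(buf), max(0, self._size - self._pos))
        if n == 0:
            return 0  # EOF
        data = self._fetch(self._pos, n)
        buf[: len(data)] = data
        self._pos += len(data)
        return len(data)
```

Because `io.RawIOBase` derives `read()` and `readall()` from `readinto()`, only the positioning and `readinto()` logic needs writing. With h5py installed, the object would be passed in as something like `h5py.File(RangeReader(fetch, size), "r")`, where `fetch` and `size` come from your own transport.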

This works great: I can slice large HDF5 files over a high-latency transport layer. However, HDF5 appears to provide its own file locking, so if you open multiple files read-only within the same process (threading model), the operations still run, effectively, in series.
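One way to confirm that reads are being serialized is a small timing harness: if N reads that each take time T finish in roughly T on a thread pool, they overlapped; if they take roughly N × T, something is forcing them into series. In this sketch `independent_read` and `locked_read` are hypothetical stand-ins that simulate network latency with `time.sleep`, and `_global_lock` represents a process-wide lock (h5py, for example, guards its calls into the HDF5 library with such a lock). In a real test, `read_one` would instead open its own `h5py.File` over the custom file-like object and slice a dataset.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def time_concurrent_reads(read_one: Callable[[int], None], n_tasks: int, workers: int) -> float:
    """Run read_one(i) for i in range(n_tasks) on a thread pool; return wall time in seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises any exception from a worker.
        list(pool.map(read_one, range(n_tasks)))
    return time.perf_counter() - start


LATENCY = 0.05  # assumed per-read latency in seconds (simulated)
_global_lock = threading.Lock()  # stand-in for a library-wide lock


def independent_read(i: int) -> None:
    time.sleep(LATENCY)  # sleep releases the GIL, so these reads can overlap


def locked_read(i: int) -> None:
    with _global_lock:  # each read must wait for the previous one to finish
        time.sleep(LATENCY)
```

Comparing `time_concurrent_reads(locked_read, 8, 8)` with `time_concurrent_reads(independent_read, 8, 8)` should show the locked version taking about eight times as long, which is the "effectively in series" pattern described above.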

There are drivers in HDF5 that support parallel operations, such as h5py.File(f, driver='mpio'), but this doesn't appear to apply to Python file-like objects, which use h5py.File(f, driver='fileobj').

The only solution I see is to use multiprocessing. However, its scalability is very limited: because of the per-process overhead, you can realistically open only tens of processes. My transport layer uses asyncio and can run thousands or tens of thousands of operations in parallel, which lets me build a longer queue of slow file-read operations and boosts my total throughput.
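The asyncio side of this design, keeping a large but bounded number of ranged reads in flight, can be sketched with a semaphore. `fetch` is a hypothetical coroutine standing in for the transport, and `max_in_flight` plays the role of the queue depth (for example the 10k parallel IO ops mentioned in the question). This covers the raw transport only: h5py calls a file-like object's methods as ordinary synchronous calls, so this pattern does not by itself make h5py issue reads concurrently.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


async def read_many(
    fetch: Callable[[int, int], Awaitable[bytes]],
    ranges: Iterable[Range],
    max_in_flight: int,
) -> List[bytes]:
    """Issue all ranged reads concurrently, with at most max_in_flight outstanding."""
    sem = asyncio.Semaphore(max_in_flight)

    async def one(offset: int, length: int) -> bytes:
        async with sem:  # waits here once max_in_flight reads are outstanding
            return await fetch(offset, length)

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(one(off, n) for off, n in ranges))
```

Raising `max_in_flight` trades memory (one buffer per outstanding read) for a deeper queue, which matches the RAM-for-throughput trade-off described in the question.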

I can achieve 1.5 GB/s of large-file, random-seek, binary reads with my transport layer against a local S3 interface when I queue 10k IO ops in parallel (which requires 50 GB of RAM to service the requests, an acceptable trade-off for the throughput).

Is there any way I can disable the h5py file locking when using driver='fileobj'?

asked Aug 01 '19 13:08 by David Parks




1 Answer

Set the environment variable HDF5_USE_FILE_LOCKING to FALSE.

Examples are as follows:

On Linux or macOS (Terminal): export HDF5_USE_FILE_LOCKING=FALSE

On Windows (Command Prompt, CMD): set HDF5_USE_FILE_LOCKING=FALSE
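The variable can also be set from inside Python, which is handy when you cannot control the shell that launches the script. This is a minimal sketch: `disable_hdf5_file_locking` is a hypothetical helper, and it assumes the conservative rule of setting the variable before h5py (and therefore the HDF5 library) is imported, warning if that has already happened.

```python
import os
import sys
import warnings


def disable_hdf5_file_locking() -> None:
    """Set HDF5_USE_FILE_LOCKING=FALSE for this process and any children it starts later."""
    if "h5py" in sys.modules:
        # Conservative check: the variable is safest to set before HDF5 is loaded.
        warnings.warn(
            "h5py is already imported; HDF5 may not pick up HDF5_USE_FILE_LOCKING",
            RuntimeWarning,
            stacklevel=2,
        )
    os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


disable_hdf5_file_locking()
# import h5py  # import only after the variable has been set
```

Newer h5py releases (3.5 and later, built against a sufficiently recent HDF5) also accept a per-file `locking` argument, e.g. `h5py.File(path, "r", locking=False)`, which avoids changing the environment for the whole process.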

answered Oct 16 '22 13:10 by Abdullah Khawer