Are there any packages in Python that support concurrent writes on NFS using a serverless architecture?
I work in an environment where I have a supercomputer, and multiple jobs save their data in parallel. While I can save the results of these computations in separate files and combine them later, this requires me to write a reader that is aware of the specific way in which I split my computation across jobs, so that it knows how to stitch everything together into a final data structure correctly.
Last time I checked, SQLite did not support concurrent access over NFS. Are there any alternatives to SQLite?
Note: By serverless I mean not having to explicitly start another server (on top of NFS) that handles the IO requests. I understand that NFS uses a client-server architecture, but this filesystem is already part of the supercomputer that I use, and I do not need to maintain it myself. What I am looking for is a package or file format that supports concurrent IO without requiring me to set up any (additional) servers.
Here is an example of two jobs that I would run in parallel:
Job 1 populates my_dict from scratch with the following data, and saves it to file:
my_dict['a']['foo'] = [0.2, 0.3, 0.4]
Job 2 also populates my_dict from scratch with the following data, and saves it to file:
my_dict['a']['bar'] = [0.1, 0.2]
I want to later load file, and see the following in my_dict:
> my_dict['a'].items()
[('foo', [0.2, 0.3, 0.4]), ('bar', [0.1, 0.2])]
Note that the stitching operation is automatic. In this particular case, I chose to split the keys in my_dict['a'] across the computations, but other splits are possible. The fundamental idea is that there are no clashes between jobs. It implicitly assumes that jobs add/aggregate data, so the fusion of dictionaries (dataframes if using Pandas) always results in aggregating the data, i.e. computing an "outer join" of the data.
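To make the intended "outer join" concrete, here is a minimal sketch of a reader that does not need to know how the work was split. It assumes each job writes its own JSON file into a shared directory; the file names and the helper names (`merge_nested`, `load_all`) are made up for illustration and do not come from any particular package.

```python
import json
import tempfile
from pathlib import Path


def merge_nested(dst, src):
    """Recursively merge src into dst; raise if two jobs wrote the same leaf."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            # Both sides hold a sub-dictionary: descend and merge their keys.
            merge_nested(dst[key], value)
        elif key in dst:
            # The question assumes no clashes, so treat one as an error.
            raise ValueError(f"clash on key {key!r}")
        else:
            dst[key] = value
    return dst


def load_all(directory):
    """Load every per-job JSON file in a directory and merge them."""
    merged = {}
    for path in sorted(Path(directory).glob("*.json")):
        with open(path) as fh:
            merge_nested(merged, json.load(fh))
    return merged


# Demo with the two jobs from the question (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "job1.json").write_text(json.dumps({"a": {"foo": [0.2, 0.3, 0.4]}}))
    Path(tmp, "job2.json").write_text(json.dumps({"a": {"bar": [0.1, 0.2]}}))
    my_dict = load_all(tmp)

print(list(my_dict["a"].items()))
```

Because every job writes only its own file, no locking is needed while the jobs run; the merge happens once, at read time.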
Hierarchical locking, i.e. you lock / first, then lock /foo and unlock /, then lock /foo/bar and unlock /foo. Make changes to /foo/bar and unlock it.
This allows other processes access to other paths. Lock contention on / is relatively small.
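A rough sketch of that hand-over-hand scheme, using lock directories as the locking primitive. It assumes that `mkdir` is atomic on your NFS mount, which is commonly relied on but worth verifying for your setup. The `.lock` name and the `locked_path` helper are hypothetical, and there is no handling of stale locks left behind by a crashed job.

```python
import os
import tempfile
import time
from contextlib import contextmanager


def _acquire(lock_dir, poll=0.05, timeout=10.0):
    """Spin until we manage to create lock_dir; creation is the lock."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lock_dir)  # raises FileExistsError if someone holds it
            return
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {lock_dir}")
            time.sleep(poll)


def _release(lock_dir):
    os.rmdir(lock_dir)


@contextmanager
def locked_path(root, parts):
    """Lock root, then walk down, locking each child before unlocking its parent."""
    current = root
    held = os.path.join(current, ".lock")
    _acquire(held)
    try:
        for part in parts:
            current = os.path.join(current, part)
            os.makedirs(current, exist_ok=True)
            child_lock = os.path.join(current, ".lock")
            _acquire(child_lock)  # take the child lock first ...
            _release(held)        # ... then let go of the parent
            held = child_lock
        yield current
    finally:
        _release(held)


def _find_locks(root):
    """List the lock directories that currently exist under root."""
    return sorted(
        os.path.relpath(os.path.join(d, name), root)
        for d, names, _ in os.walk(root)
        for name in names
        if name == ".lock"
    )


# Demo: lock /foo/bar under a temporary root and write into it.
with tempfile.TemporaryDirectory() as root:
    with locked_path(root, ["foo", "bar"]) as target:
        with open(os.path.join(target, "data.txt"), "w") as fh:
            fh.write("0.1 0.2\n")
        held_during = _find_locks(root)
    held_after = _find_locks(root)
```

While the block runs, only the lock on the deepest directory is held, so another job working under a different subtree is blocked only for the brief moment the shared ancestors are locked.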
Adapt a lock-free or wait-free algorithm, e.g. RCU. Pointers become symlinks or files containing lists of other paths.
http://www.rdrop.com/users/paulmck/rclock/intro/rclock_intro.html https://dank.qemfd.net/dankwiki/index.php/Lock-free_algorithms
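As a loose illustration of the RCU idea on a filesystem (not a faithful RCU implementation): an updater writes a new immutable version file and then atomically swaps a small pointer file to reference it, so readers never see a half-written version. The names `publish`, `read_current` and `current` are hypothetical. The sketch assumes `os.replace` (rename) is atomic on the mount, ignores NFS client-side caching, which can delay when readers see the new pointer, and never reclaims old versions. As in classic RCU, concurrent updaters would still have to serialize among themselves.

```python
import json
import os
import tempfile
import uuid


def publish(root, data):
    """Write a new immutable version, then atomically repoint 'current' at it."""
    version = f"v-{uuid.uuid4().hex}.json"
    with open(os.path.join(root, version), "w") as fh:
        json.dump(data, fh)
        fh.flush()
        os.fsync(fh.fileno())
    # Write the pointer to a temporary name, then rename it into place.
    tmp_pointer = os.path.join(root, f".current-{uuid.uuid4().hex}.tmp")
    with open(tmp_pointer, "w") as fh:
        fh.write(version)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp_pointer, os.path.join(root, "current"))
    return version


def read_current(root):
    """Follow the pointer file and load the version it names."""
    with open(os.path.join(root, "current")) as fh:
        version = fh.read().strip()
    with open(os.path.join(root, version)) as fh:
        return json.load(fh)


# Demo: publish one version, then copy, update and publish a second one.
with tempfile.TemporaryDirectory() as root:
    publish(root, {"a": {"foo": [0.2, 0.3, 0.4]}})
    first = read_current(root)

    snapshot = read_current(root)        # read
    snapshot["a"]["bar"] = [0.1, 0.2]    # copy and update
    publish(root, snapshot)              # publish atomically
    second = read_current(root)
```

A reader that opened the old version before the swap keeps a consistent view of it; deciding when old versions can safely be deleted is the "grace period" problem that real RCU solves and this sketch leaves out.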