 

Serverless concurrent write access in Python

Are there any packages in Python that support concurrent writes on NFS using a serverless architecture?

I work in an environment with a supercomputer where multiple jobs save their data in parallel. While I can save the results of these computations in separate files and combine them later, that requires me to write a reader that is aware of the specific way in which I split the computation across jobs, so that it knows how to stitch everything back into the final data structure correctly.

Last time I checked, SQLite did not support concurrent access over NFS. Are there any alternatives to SQLite?

Note: By serverless I mean avoiding having to explicitly start another server (on top of NFS) that handles the IO requests. I understand that NFS itself uses a client-server architecture, but that filesystem is already part of the supercomputer I use, and I do not have to maintain it myself. What I am looking for is a package or file format that supports concurrent IO without requiring me to set up any (additional) servers.

Example:

Here is an example of two jobs that I would run in parallel:

  • Job 1 populates my_dict from scratch with the following data, and saves it to the file:

    my_dict['a']['foo'] = [0.2, 0.3, 0.4]

  • Job 2 also populates my_dict from scratch with the following data, and saves it to the file:

    my_dict['a']['bar'] = [0.1, 0.2]

I want to later load the file and see the following in my_dict:

> my_dict['a'].items()
[('foo', [0.2, 0.3, 0.4]), ('bar', [0.1, 0.2])]

Note that the stitching operation should be automatic. In this particular case I chose to split the keys of my_dict['a'] across the computations, but other splits are possible. The fundamental idea is that there are no clashes between jobs: it is implicitly assumed that jobs only add/aggregate data, so fusing the dictionaries (or dataframes, if using Pandas) always aggregates the data, i.e. computes an "outer join" of the data.
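
To make the intended fusion concrete, here is a minimal sketch of the kind of hand-written reader I would like to avoid: each job pickles its own nested dict, and the reader computes the recursive union. The file names and the choice of pickle are purely illustrative.

    import pickle

    def merge(dst, src):
        # Recursively fold src into dst; by assumption the leaves never clash.
        for key, value in src.items():
            if isinstance(value, dict):
                merge(dst.setdefault(key, {}), value)
            else:
                dst[key] = value
        return dst

    my_dict = {}
    for path in ["job1.pkl", "job2.pkl"]:  # one file per job
        with open(path, "rb") as f:
            merge(my_dict, pickle.load(f))

    print(list(my_dict['a'].items()))
    # [('foo', [0.2, 0.3, 0.4]), ('bar', [0.1, 0.2])]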

asked Nov 25 '13 by Josh

1 Answer

Simple DIY, potentially flaky

Hierarchical locking -- i.e. you lock / first, then lock /foo and unlock /, then lock /foo/bar and unlock /foo. Make changes to /foo/bar and unlock it.

This allows other processes access to other paths. Lock contention on / is relatively small.
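
A minimal sketch of that scheme, using lock directories because directory creation is a single create-or-fail operation that NFS servers generally handle atomically. All paths are illustrative, stale locks left by crashed jobs are not cleaned up, and NFS attribute caching can add latency; treat this as a starting point rather than a hardened implementation.

    import errno
    import os
    import time
    from contextlib import contextmanager

    def acquire(lock_dir, poll=0.1):
        # Spin until the lock directory can be created; os.mkdir either
        # succeeds or fails with EEXIST, so only one process wins.
        while True:
            try:
                os.mkdir(lock_dir)
                return
            except OSError as e:
                if e.errno != errno.EEXIST:
                    raise
                time.sleep(poll)

    def release(lock_dir):
        os.rmdir(lock_dir)

    @contextmanager
    def hierarchical_lock(root, *parts):
        # Hand-over-hand locking: hold the parent only long enough to
        # take the child, e.g. lock /, then /foo, then /foo/bar.
        held = os.path.join(root, ".lock")
        acquire(held)
        path = root
        try:
            for part in parts:
                path = os.path.join(path, part)
                os.makedirs(path, exist_ok=True)  # safe: parent lock is held
                child = os.path.join(path, ".lock")
                acquire(child)
                release(held)
                held = child
            yield path  # caller modifies files under this path
        finally:
            release(held)

    # Job 1 would write under /shared/results/a/foo, Job 2 under .../a/bar.
    with hierarchical_lock("/shared/results", "a", "foo") as leaf:
        with open(os.path.join(leaf, "data.json"), "w") as f:
            f.write("[0.2, 0.3, 0.4]")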

Complicated DIY

Adapt a lock-free or wait-free algorithm, e.g. RCU. Pointers become symlinks or files containing lists of other paths.

http://www.rdrop.com/users/paulmck/rclock/intro/rclock_intro.html
https://dank.qemfd.net/dankwiki/index.php/Lock-free_algorithms
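
A minimal sketch of the "pointers become symlinks" idea, assuming readers can tolerate slightly stale data. It shows only the atomic publication step (write a new immutable version file, then swap a symlink via rename); the deferred reclamation of old versions, which is the part RCU is really about, is left out, and the paths are illustrative.

    import json
    import os
    import tempfile

    def publish(link_path, data):
        # Write a brand-new version file, then atomically repoint the
        # symlink at it; rename replaces the old link in one step, so
        # readers see either the old version or the new one, never a mix.
        directory = os.path.dirname(link_path) or "."
        fd, version_path = tempfile.mkstemp(dir=directory, suffix=".json")
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        tmp_link = version_path + ".lnk"
        os.symlink(os.path.basename(version_path), tmp_link)
        os.rename(tmp_link, link_path)
        # Old version files accumulate; reclaiming them safely is the
        # hard (grace-period) part of RCU and is omitted here.

    def read(link_path):
        # Readers never block writers: they simply follow the current link.
        with open(link_path) as f:
            return json.load(f)

    publish("/shared/results/current.json", {"a": {"foo": [0.2, 0.3, 0.4]}})
    print(read("/shared/results/current.json"))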

answered Oct 08 '22 by Dima Tisnek