Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: What to worry about when mutliple processes are concurrently writing to "shelve"?

For my python application I am thinking of using shelve, part of the standard library. There will be hundreds of processes, each writing something to the same shelve object. The writing will always be to add a new key,value pair to the shelve. The keys are unique, so no two processes will update the same entry.

What could go wrong in such a scenario?

like image 806
Anas Elghafari Avatar asked Aug 16 '14 08:08

Anas Elghafari


1 Answers

The shelve documentation is explicit about this.

The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used.

So, without process synchronisation, I wouldn't do it.

How are the processes started? If they are created by a master process then you can look at the multiprocessing module. Use a Queue to which the child processes write back their results, and have the master remove items from the queue and write them to the shelf. Example of this sort of this is at https://stackoverflow.com/a/24501437/21945.

If you have no process hierarchy then you'll need to use locking to control read and write access to the shelf file. If you are using Linux or similar you might use posix_ipc named semaphore.

The other obvious option is to use a database server - Postgresql or similar.

like image 168
mhawke Avatar answered Sep 28 '22 23:09

mhawke