Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to have multiple python programs append rows to the same file?

I've got multiple python processes (typically 1 per core) transforming large volumes of data that they are each reading from dedicated sources, and writing to a single output file that each opened in append mode.

Is this a safe way for these programs to work?

Because of the tight performance requirements and large data volumes I don't think that I can have each process repeatedly open & close the file. Another option is to have each write to a dedicated output file and a single process concatenate them together once they're all done. But I'd prefer to avoid that.

Thanks in advance for any & all answers and suggestions.

like image 371
KenFar Avatar asked Feb 28 '23 10:02

KenFar


1 Answers

Have you considered using the multiprocessing module to coordinate between the running programs in a thread-like manner? See in particular the queue interface; you can place each completed work item on a queue when completed, and have a single process reading off the queue and writing to your output file.

Alternately, you can have each subprocess maintain a separate pipe to a parent process which does a select() call from all of them, and copies data to the output file when appropriate. Of course, this can be done "by hand" (without the multiprocessing module) as well as with it.

Alternately, if the reason you're avoiding threads is to avoid the global interpreter lock, you might consider a non-CPython implementation (such as Jython or IronPython).

like image 83
Charles Duffy Avatar answered Mar 02 '23 00:03

Charles Duffy