Concurrent writing to the same file using threads and processes

Question

What is the correct way to make sure that a file is never corrupted when many threads and processes write to it?

Here is a version for threads that also handles errors when opening the file:

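import threading

lock = threading.RLock()  # must be the same object in every thread that writes
with lock:
    try:
        f = open(file, 'a')
        try:
            f.write('sth')
        finally:
            f.close()  # always close the file once open() has succeeded
    except OSError:
        pass  # open() failed (a failed write is re-raised and also ends up here)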

For processes, I guess I must use multiprocessing.Lock.

But what if I have 2 processes, and the first process owns 2 threads (each of which uses the file)?

This is just theory, but I want to know how to mix synchronization between threads and processes. Do threads "inherit" the lock from their process, so that only synchronization between processes is required?

Second, I'm not sure whether the code above needs the nested try for the case where the write fails and we still want to close the opened file (what if it remains open after the lock is released?).

abarnert · Accepted Answer

While this isn't entirely clear from the docs, multiprocessing synchronization primitives do in fact synchronize threads as well.

For example, if you run this code:

import multiprocessing
import sys
import threading
import time

lock = multiprocessing.Lock()

def f(i):
    with lock:
        for _ in range(10):
            sys.stderr.write(i)
            time.sleep(1)

t1 = threading.Thread(target=f, args=['1'])
t2 = threading.Thread(target=f, args=['2'])
t1.start()
t2.start()
t1.join()
t2.join()

… the output will always be 11111111112222222222 or 22222222221111111111, not a mixture of the two.

The locks are implemented on top of Win32 kernel sync objects on Windows and on top of semaphores on POSIX platforms that support them; they are not implemented at all on other platforms. (You can test this with import multiprocessing.synchronize, which will raise an ImportError on platforms without support, as explained in the docs.)
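As a quick way to run that platform check, here is a minimal sketch. The module that the multiprocessing docs name for this purpose is multiprocessing.synchronize; the helper name has_mp_locks is made up for illustration.

```python
def has_mp_locks():
    """Return True if multiprocessing's lock primitives can be imported here."""
    try:
        # The docs state that importing this module raises ImportError on
        # platforms without a working shared semaphore implementation.
        import multiprocessing.synchronize  # noqa: F401
    except ImportError:
        return False
    return True


print(has_mp_locks())
```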

That being said, it's certainly safe to have two levels of locks, as long as you always use them in the right order—that is, never grab the threading.Lock unless you can guarantee that your process has the multiprocessing.Lock.
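To illustrate the ordering rule, here is a small sketch in which every thread takes the multiprocessing lock first and the threading lock second. The names process_lock, thread_lock and guarded_increment are illustrative only, and the shared counter stands in for whatever resource you are protecting.

```python
import multiprocessing
import threading

process_lock = multiprocessing.Lock()  # outer lock: always acquired first
thread_lock = threading.Lock()         # inner lock: only taken while holding the outer one

counter = 0


def guarded_increment():
    global counter
    # Every caller uses the same order (process lock, then thread lock),
    # so two threads can never each hold one lock while waiting for the other.
    with process_lock:
        with thread_lock:
            counter += 1


threads = [threading.Thread(target=guarded_increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```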

If you do this cleverly enough, it can have performance benefits. (Cross-process locks on Windows, and on some POSIX platforms, can be orders of magnitude slower than intra-process locks.)
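The answer does not say what a "clever" scheme would look like, so the following is only one possible reading, not the author's method. In this sketch the threads of a process collect lines in a buffer under a cheap threading.Lock, and the cross-process lock is taken once per batch instead of once per line. BatchedAppender and its parameters are hypothetical. Note that the thread lock here protects the in-memory buffer rather than the file, and the two locks are never held at the same time; the trade-off is that lines reach the file later and may be lost if the process dies before close() is called.

```python
import multiprocessing
import threading


class BatchedAppender:
    """Hypothetical helper: buffer lines in-process, flush each batch under one cross-process lock."""

    def __init__(self, path, process_lock, batch_size=100):
        self.path = path
        self.process_lock = process_lock     # shared between processes
        self.thread_lock = threading.Lock()  # local to this process, guards the buffer
        self.batch_size = batch_size
        self.buffer = []

    def write(self, line):
        with self.thread_lock:
            self.buffer.append(line)
            if len(self.buffer) < self.batch_size:
                return
            # Take ownership of the full batch and start a fresh buffer.
            batch, self.buffer = self.buffer, []
        self._flush(batch)

    def close(self):
        """Write out whatever is still buffered."""
        with self.thread_lock:
            batch, self.buffer = self.buffer, []
        self._flush(batch)

    def _flush(self, batch):
        if not batch:
            return
        with self.process_lock:  # acquired once per batch, not once per line
            with open(self.path, "a") as f:
                f.writelines(line + "\n" for line in batch)
```

Each process would create one BatchedAppender around a multiprocessing.Lock that it received from its parent, share that appender among its threads, and call close() before exiting.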

If you just do it in the obvious way (only do with threadlock: inside with processlock: blocks), it obviously won't help performance, and in fact will slow things down a bit (although quite possibly not enough to measure), and it won't add any direct benefits. Of course your readers will know that your code is correct even if they don't know that multiprocessing locks work between threads, and in some cases debugging intraprocess deadlocks can be a lot easier than debugging interprocess deadlocks… but I don't think either of those is a good enough reason for the extra complexity in most cases.
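Putting this together for the scenario in the question (several processes, each running several threads, all appending to one file), a minimal sketch with a single multiprocessing.Lock could look like the following. The function names, the temporary file, and the line format are assumptions made for the example. Using with open(...) inside the lock closes the file before the lock is released, even if the write fails, which covers the same ground as the nested try/finally in the question.

```python
import multiprocessing
import os
import tempfile
import threading


def append_line(lock, path, text):
    """Append one line to path while holding lock."""
    with lock:
        # The file is opened, written and closed entirely inside the lock.
        with open(path, "a") as f:
            f.write(text + "\n")


def run_threads(lock, path, label, n_threads=2, n_lines=50):
    """Start n_threads threads that each append n_lines lines through append_line."""
    def work(tid):
        for i in range(n_lines):
            append_line(lock, path, f"{label}-t{tid}-{i}")

    threads = [threading.Thread(target=work, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    lock = multiprocessing.Lock()  # created once, then passed to every process
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)

    procs = [
        multiprocessing.Process(target=run_threads, args=(lock, path, f"p{n}"))
        for n in range(2)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    with open(path) as f:
        lines = f.read().splitlines()
    print(len(lines), "lines written")
    os.remove(path)
```

The lock has to be created in the parent and handed to each child (here through args); a lock created separately inside each process would not synchronize anything between them.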

Tags: python, synchronization, multithreading, multiprocessing

Sławomir Lenart
