I am trying to share a lock among processes. I understand that the way to share a lock is to pass it as an argument to the target function. However I found that even the approach below is working. I could not understand the way the processes are sharing this lock. Could anyone please explain?
import multiprocessing as mp
import time
class SampleClass:
def __init__(self):
self.lock = mp.Lock()
self.jobs = []
self.total_jobs = 10
def test_run(self):
for i in range(self.total_jobs):
p = mp.Process(target=self.run_job, args=(i,))
p.start()
self.jobs.append(p)
for p in self.jobs:
p.join()
def run_job(self, i):
with self.lock:
print('Sleeping in process {}'.format(i))
time.sleep(5)
if __name__ == '__main__':
t = SampleClass()
t.test_run()
On Windows (which you said you're using), these kinds of things always reduce to details about how multiprocessing
plays with pickle
, because all Python data crossing process boundaries on Windows is implemented by pickling on the sending end (and unpickling on the receiving end).
My best advice is to avoid doing things that raise such questions to begin with ;-) For example, the code you showed blows up on Windows under Python 2, and also blows up under Python 3 if you use a multiprocessing.Pool
method instead of multiprocessing.Process
.
It's not just the lock, simply trying to pickle a bound method (like self.run_job
) blows up in Python 2. Think about it. You're crossing a process boundary, and there isn't an object corresponding to self
on the receiving end. To what object is self.run_job
supposed to be bound on the receiving end?
In Python 3, pickling self.run_job
also pickles a copy of the self
object. So that's the answer: a SampleClass
object corresponding to self
is created by magic on the receiving end. Clear as mud. t
's entire state is pickled, including t.lock
. That's why it "works".
See this for more implementation details:
Why can I pass an instance method to multiprocessing.Process, but not a multiprocessing.Pool?
In the long run, you'll suffer the fewest mysteries if you stick to things that were obviously intended to work: pass module-global callable objects (neither, e.g., instance methods nor local functions), and explicitly pass multiprocessing
data objects (whether an instance of Lock
, Queue
, manager.list
, etc etc).
On Unix Operating Systems, new processes are created via the fork
primitive.
The fork
primitive works by cloning the parent process memory address space assigning it to the child. The child will have a copy of the parent's memory as well as for the file descriptors and shared objects.
This means that, when you call fork, if the parent has a file opened, the child will have it too. The same applied with shared objects such as pipes, sockets etc...
In Unix+CPython, Locks
are realized via the sem_open
primitive which is designed to be shared when forking a process.
I usually recommend against mixing concurrency (multiprocessing in particular) and OOP because it frequently leads to these kind of misunderstandings.
EDIT:
Saw just now that you are using Windows. Tim Peters gave the right answer. For the sake of abstraction, Python is trying to provide OS independent behaviour over its API. When calling an instance method, it will pickle the object and send it over a pipe. Thus providing a similar behaviour as for Unix.
I'd recommend you to read the programming guidelines for multiprocessing. Your issue is addressed in particular in the first point:
Avoid shared state
As far as possible one should try to avoid shifting large amounts of data between processes.
It is probably best to stick to using queues or pipes for communication between processes rather than using the lower level synchronization primitives.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With