 

Python multiprocessing Queue put() behavior

I'm doing something very simple using multiprocessing:

import multiprocessing

queue = multiprocessing.Queue()
data = {'a': 1}
queue.put(data, True)  # block=True waits for a free slot, not for serialization
data.clear()

When I read from the queue in another process (via the get() method), I get an empty dictionary. If I remove data.clear(), I get the keys as expected. Is there any way to wait for put() to have finished serializing the data?

asked Feb 18 '15 by Tarantula

People also ask

What does multiprocessing.Queue() do?

The Python multiprocessing module provides a Queue class, which is a first-in-first-out (FIFO) data structure. It can store any picklable Python object (though simple ones work best) and is extremely useful for sharing data between processes.
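For illustration, a minimal self-contained sketch of the FIFO behavior between two processes (the worker function name is just an example):

import multiprocessing

def worker(q):
    # Items are dequeued in the order a single producer enqueued them.
    q.put('first')
    q.put('second')

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # 'first'
    print(q.get())  # 'second'
    p.join()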

How do you pass multiple arguments in Python multiprocessing?

Use Pool.starmap(). The multiprocessing pool's starmap() function calls the target function with multiple arguments, so it can be used instead of the map() function. This is probably the preferred approach for executing a target function that takes multiple arguments in a multiprocessing pool.
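A short sketch of this (the add function is just an example): each tuple in the iterable is unpacked into the target function's positional arguments.

from multiprocessing import Pool

def add(x, y):
    return x + y

if __name__ == '__main__':
    with Pool(2) as pool:
        # Each (x, y) tuple becomes a call add(x, y).
        results = pool.starmap(add, [(1, 2), (3, 4), (5, 6)])
    print(results)  # [3, 7, 11]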

Is Python multiprocessing queue thread safe?

Yes, it is. From https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes: Queues are thread and process safe.
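A small sketch of what that safety means in practice, assuming several producer processes sharing one queue (the producer function is illustrative):

import multiprocessing

def producer(q, n):
    for i in range(100):
        q.put((n, i))  # concurrent puts do not corrupt the queue

if __name__ == '__main__':
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=producer, args=(q, n)) for n in range(4)]
    for p in procs:
        p.start()
    # Drain the queue before joining to avoid blocking on a full pipe buffer.
    items = [q.get() for _ in range(400)]
    for p in procs:
        p.join()
    print(len(items))  # 400 -- every item arrives exactly once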

How does Python multiprocessing work?

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
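A minimal sketch of that threading-like API, spawning a single worker process (the function and argument names are just examples):

from multiprocessing import Process
import os

def work(name):
    # Runs in a separate process with its own interpreter and GIL.
    print(f'{name} running in pid {os.getpid()}')

if __name__ == '__main__':
    p = Process(target=work, args=('worker',))
    p.start()
    p.join()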


2 Answers

Actually, this is intended as a feature, not a problem: put() returns immediately, so your process can continue while a background feeder thread performs the serialization, avoiding what is known as "queue contention".

I suggest two options:

  1. Are you absolutely sure you need mutable dictionaries in the first place? Instead of making defensive copies of your data (which you seem, correctly, to dislike), why not simply create a new dictionary rather than calling dict.clear(), and let the garbage collector worry about the old ones?

  2. Pickle the data yourself; that is, a_queue.put(pickle.dumps(data)) and pickle.loads(a_queue.get()). Now, if you call data.clear() right after a put, the data has already been serialized "by you" (see the sketch after this list).
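A minimal sketch of the second option; serializing the dictionary yourself before the put means clearing it afterwards cannot race with the queue's feeder thread (the consumer function is illustrative):

import multiprocessing
import pickle

def consumer(q):
    data = pickle.loads(q.get())  # deserialize on the receiving side
    print(data)  # {'a': 1}

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=consumer, args=(q,))
    p.start()
    data = {'a': 1}
    q.put(pickle.dumps(data))  # data is already a fixed bytes snapshot here
    data.clear()               # safe: the snapshot is unaffected
    p.join()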

From a parallel-programming point of view, the first approach (treating your data as if it were immutable) is the more viable and cleaner thing to do in the long term, but I am not sure if or why you must clear your dictionaries.

answered Oct 24 '22 by fnl


The best way is probably to make a copy of data before sending it. Try:

import multiprocessing

queue = multiprocessing.Queue()
data = {'a': 1}
dc = data.copy()   # snapshot taken before handing off to the queue
queue.put(dc)
data.clear()       # only affects data, not the copy on the queue

Basically, you can't count on the send finishing before the dictionary is cleared, so you shouldn't try. dc will be garbage-collected once it goes out of scope or is reassigned on the next run.

answered Oct 25 '22 by Tom Hunt