Problem
I'm using Python's multiprocessing module to execute functions asynchronously. What I want to do is be able to track the overall progress of my script as each process calls and executes add_print. For instance, I would like the code below to add 1 to total and print out the value (1 2 3 ... 18 19 20) every time a process runs that function. My first attempt was to use a global variable, but this didn't work: since the function is called asynchronously, each process reads total as 0 to start off and adds 1 independently of the other processes. So the output is twenty 1's instead of incrementing values.
How could I go about referencing the same block of memory from my mapped function in a synchronized manner, even though the function is being run asynchronously? One idea I had was to somehow cache total in memory and then reference that exact block of memory when I add to it. Is this a possible and fundamentally sound approach in Python?
Please let me know if you need any more info or if I didn't explain something well enough.
Thanks!
Code
#!/usr/bin/python
## Import builtins
from multiprocessing import Pool

total = 0

def add_print(num):
    # Each worker process gets its own copy of this global,
    # so the increments are never shared between processes.
    global total
    total += 1
    print(total)

if __name__ == "__main__":
    nums = range(20)
    pool = Pool(processes=20)
    pool.map(add_print, nums)
The pool's map method chops the given iterable into a number of chunks, which it submits to the process pool as separate tasks. Pool.map is a parallel equivalent of the built-in map function; it blocks the main process until all the results are ready. Pool can take the number of worker processes as a parameter.
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
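As a minimal sketch of that behaviour (the square function, the worker count, and the chunksize value here are purely illustrative):

import multiprocessing as mp

def square(n):
    # runs in one of the worker processes
    return n * n

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:
        # blocks until every result is ready; results keep input order
        results = pool.map(square, range(10), chunksize=2)
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]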
You could use a shared Value:
import multiprocessing as mp

def add_print(num):
    """
    https://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing
    """
    # The lock makes the read-increment-write on the shared counter atomic.
    with lock:
        total.value += 1
        print(total.value)

def setup(t, l):
    # Runs once in each worker process; stashes the shared objects
    # as globals so add_print can reach them.
    global total, lock
    total = t
    lock = l

if __name__ == "__main__":
    total = mp.Value('i', 0)
    lock = mp.Lock()
    nums = range(20)
    pool = mp.Pool(initializer=setup, initargs=[total, lock])
    pool.map(add_print, nums)
The pool initializer calls setup once for each worker subprocess. setup makes total a global variable in the worker process, so total can be accessed inside add_print when the worker calls add_print.
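As a side note, a Value created with the default lock=True already carries its own lock, reachable through get_lock(), so a separate Lock isn't strictly required. A variant sketch of the same counter (my rewording, not part of the original answer):

import multiprocessing as mp

def add_print(num):
    # Value(..., lock=True) is the default, so the counter
    # carries its own lock; get_lock() returns it.
    with total.get_lock():
        total.value += 1
        print(total.value)

def setup(t):
    global total
    total = t

if __name__ == "__main__":
    total = mp.Value('i', 0)
    nums = range(20)
    pool = mp.Pool(initializer=setup, initargs=[total])
    pool.map(add_print, nums)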
Note, the number of processes should not exceed the number of CPUs your machine has. If it does, the excess subprocesses will wait around for a CPU to become available. So don't use processes=20 unless you have 20 or more CPUs. If you don't supply a processes argument, multiprocessing will detect the number of CPUs available and spawn a pool with that many workers for you. The number of tasks (e.g. the length of nums) usually greatly exceeds the number of CPUs. That's fine; the tasks are queued and processed by one of the workers as a worker becomes available.
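A quick way to see what that default looks like on your machine (purely illustrative):

import multiprocessing as mp

if __name__ == "__main__":
    print(mp.cpu_count())  # number of CPUs multiprocessing detects
    pool = mp.Pool()       # no processes argument: uses cpu_count() workers
    pool.close()
    pool.join()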