from multiprocessing import Pool
with Pool(processes=5) as p:
p.starmap(name_of_function, all_inputs)
I have a piece of code like above that executes a function in parallel. Assuming that all_inputs
has 10,000 elements, I would like to know which one is currently executing e.g. 100 out of 10,000... Is there a way to get that index?
We can check if a process is alive via the multiprocessing. Process. is_alive() method.
One of the useful functions in multiprocessing is cpu_count() . This returns the number of CPUs (computer cores) available on your computer to be used for a parallel program.
Daemonize and scale your Python apps In Unix speak, a Daemon is a long-running background process that can perform virtually anything, from executing requests for services to performing any, usually long-running, arbitrary tasks for day-to-day activities on UNIX systems.
The Pool class in multiprocessing can handle an enormous number of processes. It allows you to run multiple jobs per process (due to its ability to queue the jobs). The memory is allocated only to the executing processes, unlike the Process class, which allocates memory to all the processes.
The worker process within multiprocessing.Pool
is an instance of Process
, it keeps an internal counter to identify itself, you could use this counter along with OS process id:
import os
from multiprocessing import current_process, Pool
def x(a):
p = current_process()
print('process counter:', p._identity[0], 'pid:', os.getpid())
if __name__ == '__main__':
with Pool(2) as p:
r = p.map(x, range(4))
p.join()
yields:
process counter: 1 pid: 29443
process counter: 2 pid: 29444
process counter: 2 pid: 29444
process counter: 1 pid: 29443
IIUC, you can pass in the indexes as well. (Steal the setup from @user1767754) (Please let me know if this is not what you are looking for.)
from multiprocessing import Pool
arr = [1,2,3,4,5]
arr_with_idx = zip(arr, range(len(arr)))
def x(a, idx):
print(idx)
return a*a
with Pool(5) as p:
p.starmap(x, arr_with_idx)
Or more concisely, use enumerate
from multiprocessing import Pool
arr = [1,2,3,4,5]
def x(idx, a): # different here
print(idx)
return a*a
with Pool(5) as p:
p.starmap(x, enumerate(arr))
starmap
will unpack each tuple and you can print out the index part.
You can use the current_process
method from multiprocessing. If this isn't accurate enough, you could even pass the processes a name
using a uuid
from multiprocessing import current_process
def x(a):
print(current_process(), a)
return a*a
with Pool(5) as p:
p.map(x, [1,2,3,4,5]
I'd suggest passing the index along with the other arguments. You could use enumerate
perhaps combined with a generator expression add the value to your existing arguments. Here's code that assumes all_inputs
is an iterable of tuples:
with Pool(processes=5) as p:
p.starmap(name_of_function, ((i,) + args for i, args in enumerate(all_inputs)))
You can choose from a bunch of variations on this general theme. For instance, you could put the index at the end of the arguments, rather than at the start (just swap (i,) + args
to args + (i,)
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With