
Getting index of currently executing input in python multiprocessing

    from multiprocessing import Pool
    with Pool(processes=5) as p:
        p.starmap(name_of_function, all_inputs)

I have a piece of code like the one above that executes a function in parallel. Assuming that all_inputs has 10,000 elements, I would like to know which one is currently executing, e.g. 100 out of 10,000. Is there a way to get that index?

user308827 asked Jan 11 '18 23:01


4 Answers

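Each worker process in a multiprocessing.Pool is an instance of Process and keeps an internal counter that identifies it. You can use this counter along with the OS process ID. Note that this identifies the worker handling an input, not the position of that input in the iterable: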

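import os
from multiprocessing import current_process, Pool


def x(a):
    p = current_process()
    # _identity is a private attribute: a tuple whose first item is the worker's counter
    print('process counter:', p._identity[0], 'pid:', os.getpid())


if __name__ == '__main__':
    with Pool(2) as p:
        r = p.map(x, range(4))  # map blocks until every input has been processed
    p.join()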

yields:

process counter: 1 pid: 29443
process counter: 2 pid: 29444
process counter: 2 pid: 29444
process counter: 1 pid: 29443
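
As a hedged variant (not part of the original answer), the same idea can be sketched with the public `name` attribute instead of the private `_identity`. The helper name `describe` is hypothetical, and the exact worker names depend on the start method, so no particular output is shown here.

```python
import os
from multiprocessing import Pool, current_process


def describe(a):
    """Return (input, worker name, pid) so the parent process can report it."""
    return a, current_process().name, os.getpid()


if __name__ == '__main__':
    with Pool(2) as pool:
        # map returns results in input order, one tuple per input
        for a, name, pid in pool.map(describe, range(4)):
            print('input:', a, 'worker:', name, 'pid:', pid)
```

Returning the details rather than printing them inside the worker keeps the output in input order, since only the parent prints.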
georgexsh answered Oct 19 '22 11:10


IIUC, you can pass in the indexes as well. (This borrows the setup from @user1767754. Please let me know if this is not what you are looking for.)

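from multiprocessing import Pool

arr = [1,2,3,4,5]
arr_with_idx = zip(arr, range(len(arr)))  # pairs of (value, index)

def x(a, idx):
    print(idx)
    return a*a

if __name__ == '__main__':  # needed where workers are started with spawn (e.g. Windows)
    with Pool(5) as p:
        p.starmap(x, arr_with_idx)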

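Or, more concisely, use enumerate:

from multiprocessing import Pool

arr = [1,2,3,4,5]

def x(idx, a):  # different here: enumerate puts the index first
    print(idx)
    return a*a

if __name__ == '__main__':  # needed where workers are started with spawn (e.g. Windows)
    with Pool(5) as p:
        p.starmap(x, enumerate(arr))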

starmap will unpack each tuple and you can print out the index part.
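
To get closer to the "100 out of 10,000" wording in the question, one possible sketch passes a 1-based position and the total count along with each value. The names `with_position` and `square` are hypothetical and chosen only for illustration.

```python
from multiprocessing import Pool


def with_position(items):
    """Pair each item with its 1-based position and the total count (hypothetical helper)."""
    items = list(items)
    total = len(items)
    return [(pos, total, item) for pos, item in enumerate(items, start=1)]


def square(pos, total, a):
    # Printed from the worker, so lines from different workers may interleave.
    print(f'{pos} out of {total}')
    return a * a


if __name__ == '__main__':
    arr = [1, 2, 3, 4, 5]
    with Pool(5) as p:
        results = p.starmap(square, with_position(arr))
    print(results)
```

The position printed is the item's place in the input, not a count of how many items have finished, because workers pick up items concurrently.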

Tai answered Oct 19 '22 10:10


You can use the current_process function from multiprocessing. If this isn't precise enough, you could even give each process a name using a UUID.

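from multiprocessing import current_process, Pool


def x(a):
    # current_process() returns the Process object of the worker running this call
    print(current_process(), a)
    return a*a

if __name__ == '__main__':
    with Pool(5) as p:
        p.map(x, [1,2,3,4,5])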
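
The UUID naming idea is not shown in the answer. One way it might be done, as a sketch under assumptions, is to rename each worker in a Pool initializer. The function `name_worker` and the `worker-` prefix are hypothetical.

```python
import uuid
from multiprocessing import Pool, current_process


def name_worker():
    """Pool initializer (hypothetical): give this worker a unique, UUID-based name."""
    current_process().name = f'worker-{uuid.uuid4().hex}'


def x(a):
    # Each worker reports the name assigned by the initializer.
    print(current_process().name, a)
    return a * a


if __name__ == '__main__':
    # initializer runs once in every worker process when it starts
    with Pool(5, initializer=name_worker) as p:
        p.map(x, [1, 2, 3, 4, 5])
```

As with the worker counter, the name identifies the process doing the work rather than the index of the input.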
user1767754 answered Oct 19 '22 11:10


I'd suggest passing the index along with the other arguments. You could use enumerate, perhaps combined with a generator expression, to add the index to your existing arguments. Here's code that assumes all_inputs is an iterable of tuples:

with Pool(processes=5) as p:
    p.starmap(name_of_function, ((i,) + args for i, args in enumerate(all_inputs)))

You can choose from a bunch of variations on this general theme. For instance, you could put the index at the end of the arguments rather than at the start (just change (i,) + args to args + (i,)).
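
Another variation, offered as a sketch rather than as part of the answer, hands the index back with each result so the parent process can report progress as inputs finish. It swaps starmap for imap_unordered, which does not unpack tuples, so a small wrapper is needed. The body of `name_of_function`, the wrapper `call_with_index`, and the sample inputs are all made up for illustration. Note that this reports inputs that have finished, not ones that are currently executing.

```python
from multiprocessing import Pool


def name_of_function(a, b):
    # Placeholder body; the question does not show the real function.
    return a + b


def call_with_index(indexed_args):
    """Unpack (index, args), run the function, and return the index with the result."""
    i, args = indexed_args
    return i, name_of_function(*args)


if __name__ == '__main__':
    all_inputs = [(n, n) for n in range(10)]  # made-up inputs for illustration
    total = len(all_inputs)
    with Pool(processes=5) as p:
        # imap_unordered yields each result as soon as a worker finishes it.
        for done, (i, result) in enumerate(
                p.imap_unordered(call_with_index, enumerate(all_inputs)), start=1):
            print(f'finished input {i} ({done} out of {total})')
```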

Blckknght answered Oct 19 '22 10:10