Is there a way to assign each worker in a Python multiprocessing pool a unique ID, so that a job being run by a particular worker in the pool can know which worker is running it? According to the docs, a Process has a name, but:
"The name is a string used for identification purposes only. It has no semantics. Multiple processes may be given the same name."
For my particular use-case, I want to run a bunch of jobs on a group of four GPUs, and need to set the device number for the GPU that the job should run on. Because the jobs are of non-uniform length, I want to be sure that I don't have a collision on a GPU of a job trying to run on it before the previous one completes (so this precludes pre-assigning an ID to the unit of work ahead of time).
We can get the current process via the multiprocessing.current_process() function. Once we have the process instance, we get the pid via the multiprocessing.Process.pid attribute.
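A minimal sketch of that lookup, for illustration only. The helper name describe_current_process is made up here and is not part of multiprocessing.

```python
import multiprocessing
import os


def describe_current_process():
    """Return the name and pid of whichever process calls this function."""
    proc = multiprocessing.current_process()  # Process object for the caller
    return proc.name, proc.pid


if __name__ == "__main__":
    name, pid = describe_current_process()
    # For the calling process, pid agrees with what the OS reports.
    print(name, pid, pid == os.getpid())
```

Called from inside a pool job, the same helper would report the worker's own name and pid rather than the parent's.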
The multiprocessing.dummy module provides a wrapper around the multiprocessing API, implemented using thread-based concurrency. It provides a drop-in replacement for multiprocessing, allowing a program that uses the multiprocessing API to switch to threads with a single change to its import statements.
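A small sketch of the import swap, assuming a pool-based program. The function names are hypothetical. It also shows a consequence relevant to worker identification: with threads, every worker reports the same pid, so the pid cannot tell workers apart.

```python
import os
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool


def square(x):
    return x * x


def pid_of_worker(_):
    # Every "worker" is a thread of the same process, so this never varies.
    return os.getpid()


def run_with_threads(values, workers=4):
    """Map over values with a thread pool; return results and the pids seen."""
    with ThreadPool(workers) as pool:
        squares = pool.map(square, values)
        pids = set(pool.map(pid_of_worker, values))
    return squares, pids


if __name__ == "__main__":
    print(run_with_threads(range(6)))
```

Switching the import back to multiprocessing.Pool would run the same code in separate processes.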
In this example, we first import the Process class and then create a Process object with the display() function as its target. The process is started with the start() method and then waited on with the join() method. We can also pass arguments to the function using the args keyword.
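The example that paragraph describes is not reproduced in this thread. A minimal sketch of what it might look like, assuming display() simply prints a greeting (the function body and the "worker-A" argument are made up for illustration):

```python
import multiprocessing


def display(name):
    """Hypothetical target function: print and return a short greeting."""
    message = "Hello from " + name
    print(message)
    return message


if __name__ == "__main__":
    # args supplies the positional arguments passed to the target.
    proc = multiprocessing.Process(target=display, args=("worker-A",))
    proc.start()  # launch the child process
    proc.join()   # wait for it to finish
    print("exit code:", proc.exitcode)
```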
So, multiprocessing is faster when the program is CPU-bound. When a program does a lot of I/O, threading may be more efficient, because most of the time the program is just waiting for the I/O to complete. For CPU-bound work, however, multiprocessing is generally more efficient because the processes can run in parallel on separate cores.
It seems like what you want is simple: multiprocessing.current_process(). For example:
    import multiprocessing

    def f(x):
        print multiprocessing.current_process()
        return x * x

    p = multiprocessing.Pool()
    print p.map(f, range(6))
Output:
    $ python foo.py
    <Process(PoolWorker-1, started daemon)>
    <Process(PoolWorker-2, started daemon)>
    <Process(PoolWorker-3, started daemon)>
    <Process(PoolWorker-1, started daemon)>
    <Process(PoolWorker-2, started daemon)>
    <Process(PoolWorker-4, started daemon)>
    [0, 1, 4, 9, 16, 25]
This returns the process object itself, so the process can be its own identity. You could also call id on it for a numerical id; in CPython this is the memory address of the process object, which is unique among live objects within one interpreter, although separate worker processes are not strictly guaranteed to see different addresses. Finally, you can use the ident or the pid property of the process, but that is only set once the process has started.
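A short sketch of that last point: on a Process object that has not been started, both pid and ident are None, and they are filled in once start() has run. The helper names here are illustrative.

```python
import multiprocessing
import os


def report_pid():
    print("child sees pid", os.getpid())


def pid_before_start():
    """Build a Process without starting it and return its (pid, ident)."""
    proc = multiprocessing.Process(target=report_pid)
    return proc.pid, proc.ident


if __name__ == "__main__":
    proc = multiprocessing.Process(target=report_pid)
    print("before start:", proc.pid)  # None: no OS process exists yet
    proc.start()
    print("after start:", proc.pid)   # now the child's OS-level pid
    proc.join()
```

Inside a pool job this is not a concern, since the worker calling current_process() is necessarily already running.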
Furthermore, looking over the source, it seems to me very likely that autogenerated names (as exemplified by the first value in the Process repr strings above) are unique. multiprocessing maintains an itertools.count object for every process, which is used to generate an _identity tuple for any child processes it spawns. So the top-level process produces child processes with single-value ids, they spawn processes with two-value ids, and so on. Then, if no name is passed to the Process constructor, it simply autogenerates the name based on the _identity, using ':'.join(...). Then Pool alters the name of the process using replace, leaving the autogenerated id the same.
The upshot of all this is that although two Processes may have the same name, because you may assign the same name to them when you create them, they are unique if you don't touch the name parameter. Also, you could theoretically use _identity as a unique identifier; but I gather they made that variable private for a reason!
An example of the above in action:
    import multiprocessing

    def f(x):
        created = multiprocessing.Process()
        current = multiprocessing.current_process()
        print 'running:', current.name, current._identity
        print 'created:', created.name, created._identity
        return x * x

    p = multiprocessing.Pool()
    print p.map(f, range(6))
Output:
    $ python foo.py
    running: PoolWorker-1 (1,)
    created: Process-1:1 (1, 1)
    running: PoolWorker-2 (2,)
    created: Process-2:1 (2, 1)
    running: PoolWorker-3 (3,)
    created: Process-3:1 (3, 1)
    running: PoolWorker-1 (1,)
    created: Process-1:2 (1, 2)
    running: PoolWorker-2 (2,)
    created: Process-2:2 (2, 2)
    running: PoolWorker-4 (4,)
    created: Process-4:1 (4, 1)
    [0, 1, 4, 9, 16, 25]
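The listings in this answer use Python 2 print statements. A rough Python 3 sketch of the same idea follows, returning the values to the parent instead of printing them in the workers. The function name worker_info is made up, and _identity is still a private attribute, so treat this as an illustration rather than a supported API. On Python 3 the worker names typically carry a start-method prefix (for example ForkPoolWorker-1), so it is safer not to parse them.

```python
import multiprocessing


def worker_info(x):
    """Return the running worker's name and private identity tuple."""
    current = multiprocessing.current_process()
    # _identity is private; it is read here only to mirror the answer above.
    return current.name, current._identity, x * x


if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        for name, identity, result in pool.map(worker_info, range(6)):
            print(name, identity, result)
```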
You can use multiprocessing.Queue to store the ids and then get the id at initialization of the pool process.
Advantages:
If the pool has more processes than ids, the extra processes block on queue.get() and will not perform any work (this won't block your program, or at least it did not when I tested).
Disadvantages:
Without the sleep(1) in the example, all work might be performed by the first process, as the others are not done initializing yet.
Example:
    import multiprocessing
    from time import sleep

    def init(queue):
        # Runs once in each pool process: take one id off the queue.
        global idx
        idx = queue.get()

    def f(x):
        global idx
        process = multiprocessing.current_process()
        sleep(1)
        return (idx, process.pid, x * x)

    ids = [0, 1, 2, 3]
    manager = multiprocessing.Manager()
    idQueue = manager.Queue()

    for i in ids:
        idQueue.put(i)

    p = multiprocessing.Pool(8, init, (idQueue,))

    print(p.map(f, range(8)))
Output:
[(0, 8289, 0), (1, 8290, 1), (2, 8294, 4), (3, 8291, 9), (0, 8289, 16), (1, 8290, 25), (2, 8294, 36), (3, 8291, 49)]
Note that there are only 4 different pids, although the pool contains 8 processes, and each idx is used by only one process.
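Applied to the GPU use case from the question, the same queue-based initializer could hand each worker one device number. The sketch below is only an outline under stated assumptions: the names init_worker, run_job and run_all are hypothetical, run_job is a placeholder for real GPU work, and it assumes the GPU library honours the CUDA_VISIBLE_DEVICES environment variable and has not been initialised in the worker before the initializer runs.

```python
import multiprocessing
import os


def init_worker(gpu_queue):
    """Runs once per worker: claim one GPU id for the worker's lifetime."""
    global gpu_id
    gpu_id = gpu_queue.get()
    # Assumption: the GPU library reads this variable when it initialises.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)


def run_job(x):
    # Placeholder for real GPU work; reports which device this worker owns.
    return gpu_id, os.getpid(), x * x


def run_all(jobs, gpu_ids=(0, 1, 2, 3)):
    """Run jobs on a pool with exactly one worker per GPU id."""
    with multiprocessing.Manager() as manager:
        gpu_queue = manager.Queue()
        for g in gpu_ids:
            gpu_queue.put(g)
        # Pool size equals the number of ids, so no worker is left waiting.
        with multiprocessing.Pool(len(gpu_ids), init_worker, (gpu_queue,)) as pool:
            return pool.map(run_job, jobs)


if __name__ == "__main__":
    for gpu, pid, result in run_all(range(8)):
        print("gpu", gpu, "pid", pid, "result", result)
```

Because a pool worker handles one job at a time and keeps its device number for its whole lifetime, two jobs should not end up on the same GPU at once, even when the jobs have uneven durations.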