Get a unique ID for worker in python multiprocessing pool

Tags: python, multiprocessing

Is there a way to assign each worker in a Python multiprocessing pool a unique ID, so that a job run by a particular worker can tell which worker is running it? According to the docs, a Process has a name, but:

"The name is a string used for identification purposes only. It has no semantics. Multiple processes may be given the same name."

For my particular use case, I want to run a bunch of jobs on a group of four GPUs and need to set the device number of the GPU each job should run on. Because the jobs are of non-uniform length, I want to be sure that no job tries to run on a GPU before the previous job on it has completed (which precludes pre-assigning an ID to each unit of work ahead of time).


asked Apr 17 '12 12:04

JoshAdel

2 Answers

It seems like what you want is simple: multiprocessing.current_process(). For example:

    import multiprocessing

    def f(x):
        print(multiprocessing.current_process())
        return x * x

    if __name__ == '__main__':
        p = multiprocessing.Pool()
        print(p.map(f, range(6)))

Output:

    $ python foo.py
    <Process(PoolWorker-1, started daemon)>
    <Process(PoolWorker-2, started daemon)>
    <Process(PoolWorker-3, started daemon)>
    <Process(PoolWorker-1, started daemon)>
    <Process(PoolWorker-2, started daemon)>
    <Process(PoolWorker-4, started daemon)>
    [0, 1, 4, 9, 16, 25]

(This output is from Python 2. Python 3 prefixes the worker names with the start method, for example ForkPoolWorker-1, and its repr format differs slightly.)

This returns the process object itself, so the process can be its own identity. You could also call id on it for a numerical id. In CPython this is the memory address of the process object, so it is unique within one interpreter, but each process has its own address space, so I would not rely on it to tell workers apart. Finally, you can use the ident or the pid property of the process, but that's only set once the process is started.

Furthermore, looking over the source, it seems very likely to me that autogenerated names (as exemplified by the first value in the Process repr strings above) are unique. multiprocessing maintains an itertools.count object for every process, which is used to generate an _identity tuple for any child processes it spawns. So the top-level process produces child processes with single-value ids, they spawn processes with two-value ids, and so on. Then, if no name is passed to the Process constructor, it autogenerates the name from the _identity, using ':'.join(...). Pool then alters the name of the process using replace, leaving the autogenerated id the same.

The upshot of all this is that although two Processes may have the same name, because you may assign the same name to them when you create them, they are unique if you don't touch the name parameter. Also, you could theoretically use _identity as a unique identifier; but I gather they made that variable private for a reason!

An example of the above in action:

    import multiprocessing

    def f(x):
        created = multiprocessing.Process()
        current = multiprocessing.current_process()
        print('running:', current.name, current._identity)
        print('created:', created.name, created._identity)
        return x * x

    if __name__ == '__main__':
        p = multiprocessing.Pool()
        print(p.map(f, range(6)))

Output:

    $ python foo.py
    running: PoolWorker-1 (1,)
    created: Process-1:1 (1, 1)
    running: PoolWorker-2 (2,)
    created: Process-2:1 (2, 1)
    running: PoolWorker-3 (3,)
    created: Process-3:1 (3, 1)
    running: PoolWorker-1 (1,)
    created: Process-1:2 (1, 2)
    running: PoolWorker-2 (2,)
    created: Process-2:2 (2, 2)
    running: PoolWorker-4 (4,)
    created: Process-4:1 (4, 1)
    [0, 1, 4, 9, 16, 25]
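If relying on the private attribute is acceptable, the first element of _identity can be turned into a small worker index. The following is only a sketch under that assumption: slot_from_identity, current_worker_slot, job, and NUM_SLOTS are names made up for illustration, and _identity is an undocumented internal that may change between Python versions. Also note that a pool which replaces workers (for example with maxtasksperchild) hands out fresh counter values, so the modulo below could map a replacement worker onto a slot that is still in use.

```python
import multiprocessing

NUM_SLOTS = 4  # hypothetical: e.g. the number of GPUs


def slot_from_identity(identity, num_slots=NUM_SLOTS):
    """Map a Process._identity tuple such as (3,) to a slot in [0, num_slots)."""
    if not identity:
        # The top-level process has an empty identity tuple.
        raise ValueError("not running inside a child process")
    # The counter starts at 1, so shift to a zero-based index.
    return (identity[0] - 1) % num_slots


def current_worker_slot(num_slots=NUM_SLOTS):
    # _identity is private and undocumented; this is the fragile part.
    identity = multiprocessing.current_process()._identity
    return slot_from_identity(identity, num_slots)


def job(x):
    # Each job reports the slot of the worker that ran it.
    return (current_worker_slot(), x * x)


if __name__ == "__main__":
    with multiprocessing.Pool(NUM_SLOTS) as pool:
        print(pool.map(job, range(8)))
```

Keeping the arithmetic in slot_from_identity, separate from the lookup of the private attribute, means only one line needs to change if the internals move.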

answered Sep 17 '22 06:09

senderle

You can use multiprocessing.Queue to store the ids and then get the id at initialization of the pool process.

Advantages:

- You do not need to rely on internals.
- If your use case is to manage resources or devices, you can put the device number in the queue directly. This also ensures that no device is used twice: if you have more processes in your pool than devices, the additional processes will block on queue.get() and will not perform any work. (This won't block your program, or at least it did not when I tested.)

Disadvantages:

- You have additional communication overhead, and spawning the pool processes takes a tiny bit longer: without the sleep(1) in the example, all work might be performed by the first process because the others have not finished initializing yet.
- You need a global (or at least I don't know a way around it).

Example:

    import multiprocessing
    from time import sleep

    def init(queue):
        # Runs once in each worker: claim one id for the worker's lifetime.
        global idx
        idx = queue.get()

    def f(x):
        global idx
        process = multiprocessing.current_process()
        sleep(1)
        return (idx, process.pid, x * x)

    if __name__ == '__main__':
        ids = [0, 1, 2, 3]
        manager = multiprocessing.Manager()
        idQueue = manager.Queue()

        for i in ids:
            idQueue.put(i)

        p = multiprocessing.Pool(8, init, (idQueue,))
        print(p.map(f, range(8)))

Output:

[(0, 8289, 0), (1, 8290, 1), (2, 8294, 4), (3, 8291, 9), (0, 8289, 16), (1, 8290, 25), (2, 8294, 36), (3, 8291, 49)]

Note that there are only 4 different pids, although the pool contains 8 processes, and each idx is used by only one process.
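A variation on the same queue idea, offered as a sketch rather than as part of the approach above: instead of binding a device to a worker in the initializer, each job could check a device out of the queue and hand it back when it finishes. No worker then sits idle for its whole lifetime, and a device is still never held by two jobs at once. The names DEVICE_IDS, checked_out, _init, job, and main are hypothetical, and the device work is simulated with a short sleep.

```python
import multiprocessing
import time
from contextlib import contextmanager

DEVICE_IDS = [0, 1, 2, 3]  # hypothetical device numbers
_free_devices = None       # set in each worker by _init


def _init(queue):
    # Runs once per worker: remember the shared queue of free devices.
    global _free_devices
    _free_devices = queue


@contextmanager
def checked_out(queue):
    """Borrow one device id from the queue and always hand it back."""
    device = queue.get()  # blocks until some device is free
    try:
        yield device
    finally:
        queue.put(device)


def job(x):
    with checked_out(_free_devices) as device:
        time.sleep(0.05)  # stand-in for real work on the device
        return (device, x * x)


def main():
    with multiprocessing.Manager() as manager:
        queue = manager.Queue()
        for device in DEVICE_IDS:
            queue.put(device)
        with multiprocessing.Pool(8, _init, (queue,)) as pool:
            print(pool.map(job, range(8)))


if __name__ == "__main__":
    main()
```

The trade-off is one queue round trip per job instead of one per worker, which matters little when jobs are long-running.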

answered Sep 17 '22 06:09

Steohan
