
Distributing jobs evenly across multiple GPUs with `multiprocessing.Pool`

Let's say that I have the following:

  • A system with 4 GPUs.
  • A function, `foo`, which may be run up to 2 times simultaneously on each GPU.
  • A list of files that need to be processed using `foo`, in any order. However, each file takes an unpredictable amount of time to process.

I would like to process all the files, keeping all the GPUs as busy as possible by ensuring there are always 8 instances of `foo` running at any given time (2 instances on each GPU) until fewer than 8 files remain.

The actual details of invoking the GPU are not my issue. What I'm trying to figure out is how to structure the parallelization so that there are always 8 instances of `foo` running, while making sure that exactly 2 processes use each GPU ID at all times.

I've come up with one way to solve this problem using `multiprocessing.Pool`, but the solution is quite brittle and relies on (AFAIK) undocumented behavior. It depends on the fact that the processes within the Pool are named in the format `ForkPoolWorker-%d`, where `%d` is a number between 1 and the number of processes in the pool. I take this value modulo the number of GPUs, which gives me a valid GPU id. However, it would be much nicer if I could somehow give the GPU id directly to each process, perhaps on initialization, instead of relying on the string format of the process names.

One thing I considered: if the `initializer` and `initargs` parameters of `Pool.__init__` accepted a list of `initargs`, so that each process could be initialized with a different set of arguments, the problem would be moot. Unfortunately, that doesn't appear to be supported.

Can anybody recommend a more robust or pythonic solution to this problem?

Hacky solution (Python 3.7):

from multiprocessing import Pool, current_process

def foo(filename):
    # Hacky way to get a GPU id using process name (format "ForkPoolWorker-%d")
    gpu_id = (int(current_process().name.split('-')[-1]) - 1) % 4

    # run processing on GPU <gpu_id>
    ident = current_process().ident
    print('{}: starting process on GPU {}'.format(ident, gpu_id))
    # ... process filename
    print('{}: finished'.format(ident))

pool = Pool(processes=4*2)  # 2 workers per GPU x 4 GPUs

files = ['file{}.xyz'.format(x) for x in range(1000)]
for _ in pool.imap_unordered(foo, files):
    pass
pool.close()
pool.join()
asked Nov 22 '18 by jodag


1 Answer

I figured it out. It's actually quite simple: use a `multiprocessing.Queue` to manage the available GPU IDs. Start by filling the queue with 2 of each GPU ID, then have `foo` take a GPU ID from the queue at the beginning and put it back at the end.

from multiprocessing import Pool, current_process, Queue

NUM_GPUS = 4
PROC_PER_GPU = 2

queue = Queue()

def foo(filename):
    gpu_id = queue.get()
    try:
        # run processing on GPU <gpu_id>
        ident = current_process().ident
        print('{}: starting process on GPU {}'.format(ident, gpu_id))
        # ... process filename
        print('{}: finished'.format(ident))
    finally:
        queue.put(gpu_id)

# initialize the queue with the GPU ids; this must happen before the
# Pool is created so the workers inherit the populated queue
for gpu_id in range(NUM_GPUS):
    for _ in range(PROC_PER_GPU):
        queue.put(gpu_id)

pool = Pool(processes=PROC_PER_GPU * NUM_GPUS)
files = ['file{}.xyz'.format(x) for x in range(1000)]
for _ in pool.imap_unordered(foo, files):
    pass
pool.close()
pool.join()
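How the `# run processing on GPU <gpu_id>` placeholder maps onto an actual GPU is framework-specific and deliberately left out of the question. One common pattern (an illustrative assumption, not part of the answer above; `process_on_gpu` is a hypothetical helper) is to restrict device visibility before the framework initializes CUDA in the worker:

```python
import os
from multiprocessing import current_process

def process_on_gpu(gpu_id, filename):
    # Making only one device visible is a common way to pin this process
    # to a single GPU; it must happen before the framework (e.g. any
    # CUDA-based library) initializes the device in this process.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    # ... framework-specific processing of `filename` would go here ...
    return '{}: processed {} on GPU {}'.format(
        current_process().ident, filename, gpu_id)
```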
answered Sep 28 '22 by jodag