
Pytorch inference CUDA out of memory when multiprocessing


To fully utilize the CPU and GPU, I run several processes that each do DNN inference (feed forward) on a separate dataset. Because each process allocates CUDA memory during the feed forward, I get CUDA out of memory errors. Adding a torch.cuda.empty_cache() call made things better, but there are still occasional out of memory errors, probably due to unlucky allocation/release timing across the processes.

I managed to solve the problem by adding a multiprocessing.BoundedSemaphore around the feed forward call, but this makes it difficult to initialize and share the semaphore between the processes.

Is there a better way to avoid this kind of error when running multiple GPU inference processes?

Xyand asked Aug 23 '20 17:08


1 Answer

In my experience with parallel training and inference, it is almost impossible to squeeze out the last bit of GPU memory. Probably the best you can do is estimate the maximum number of processes that can run in parallel, then restrict your code to run at most that many at the same time. A semaphore is the typical way to limit the number of parallel processes and automatically let a new one start when a slot opens up.
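As a rough illustration of the estimate, the number of processes can be taken as the usable GPU memory divided by the peak memory one process needs. The function name and all numbers below are hypothetical; in practice the per-process peak would have to be measured on a representative run (for example with torch.cuda.max_memory_allocated(), which does not count the CUDA context itself, so leave some headroom).

```python
def max_parallel_processes(total_bytes, peak_bytes_per_process, reserve_bytes=0):
    """Estimate how many inference processes fit on one GPU.

    total_bytes: total GPU memory.
    peak_bytes_per_process: measured peak usage of a single process.
    reserve_bytes: headroom kept free for fragmentation and other users.
    """
    if peak_bytes_per_process <= 0:
        raise ValueError("peak_bytes_per_process must be positive")
    usable = total_bytes - reserve_bytes
    # Integer division rounds down; never return a negative count.
    return max(usable // peak_bytes_per_process, 0)


GIB = 1024 ** 3
# Hypothetical numbers: a 16 GiB card, 3 GiB per process, 1 GiB kept in reserve.
n_process = max_parallel_processes(16 * GIB, 3 * GIB, reserve_bytes=1 * GIB)
```

With these made-up numbers, 15 GiB is usable and each process needs 3 GiB, so the estimate is 5 processes.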

To make it easier to initialize and share the semaphore between processes, you can use a multiprocessing.Pool with a pool initializer, as follows.

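import multiprocessing as mp

def pool_init(semaphore):
    # Runs once in each worker; stores the shared semaphore in a global.
    global pool_semaphore
    pool_semaphore = semaphore

# n_process: the estimated number of processes that fit on the GPU
semaphore = mp.BoundedSemaphore(n_process)
with mp.Pool(n_process, initializer=pool_init, initargs=(semaphore,)) as pool:
    # here, each process can access the shared variable pool_semaphore
    ...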
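To show how a worker might actually use the shared semaphore, here is a slightly fuller sketch of the same pattern. The names infer, run_all, max_concurrent and start_method are made up for illustration, and infer just sums numbers as a stand-in for the real feed forward call. It also assumes you want more workers than GPU slots (so CPU-side work can overlap), which is why the semaphore size is separate from the pool size.

```python
import multiprocessing as mp


def pool_init(semaphore):
    # Runs once in every worker; keeps the shared semaphore in a module global.
    global pool_semaphore
    pool_semaphore = semaphore


def infer(dataset):
    # Hypothetical stand-in for the feed forward call.
    # The `with` block waits while max_concurrent workers already hold a slot.
    with pool_semaphore:
        return sum(dataset)


def run_all(datasets, n_process, max_concurrent, start_method="spawn"):
    # CUDA cannot be used in forked subprocesses, so real GPU workers
    # need the "spawn" (or "forkserver") start method.
    ctx = mp.get_context(start_method)
    semaphore = ctx.BoundedSemaphore(max_concurrent)
    with ctx.Pool(n_process, initializer=pool_init, initargs=(semaphore,)) as pool:
        return pool.map(infer, datasets)
```

With the "spawn" start method, the call to run_all has to sit under an `if __name__ == "__main__":` guard in your script, otherwise the workers will re-run it when they import the main module.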

Alternatively, the greedy approach is to wrap the GPU call in a try ... except block inside a while loop and keep retrying until it succeeds. However, this can add significant performance overhead, so it is probably not a good idea.
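For completeness, the retry idea could look roughly like the sketch below. The helper name run_with_retry and its parameters are hypothetical, and it uses a bounded number of attempts rather than an endless while loop. It assumes the out of memory error surfaces as a RuntimeError whose message contains "out of memory", which is how PyTorch reports it as far as I know (newer versions raise torch.cuda.OutOfMemoryError, a RuntimeError subclass).

```python
import time


def run_with_retry(fn, *args, retry_on=RuntimeError, max_attempts=5, wait_seconds=1.0):
    """Call fn(*args), retrying when it fails with an out-of-memory error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args)
        except retry_on as err:
            # Re-raise unrelated errors, and give up after the last attempt.
            if "out of memory" not in str(err) or attempt == max_attempts:
                raise
            # Give other processes time to release GPU memory before retrying.
            time.sleep(wait_seconds)
```

In a real worker you would probably also call torch.cuda.empty_cache() before sleeping. Every failed attempt still pays for the partial forward pass, which is where the overhead comes from.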

THN answered Oct 02 '22 16:10