To fully utilize the CPU/GPU I run several processes that do DNN inference (feed forward) on separate datasets. Since the processes allocate CUDA memory during the forward pass, I'm getting CUDA out of memory errors. To mitigate this I added a torch.cuda.empty_cache()
call, which made things better. However, there are still occasional out-of-memory errors, probably due to bad allocation/release timing.
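For reference, each process runs a loop roughly like the following (simplified sketch; process_dataset, model and loader are placeholders for my actual code):

import torch

@torch.no_grad()
def process_dataset(model, loader):
    outputs = []
    for batch in loader:
        out = model(batch.cuda())   # the feed forward that allocates CUDA memory
        outputs.append(out.cpu())
        torch.cuda.empty_cache()    # free cached blocks for the other processes
    return outputs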
I managed to solve the problem by adding a multiprocessing.BoundedSemaphore
around the feed-forward call, but this introduces difficulties in initializing and sharing the semaphore between the processes.
Is there a better way to avoid this kind of error while running multiple GPU inference processes?
From my experience with parallel training and inference, it is almost impossible to squeeze out the last bit of GPU memory. The best you can do is probably to estimate the maximum number of processes that can run in parallel, then restrict your code to run at most that many processes at the same time. Using a semaphore is the typical way to restrict the number of parallel processes and automatically start a new process when a slot opens up.
To make it easier to initialize and share the semaphore between processes, you can use a multiprocessing.Pool
and its pool initializer as follows.
import multiprocessing as mp

def pool_init(semaphore):
    # runs once in each worker process and stores the shared semaphore globally
    global pool_semaphore
    pool_semaphore = semaphore

n_process = 4  # estimated maximum number of processes that fit in GPU memory
semaphore = mp.BoundedSemaphore(n_process)
with mp.Pool(n_process, initializer=pool_init, initargs=(semaphore,)) as pool:
    # here, each worker process can access the shared variable pool_semaphore
    ...  # e.g. pool.map over your datasets
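Each worker then acquires the shared semaphore around the GPU-heavy part of its task, for example (a minimal sketch; run_inference is a placeholder for your actual feed-forward code):

import torch

def infer_task(dataset):
    # block until one of the GPU slots guarded by the semaphore is free
    with pool_semaphore:
        result = run_inference(dataset)  # your feed-forward call
        torch.cuda.empty_cache()         # return cached memory before releasing the slot
    return result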
On the other hand, the greedy approach is to wrap the feed-forward call in a try ... except
block inside a while
loop and keep retrying until the GPU has enough free memory. However, this may come with significant performance overhead, so it is probably not a good idea.
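For completeness, such a retry loop could look like this (a rough sketch; it simply backs off whenever PyTorch raises an out-of-memory RuntimeError):

import time
import torch

def forward_with_retry(model, batch, wait=1.0):
    # keep retrying the forward pass until enough GPU memory is available
    while True:
        try:
            return model(batch)
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise                 # re-raise unrelated errors
            torch.cuda.empty_cache()
            time.sleep(wait)          # back off before trying again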