 

How to free GPU from CUDA (using Pytorch)?

I'm using Spark with the face-alignment library to generate faces that are almost the same.

    import gc
    import face_alignment
    import torch

    # ... inside the function that builds imageVector:
    fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False)  # use the GPU via the PyTorch dependencies
    imageVector.append(convertImagefa(image, fa))
    del fa                    # drop the last reference to the model
    gc.collect()
    torch.cuda.empty_cache()  # trying to clean up CUDA's cache
    return imageVector

I'm on one machine with 4 threads that all try to access the GPU. As such, I've worked out a strategy where only every 4th request uses the GPU. This seems to fit in memory.
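The gating looks roughly like this (a minimal sketch, not my exact code; the counter and lock names are illustrative):

    import threading

    _gate_lock = threading.Lock()
    _request_count = 0

    def should_use_gpu():
        # Illustrative round-robin gate: only every 4th request touches the GPU.
        global _request_count
        with _gate_lock:
            _request_count += 1
            return _request_count % 4 == 0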

My issue is that when I clean up after CUDA, it never actually releases everything. I'll see the load move around the threads and some space free up, but CUDA never lets go of the last 624 MiB. Is there a way to clean it all the way up?

nvidia-smi
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     17132      C   .../face-the-same/bin/python      624MiB |
|    0   N/A  N/A     17260      C   .../face-the-same/bin/python     1028MiB |
|    0   N/A  N/A     17263      C   .../face-the-same/bin/python      624MiB |
|    0   N/A  N/A     17264      C   .../face-the-same/bin/python      624MiB |
+-----------------------------------------------------------------------------+

FYI: I ended up using a distributed lock to pin the GPU computation to one executor/process id. This was the outcome derived from @Jan's comment.
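On a single machine, a minimal sketch of that kind of lock could use an OS-level file lock to serialize GPU work across processes (the helper name and lock path here are illustrative, not my actual code):

    import fcntl

    def run_with_gpu_lock(fn, *args, lock_path="/tmp/gpu.lock"):
        # Illustrative: block until this process owns the GPU, then run fn.
        with open(lock_path, "w") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                return fn(*args)
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)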

Matt Andruff asked Dec 28 '25 21:12

1 Answer

According to https://discuss.pytorch.org/t/pytorch-do-not-clear-gpu-memory-when-return-to-another-function/125944/3, this is due to the CUDA context, which stays in place until the script ends. They recommend calling torch.cuda.empty_cache() to clear the cache, but there will always be a remainder. To get rid of that, you could switch to processes instead of threads, so that the process doing the GPU work can actually be killed without killing your program (but that'll be quite some effort, I suppose).
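A minimal sketch of the process-based approach, assuming the result of convertImagefa can be pickled (convertImagefa comes from the question; everything else here is illustrative):

    import multiprocessing as mp

    def _gpu_worker(queue, image):
        # Import inside the child so CUDA is only ever initialized there.
        import face_alignment
        fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False)
        queue.put(convertImagefa(image, fa))

    def run_on_gpu(image):
        ctx = mp.get_context("spawn")  # fresh interpreter, fresh CUDA context
        queue = ctx.Queue()
        p = ctx.Process(target=_gpu_worker, args=(queue, image))
        p.start()
        result = queue.get()
        p.join()  # when the child exits, the driver reclaims all of its GPU memory
        return result

Because the CUDA context lives and dies with the child process, nvidia-smi should show no memory held by it once the call returns.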

Jan answered Dec 30 '25 10:12