When I run nvidia-smi, nearly 20GB of GPU memory is unaccounted for (the listed processes total 17745MB, while Memory-Usage shows 37739MB).

When I run nvitop, it shows an entry labelled No Such Process that is actually holding my GPU memory. However, I cannot kill this PID:
>>> sudo kill -9 118238
kill: (118238): No such process

How can I get rid of this ghost process without interrupting the others?
I found the solution in this answer: https://stackoverflow.com/a/59431785/6563277.
First, I ran sudo fuser -v /dev/nvidia* to list every process that has the GPU device files open, including the ones nvidia-smi failed to show.
That revealed some "ghost" Python processes. After I killed them, the GPU memory was freed.
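Since the goal is to avoid interrupting other jobs, it may help to inspect each PID before killing anything. Below is a minimal sketch of how that could look, not the exact commands from the linked answer. The function names gpu_holder_pids and describe_pids are hypothetical helpers introduced here. It assumes the psmisc fuser, which prints bare PIDs on stdout and everything else on stderr, and a ps that accepts -o and -p.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the unique PIDs that hold any /dev/nvidia* file open.
# Assumes psmisc fuser: PIDs go to stdout, file names and access flags to stderr.
gpu_holder_pids() {
    sudo fuser /dev/nvidia* 2>/dev/null | tr -s ' ' '\n' | grep -E '^[0-9]+$' | sort -un
}

# Hypothetical helper: show PID, owner, elapsed time and command line for each PID,
# so a leftover job can be told apart from one that is still wanted.
describe_pids() {
    local pid
    for pid in "$@"; do
        ps -o pid=,user=,etime=,args= -p "$pid" 2>/dev/null \
            || echo "$pid: not found by ps"
    done
}
```

Usage would be something like describe_pids $(gpu_holder_pids), followed by sudo kill -9 on only the PIDs that are clearly stale. Nothing in the sketch kills a process by itself.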