
Why is the GPU memory still in use after clearing the object?

Starting with zero usage:

>>> import gc
>>> import GPUtil
>>> import torch
>>> GPUtil.showUtilization()
| ID | GPU | MEM |
------------------
|  0 |  0% |  0% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |
|  3 |  0% |  0% |

Then I create a big enough tensor and hog the memory:

>>> x = torch.rand(10000,300,200).cuda()
>>> GPUtil.showUtilization()
| ID | GPU | MEM |
------------------
|  0 |  0% | 26% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |
|  3 |  0% |  0% |
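
For scale, the tensor's footprint can be computed directly (float32 is PyTorch's default dtype, 4 bytes per element):

>>> x.element_size() * x.nelement()   # 600,000,000 elements * 4 bytes each
2400000000

That's ~2.24 GiB. Together with the CUDA context overhead (the ~5% that never goes away below), that would line up with 26% on a card in the ~11 GB range; the actual card model isn't shown here, so treat that as a guess.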

Then I tried several ways to make the tensor disappear.

Attempt 1: Detach, send to CPU and overwrite the variable

No, it doesn't work.

>>> x = x.detach().cpu()
>>> GPUtil.showUtilization()
| ID | GPU | MEM |
------------------
|  0 |  0% | 26% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |
|  3 |  0% |  0% |
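
The reassignment does free the GPU copy, but the memory goes back to PyTorch's caching allocator rather than to the driver, and GPUtil reports driver-level usage (it reads nvidia-smi). A check one could run at this point, with expected values in the comments (memory_cached() was later renamed memory_reserved()):

>>> torch.cuda.memory_allocated()   # expect 0: no live tensors remain on the GPU
>>> torch.cuda.memory_cached()      # expect ~2.4e9: the freed block sits in the cache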

Attempt 2: Delete the variable

No, this doesn't work either.

>>> del x
>>> GPUtil.showUtilization()
| ID | GPU | MEM |
------------------
|  0 |  0% | 26% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |
|  3 |  0% |  0% |

Attempt 3: Use the torch.cuda.empty_cache() function

This seems to work, but some overhead still lingers...

>>> torch.cuda.empty_cache()
>>> GPUtil.showUtilization()
| ID | GPU | MEM |
------------------
|  0 |  0% |  5% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |
|  3 |  0% |  0% |
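
empty_cache() hands the cached blocks back to the driver, which is why the 26% drops. The remaining ~5% is most likely the CUDA context (driver state plus PyTorch's CUDA kernels) created on the first CUDA call; as far as I know it can only be released by ending the process. One way to confirm the cache itself is now empty:

>>> torch.cuda.memory_cached()   # expect 0: whatever is left is outside PyTorch's allocator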

Attempt 4: Maybe run the garbage collector.

No, 5% is still being hogged. (The 0 that gc.collect() returns below is the count of unreachable objects it found, so there was nothing left for it to collect.)

>>> gc.collect()
0
>>> GPUtil.showUtilization()
| ID | GPU | MEM |
------------------
|  0 |  0% |  5% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |
|  3 |  0% |  0% |

Attempt 5: Try deleting torch altogether (as if that would work when del x didn't work -_- )

No, it doesn't...

>>> del torch
>>> GPUtil.showUtilization()
| ID | GPU | MEM |
------------------
|  0 |  0% |  5% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |
|  3 |  0% |  0% |

Then I checked gc.get_objects(), and it looks like there's still quite a lot of odd THCTensor stuff in there...
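
For reference, this is the usual pattern for listing tensors that are still reachable from Python (a sketch; torch has to be imported again after Attempt 5, which simply rebinds the already-loaded module):

>>> import torch   # rebinds the name; the module was never actually unloaded
>>> for obj in gc.get_objects():
...     try:
...         if torch.is_tensor(obj):
...             print(type(obj), obj.size(), obj.device)
...     except Exception:
...         pass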

Any idea why the memory is still in use after clearing the cache?

asked Aug 14 '19 by alvas


2 Answers

It looks like PyTorch's caching allocator reserves some fixed amount of memory even if there are no tensors, and this allocation is triggered by the first CUDA memory access (torch.cuda.empty_cache() deletes unused tensors from the cache, but the cache itself still uses some memory).

Even with a tiny 1-element tensor, after del and torch.cuda.empty_cache(), GPUtil.showUtilization(all=True) reports exactly the same amount of GPU memory used as for a huge tensor (and both torch.cuda.memory_cached() and torch.cuda.memory_allocated() return zero).
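
A sketch of that experiment, with expected results in the comments (memory_cached() is the name from that PyTorch era; it was later renamed memory_reserved()):

>>> import torch, GPUtil
>>> t = torch.zeros(1).cuda()      # the first CUDA access creates the context
>>> del t
>>> torch.cuda.empty_cache()
>>> torch.cuda.memory_allocated()  # expect 0
>>> torch.cuda.memory_cached()     # expect 0
>>> GPUtil.showUtilization()       # the device still shows the same fixed overhead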

answered Oct 17 '22 by Sergii Dymchenko


From the PyTorch docs:

Memory management

PyTorch uses a caching memory allocator to speed up memory allocations. This allows fast memory deallocation without device synchronizations. However, the unused memory managed by the allocator will still show as if used in nvidia-smi. You can use memory_allocated() and max_memory_allocated() to monitor memory occupied by tensors, and use memory_cached() and max_memory_cached() to monitor memory managed by the caching allocator. Calling empty_cache() releases all unused cached memory from PyTorch so that those can be used by other GPU applications. However, the occupied GPU memory by tensors will not be freed so it can not increase the amount of GPU memory available for PyTorch.

Note the sentence mentioning nvidia-smi, which as far as I know is what GPUtil uses under the hood.
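
For completeness, the monitoring calls the docs mention, and what each one reports (memory_cached() and max_memory_cached() were renamed memory_reserved() and max_memory_reserved() in later releases):

>>> torch.cuda.memory_allocated()      # bytes currently occupied by live tensors
>>> torch.cuda.max_memory_allocated()  # peak tensor usage so far
>>> torch.cuda.memory_cached()         # bytes currently held by the caching allocator
>>> torch.cuda.max_memory_cached()     # peak allocator usage so far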

answered Oct 17 '22 by Stanowczo