What's the best way to measure detailed GPU memory usage in TensorFlow?

I'm trying to use the TensorFlow profiler to measure detailed GPU memory usage, such as conv1 activations, weights, etc. The profiler reported a peak usage of 4000 MB, but at the same time nvidia-smi reported 10000 MB of usage. That's a big difference and I don't know the root cause. Can anyone give some suggestions on how to proceed?

TF profile:

[screenshot: TF profiler memory report]

nvidia-smi:

[screenshot: nvidia-smi output]

TensorFlow version: 1.9.0


People also ask

How do I check graphics card memory in Python?

You will need to install the nvidia-ml-py3 library (pip install nvidia-ml-py3), which provides Python bindings to the NVIDIA Management Library. A minimal snippet is sketched below.
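A rough sketch of such a snippet (assuming the nvidia-ml-py3 package, which exposes the pynvml module, plus at least one NVIDIA GPU and a working driver):

    # Hedged sketch: query per-device memory via NVML using the pynvml bindings
    # installed by nvidia-ml-py3.
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)    # first GPU
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)    # total/used/free, in bytes
    print("total: %.0f MiB" % (info.total / 1024**2))
    print("used:  %.0f MiB" % (info.used / 1024**2))
    print("free:  %.0f MiB" % (info.free / 1024**2))
    pynvml.nvmlShutdown()

Note that this reports the same device-level "used" figure that nvidia-smi shows, i.e. memory reserved by processes, not what TensorFlow is actually using internally.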

How do I limit GPU memory usage in TensorFlow?

Limiting GPU memory growth: to limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method. In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow the memory usage only as needed by the process (see the sketch below).
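A short sketch of the tf.config calls mentioned above (these are TF 2.x APIs, roughly 2.1+, and do not exist in the 1.9.0 release used in the question):

    # Sketch for TF 2.x: restrict TF to one GPU and let its allocation grow on
    # demand instead of reserving (almost) all device memory at startup.
    # This must run before any GPU has been initialized.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        tf.config.set_visible_devices(gpus[0], 'GPU')            # use only the first GPU
        tf.config.experimental.set_memory_growth(gpus[0], True)  # allocate as needed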


1 Answer

First, TF will by default allocate most, if not all, of the available GPU memory when it starts. This actually lets TF use memory more efficiently. To change this behavior you can set the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true (export TF_FORCE_GPU_ALLOW_GROWTH=true). More options for controlling allocation are available in TensorFlow's GPU documentation.
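For the TF 1.x line used in the question (1.9.0), the same behavior can be requested through the session config; a minimal sketch:

    # Minimal TF 1.x sketch: let the GPU allocation grow on demand rather than
    # letting TF reserve (nearly) all GPU memory when the session starts.
    import tensorflow as tf

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    # Or cap TF at a fixed fraction of device memory instead:
    # config.gpu_options.per_process_gpu_memory_fraction = 0.5

    with tf.Session(config=config) as sess:
        pass  # build and run your graph here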

Once you've done that, nvidia-smi will still report exaggerated memory usage numbers, because nvidia-smi reports the memory allocated by the process, while the profiler reports the actual peak memory in use.

TF uses BFC (best-fit with coalescing) as its memory allocator. Whenever TF runs out of, say, 4 GB of memory, it allocates twice that amount, 8 GB; the next time it would try to allocate 16 GB. At the same time, the program might only use 9 GB at peak, but the 16 GB allocation is what nvidia-smi reports. Also, BFC is not the only thing that allocates GPU memory in TensorFlow, so it can actually use 9 GB plus something.

Another comment: TensorFlow's native tools for reporting memory usage have not been particularly precise in the past, so I would go as far as to say that the profiler might actually be somewhat underestimating peak memory usage.
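One way to cross-check the profiler is to ask TF's allocator directly. A rough sketch using tf.contrib.memory_stats, which ships in the 1.x contrib namespace (the matmul below is just a stand-in workload):

    # Rough TF 1.x sketch: compare the allocator's own in-use/peak byte counts
    # with what the profiler and nvidia-smi report. The matmul is a placeholder.
    import tensorflow as tf

    with tf.device('/gpu:0'):
        a = tf.random_normal([4096, 4096])
        b = tf.random_normal([4096, 4096])
        c = tf.matmul(a, b)
        in_use = tf.contrib.memory_stats.BytesInUse()    # bytes currently in use
        peak = tf.contrib.memory_stats.MaxBytesInUse()   # peak bytes in use so far
        limit = tf.contrib.memory_stats.BytesLimit()     # most the allocator may hand out

    with tf.Session() as sess:
        sess.run(c)
        use_b, peak_b, limit_b = sess.run([in_use, peak, limit])
        print("in use: %.0f MiB, peak: %.0f MiB, limit: %.0f MiB"
              % (use_b / 1024**2, peak_b / 1024**2, limit_b / 1024**2))

The peak number should line up roughly with what the profiler reports, while nvidia-smi will show the larger region BFC has reserved.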

Here is some info on memory management: https://github.com/miglopst/cs263_spring2018/wiki/Memory-management-for-tensorflow

Another, somewhat more advanced, link for checking memory usage: https://github.com/yaroslavvb/memory_util
