I have a 32Gb graphics card and upon start of my script I see:
2019-07-11 01:26:19.985367: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 95.16G (102174818304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.988090: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 85.64G (91957338112 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.990806: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 77.08G (82761605120 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.993527: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 69.37G (74485440512 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.996219: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 62.43G (67036893184 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.998911: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 56.19G (60333203456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.001601: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 50.57G (54299881472 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.004296: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 45.51G (48869892096 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.006981: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 40.96G (43982901248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.009660: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 36.87G (39584608256 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.012341: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 33.18G (35626147840 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
After which TF settles with using 96% of my memory. And later, when it runs out of memory it tries to allocate 65G
tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 65.30G (70111285248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
My question is, what about remaining 1300MB (0.04*32480)? I would not mind using those before running OOM.
How can I make TF utilize 99.9% of memory instead of 96%?
Update: nvidia-smi output
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04 Driver Version: 418.40.04 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:16.0 Off | 0 |
| N/A 66C P0 293W / 300W | 31274MiB / 32480MiB | 100% Default |
I am asking about these 1205MB (31274MiB - 32480MiB) remaining unused. Maybe they are there for a reason, maybe they are used just before OOM.
Software requirementsNVIDIA® GPU drivers version 450.80.02 or higher. CUDA® Toolkit 11.2.
In a single clock cycle, enable tensorflow for GPU computation which can carry a lot of data(compared to CPU) for calculation, doing training a lot faster and allowing for better memory management.
Monitoring GPU is not as simple as monitoring CPU.
There are many parallel processes going on which could create a bottleneck for your GPU.
There could be various problems like :
1. Read/Write speed for your data
2. Either CPU or disk is causing a bottleneck
But I think it is pretty normal to use 96%. Not to mention nvidia-smi only shows for one specific instance.
You can install gpustat and use it to monitor GPU live(you should be hitting 100% during OOM)
pip install gpustat
gpustat -i
What can you do ?
1. You can use data_iterator to process the data in parallel faster.
2. Increase batch size. (I dont think this will work in your case as you are hitting OOM)
3. You can overclock the GPU(not-recommended)
Here is a nice article for hardware accelaration.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With