I have a 32 GB graphics card, and when my script starts I see:
2019-07-11 01:26:19.985367: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 95.16G (102174818304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.988090: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 85.64G (91957338112 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.990806: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 77.08G (82761605120 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.993527: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 69.37G (74485440512 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.996219: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 62.43G (67036893184 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:19.998911: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 56.19G (60333203456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.001601: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 50.57G (54299881472 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.004296: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 45.51G (48869892096 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.006981: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 40.96G (43982901248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.009660: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 36.87G (39584608256 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2019-07-11 01:26:20.012341: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 33.18G (35626147840 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
After that, TF settles on using 96% of my memory. Later, when it runs out of memory, it tries to allocate 65G:
tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 65.30G (70111285248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
My question is: what about the remaining 1300 MB (0.04 * 32480)? I would not mind using those before running out of memory.
How can I make TF utilize 99.9% of the memory instead of 96%?
Update: nvidia-smi output
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.40.04    Driver Version: 418.40.04    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:16.0 Off |                    0 |
| N/A   66C    P0   293W / 300W |  31274MiB / 32480MiB |    100%      Default |
I am asking about the 1206 MiB (32480 MiB - 31274 MiB) that remain unused. Maybe they are there for a reason, or maybe they are used just before OOM.
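A quick check of the arithmetic, as a minimal sketch that uses only the two memory figures from the nvidia-smi output above (the variable names are illustrative):

```python
# Figures copied from the nvidia-smi "Memory-Usage" column, in MiB.
used_mib = 31274
total_mib = 32480

# Memory that TF has not claimed, and the share it has claimed.
unused_mib = total_mib - used_mib
used_pct = 100 * used_mib / total_mib

print(f"unused: {unused_mib} MiB")        # unused: 1206 MiB
print(f"used:   {used_pct:.1f}% of total")  # used:   96.3% of total
```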
Software requirements: NVIDIA® GPU drivers version 450.80.02 or higher, and CUDA® Toolkit 11.2.
Enabling TensorFlow for GPU computation lets it process far more data per clock cycle than a CPU can, which makes training a lot faster and allows for better memory management.
Monitoring a GPU is not as simple as monitoring a CPU.
There are many parallel processes going on, any of which could create a bottleneck for your GPU.
There could be various problems, such as:
1. Read/write speed for your data
2. Either the CPU or the disk causing a bottleneck
But I think it is pretty normal to use 96%. Not to mention that nvidia-smi only shows usage at one specific instant.
You can install gpustat and use it to monitor the GPU live (you should be hitting 100% during OOM):
pip install gpustat
gpustat -i
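If you would rather log the numbers from a script than watch gpustat, here is a rough sketch that parses the CSV output of nvidia-smi. The helper names parse_memory_line and query_gpu_memory are made up for this example, and query_gpu_memory assumes nvidia-smi is on the PATH:

```python
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) into (used, total, percent)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total, 100 * used / total


def query_gpu_memory():
    """Return one (used, total, percent) tuple per GPU; requires the NVIDIA driver."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [parse_memory_line(line) for line in result.stdout.splitlines() if line.strip()]


# Demonstration on a sample line built from the figures in the question,
# so the sketch runs even on a machine without a GPU.
used, total, pct = parse_memory_line("31274, 32480")
print(f"{used} / {total} MiB ({pct:.1f}%)")
```

On a machine with the driver installed, calling query_gpu_memory() in a loop with a short sleep gives a simple usage log over time.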
What can you do?
1. You can use data_iterator to process the data in parallel, which is faster.
2. Increase the batch size. (I don't think this will work in your case, as you are already hitting OOM.)
3. You can overclock the GPU (not recommended).
Here is a nice article on hardware acceleration.