Out of memory running Tensorflow with GPU support in PyCharm

My code works fine when run in an IPython terminal, but fails with an out-of-memory error when run from PyCharm, as shown below.

/home/abigail/anaconda3/envs/tf_gpuenv/bin/python -Xms1280m -Xmx4g /home/abigail/PycharmProjects/MLNN/src/test.py
Using TensorFlow backend.
Epoch 1/150
2019-01-19 22:12:39.539156: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-19 22:12:39.588899: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-19 22:12:39.589541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 59.69MiB
2019-01-19 22:12:39.589552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
Traceback (most recent call last):
  File "/home/abigail/PycharmProjects/MLNN/src/test.py", line 20, in <module>
    model.fit(X, Y, epochs=150, batch_size=10)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2697, in __call__
    if hasattr(get_session(), '_make_callable_from_options'):
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 186, in get_session
    _SESSION = tf.Session(config=config)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

Process finished with exit code 1

In PyCharm, I first edited the "Help->Edit Custom VM options":

-Xms1280m
-Xmx4g

This doesn't fix the issue. Then I edited "Run->Edit Configurations->Interpreter options":

-Xms1280m -Xmx4g

It still gives the same error. My Linux desktop has plenty of memory (64 GB). How can I fix this issue?

By the way, if I don't use the GPU in PyCharm, the error does not occur.

EDIT:

In [5]: exit
(tf_gpuenv) abigail@abigail-XPS-8910:~/nlp/MLMastery/DLwithPython/code/chapter_07$ nvidia-smi
Sun Jan 20 00:41:49 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.25       Driver Version: 415.25       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0  On |                  N/A |
| 38%   54C    P0     2W /  38W |   1707MiB /  1993MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       770      G   /usr/bin/akonadi_archivemail_agent             2MiB |
|    0       772      G   /usr/bin/akonadi_sendlater_agent               2MiB |
|    0       774      G   /usr/bin/akonadi_mailfilter_agent              2MiB |
|    0      1088      G   /usr/lib/xorg/Xorg                           166MiB |
|    0      1440      G   kwin_x11                                      60MiB |
|    0      1446      G   /usr/bin/krunner                               1MiB |
|    0      1449      G   /usr/bin/plasmashell                          60MiB |
|    0      1665      G   ...quest-channel-token=3687002912233960986   137MiB |
|    0     20728      C   ...ail/anaconda3/envs/tf_gpuenv/bin/python  1255MiB |
+-----------------------------------------------------------------------------+
ling asked Jan 26 '23

1 Answer

To wrap up our conversation from the comments: I do not believe you can allocate desktop (system) memory to the GPU, at least not in the way you are trying to. With a single GPU, TensorFlow-GPU will in most cases allocate around 95% of the available GPU memory to the task it runs. In your case, something is already consuming nearly all of the available GPU memory, which is the primary reason your program does not run. You need to review your GPU's memory usage and free some of it up (I can't help but think you already have another Python instance using TensorFlow-GPU, or some other GPU-intensive program, running in the background). On Linux, the command nvidia-smi will tell you what is using your GPU. Here is an example:

Sun Jan 20 18:23:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 00000000:01:00.0 Off |                  N/A |
| 32%   63C    P2    69W / 163W |   3823MiB /  4035MiB |     40%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3019      C   ...e/scarter/anaconda3/envs/tf1/bin/python  3812MiB |
+-----------------------------------------------------------------------------+

You can see that, in my case, the card in my server has 4035 MiB of RAM, of which 3823 MiB is in use. Furthermore, look at the GPU processes at the bottom: process PID 3019 consumes 3812 MiB of the 4035 MiB available on the card. If I wanted to run another Python script using TensorFlow, I would have two main choices: install a second GPU and run on that, or, if no GPU is available, run on the CPU. Someone more expert than me might say that you could allocate just half the memory to each task, but 2 GiB is already pretty low for TensorFlow training; cards with much more memory (6 GiB and up) are typically recommended for that task.
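For completeness, the TF 1.x API shown in the question's stack trace does let you cap a single process's share of the card via GPUOptions. This is only a configuration sketch of that idea (I have not run it against your setup, and it will not help if another process has already claimed the memory):

```python
import tensorflow as tf
from keras import backend as K

# Cap this process at roughly half the card's memory, and let the
# allocation grow on demand instead of grabbing everything up front.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5
config.gpu_options.allow_growth = True

# Hand the configured session to Keras before calling model.fit().
K.set_session(tf.Session(config=config))
```

Again, with only ~60 MiB free on a 2 GiB card, no fraction setting will save you; freeing the memory comes first.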
In closing, find out what is consuming all of your video card's memory and end that task. I believe that will resolve your problem.
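If you want to spot the biggest consumer programmatically rather than eyeballing the table, nvidia-smi can emit machine-readable CSV via its `--query-compute-apps` flags. The query command is a real nvidia-smi option; the parsing helper and sample data below are my own illustration:

```python
# Output of:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits
# is one "pid, process_name, MiB" line per compute process.
# This helper returns the process using the most GPU memory.

def biggest_gpu_consumer(csv_text):
    rows = []
    for line in csv_text.strip().splitlines():
        pid, name, used = [field.strip() for field in line.split(",")]
        rows.append((int(pid), name, int(used)))
    return max(rows, key=lambda row: row[2])

# Sample data modeled on the question's nvidia-smi listing.
sample = """\
20728, /home/abigail/anaconda3/envs/tf_gpuenv/bin/python, 1255
1088, /usr/lib/xorg/Xorg, 166"""

pid, name, mib = biggest_gpu_consumer(sample)
print(pid, name, mib)
```

In the question's own listing, the top consumer is PID 20728, a leftover tf_gpuenv Python process holding 1255 MiB, which is exactly the kind of task to kill before rerunning.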

IamSierraCharlie answered Jan 31 '23