My code works fine when run from an iPython terminal, but fails with an out-of-memory error when run from PyCharm, as shown below.
/home/abigail/anaconda3/envs/tf_gpuenv/bin/python -Xms1280m -Xmx4g /home/abigail/PycharmProjects/MLNN/src/test.py
Using TensorFlow backend.
Epoch 1/150
2019-01-19 22:12:39.539156: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-19 22:12:39.588899: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-19 22:12:39.589541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 59.69MiB
2019-01-19 22:12:39.589552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
Traceback (most recent call last):
File "/home/abigail/PycharmProjects/MLNN/src/test.py", line 20, in <module>
model.fit(X, Y, epochs=150, batch_size=10)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2697, in __call__
if hasattr(get_session(), '_make_callable_from_options'):
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 186, in get_session
_SESSION = tf.Session(config=config)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
Process finished with exit code 1
In PyCharm, I first edited the "Help->Edit Custom VM options":
-Xms1280m
-Xmx4g
This doesn't fix the issue. Then I edited "Run->Edit Configurations->Interpreter options":
-Xms1280m -Xmx4g
It still gives the same error. My Linux desktop has plenty of memory (64 GB). How can I fix this issue?
BTW, if I don't use the GPU, PyCharm doesn't give the error.
EDIT:
In [5]: exit
(tf_gpuenv) abigail@abigail-XPS-8910:~/nlp/MLMastery/DLwithPython/code/chapter_07$ nvidia-smi
Sun Jan 20 00:41:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.25 Driver Version: 415.25 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Ti Off | 00000000:01:00.0 On | N/A |
| 38% 54C P0 2W / 38W | 1707MiB / 1993MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 770 G /usr/bin/akonadi_archivemail_agent 2MiB |
| 0 772 G /usr/bin/akonadi_sendlater_agent 2MiB |
| 0 774 G /usr/bin/akonadi_mailfilter_agent 2MiB |
| 0 1088 G /usr/lib/xorg/Xorg 166MiB |
| 0 1440 G kwin_x11 60MiB |
| 0 1446 G /usr/bin/krunner 1MiB |
| 0 1449 G /usr/bin/plasmashell 60MiB |
| 0 1665 G ...quest-channel-token=3687002912233960986 137MiB |
| 0 20728 C ...ail/anaconda3/envs/tf_gpuenv/bin/python 1255MiB |
+-----------------------------------------------------------------------------+
To wrap up our conversation from the comments: I do not believe you can allocate desktop (system) memory to the GPU, not in the way you are trying to. When you have a single GPU, TensorFlow-GPU will in most cases allocate roughly 95% of the available GPU memory to the task it runs. In your case, something is already consuming almost all of the available GPU memory, which is the primary reason your program does not run. You need to review your GPU's memory usage and free some of it up (I can't help thinking you already have another Python instance using TensorFlow-GPU, or some other GPU-intensive program, running in the background). On Linux, running nvidia-smi
on the command line will tell you what is using your GPU.
Here is an example from my server:
Sun Jan 20 18:23:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 970 Off | 00000000:01:00.0 Off | N/A |
| 32% 63C P2 69W / 163W | 3823MiB / 4035MiB | 40% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3019 C ...e/scarter/anaconda3/envs/tf1/bin/python 3812MiB |
+-----------------------------------------------------------------------------+
You can see that in my case the card in my server has 4035MiB of RAM, of which 3823MiB is in use. Furthermore, look at the GPU processes at the bottom: process PID 3019 consumes 3812MiB of the 4035MiB available on the card. If I wanted to run another Python script using TensorFlow, I would have two main choices: install a second GPU and run on that, or, if no GPU is available, run on the CPU. Someone more expert than me might say that you could allocate just half the memory to each task, but 2GiB is already quite low for TensorFlow training; cards with considerably more memory (6GiB or more) are typically recommended for that task.
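If you did want to try the split-the-memory approach, here is a minimal sketch (my own addition, not something I have tested on your setup) of how a Keras/TF 1.x session can be told to grow its allocation on demand, or to take only part of the card, instead of the default ~95%:
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                      # allocate GPU memory only as needed
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or cap TensorFlow at ~half the card instead
K.set_session(tf.Session(config=config))
This has to run before the model is built, and it will still fail if another process is already holding most of the card, so freeing the memory first is the real fix.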
In closing, find out what is consuming all of your video card's memory and end that task. I believe that will resolve your problem.
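For example, assuming the Python process with PID 20728 in your nvidia-smi output above is a leftover iPython/TensorFlow session you no longer need, ending it should free its 1255MiB:
kill 20728
nvidia-smi    # confirm the memory has been released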