
Calling tf.Session() twice causes a fatal error: failed to get device attribute 13 for device 0

I just installed TensorFlow 1.14.0 with CUDA 10.0.130 and cuDNN v7.6.1.34. The first call to tf.Session() in a Python session works, but a second call crashes, even after I have closed the first session.

The smallest example that reproduces the fault is as follows:

(tensorflow-gpu) C:\Users\Argen>python
Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> a = tf.Session()
2019-07-20 12:04:23.279225: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-07-20 12:04:23.912859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce 940M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
2019-07-20 12:04:23.921996: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-07-20 12:04:23.927364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-20 12:04:23.931103: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-07-20 12:04:23.938320: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce 940M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
2019-07-20 12:04:23.944323: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-07-20 12:04:23.950175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-20 12:04:26.671775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-20 12:04:26.678254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-07-20 12:04:26.681610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-07-20 12:04:26.686087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1391 MB memory) -> physical GPU (device: 0, name: GeForce 940M, pci bus id: 0000:01:00.0, compute capability: 5.0)
>>> a.close()
>>> a = tf.Session()
2019-07-20 12:06:57.801849: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

My environment: Windows 10 Professional, Intel(R) HD Graphics 520 and NVIDIA GeForce 940M, Python 3.7.3.

Yuzheng Ding asked Nov 06 '22 15:11


1 Answer

By default, TensorFlow allocates GPU memory for the lifetime of the process, not the lifetime of the session object. More details: https://www.tensorflow.org/programmers_guide/using_gpu#allowing_gpu_memory_growth

Thus, if you want memory to be freed, you'll have to exit the Python interpreter, not just close the session.
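If restarting the interpreter by hand is inconvenient, one possible workaround is to run each GPU job in its own short-lived Python process, so that whatever TensorFlow holds is released when that process exits. The following is a minimal sketch, not a confirmed fix for the fatal error in the question: the names `TF_JOB` and `run_in_fresh_interpreter` are made up for illustration, and the child script assumes the TensorFlow 1.x API used in the question.

```python
import subprocess
import sys

# Hypothetical child script: everything that touches the GPU lives here.
# It assumes TensorFlow 1.x (tf.Session), as in the question.
TF_JOB = """
import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.constant(21) * 2))
"""


def run_in_fresh_interpreter(source):
    """Run `source` in a new Python process and return its stdout.

    GPU memory claimed by the child is tied to the child process,
    so it is released when the child exits, not when a session closes.
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling `run_in_fresh_interpreter(TF_JOB)` twice in a row starts two independent interpreters, so the second session is never created in a process that has already initialised the GPU. The cost is that TensorFlow is imported again on every call.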

Hope that helps.

Tensorflow Support answered Nov 14 '22 23:11