
How to interpret TensorFlow output?

How do I interpret the TensorFlow output for building and executing computational graphs on GPGPUs?

Given the following command, which executes an arbitrary TensorFlow script using the Python API:

python3 tensorflow_test.py > out
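Note that TensorFlow writes these log lines to stderr, not stdout, so a plain > out will not capture them. To capture both streams (standard shell redirection, not part of the original command):

python3 tensorflow_test.py > out 2>&1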

The first part, stream_executor, seems like it's loading dependencies.

I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

What is a NUMA node?

I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 

I assume this is where it finds the available GPU:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5
memoryClockRate (GHz) 0.745
pciBusID 0000:01:00.0
Total memory: 11.25GiB
Free memory: 11.15GiB

Some GPU initialization? What is DMA?

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:01:00.0)

Why does it throw an error (the E-prefixed line)?

E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 11.15G (11976531968 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY 

There is a great answer on what the pool_allocator does: https://stackoverflow.com/a/35166985/4233809

I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 3160 get requests, put_count=2958 evicted_count=1000 eviction_rate=0.338066 and unsatisfied allocation rate=0.412025
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1743 get requests, put_count=1970 evicted_count=1000 eviction_rate=0.507614 and unsatisfied allocation rate=0.456684
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 256 to 281
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1986 get requests, put_count=2519 evicted_count=1000 eviction_rate=0.396983 and unsatisfied allocation rate=0.264854
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 655 to 720
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 28728 get requests, put_count=28680 evicted_count=1000 eviction_rate=0.0348675 and unsatisfied allocation rate=0.0418407
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 1694 to 1863
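Reading these numbers (my own inference from the values, not from the linked answer): eviction_rate appears to be evicted_count / put_count, which you can verify against the first PoolAllocator line above.

# Hedged sanity check of the first PoolAllocator line
print(1000 / 2958)  # ~0.338066, matching eviction_rate=0.338066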
asked Apr 25 '16 by Alexander R Johansen



2 Answers

About NUMA -- https://software.intel.com/en-us/articles/optimizing-applications-for-numa

Roughly speaking, on a dual-socket machine each CPU has its own memory and has to access the other processor's memory through a slower QPI link. So each CPU-plus-memory pair is a NUMA node.

Potentially you could treat two different NUMA nodes as two different devices and structure your network to optimize for the different within-node/between-node bandwidths.

However, I don't think there's enough wiring in TF to do this right now. The detection doesn't work either: I just tried on a machine with 2 NUMA nodes, and it still printed the same message and initialized to 1 NUMA node.
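As a quick sanity check, you can read the same SysFS entry that TensorFlow consults. A minimal Python sketch, assuming the PCI bus ID 0000:01:00.0 from the logs in the question:

# Read the NUMA node the kernel reports for the GPU's PCI device.
# -1 means the kernel exposes no NUMA information for it.
with open("/sys/bus/pci/devices/0000:01:00.0/numa_node") as f:
    print(f.read().strip())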

DMA = Direct Memory Access. You could potentially copy things from one GPU to another GPU without involving the CPU (e.g., over NVLink). NVLink integration isn't there yet.
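As an illustration, here is a hedged TF 1.x sketch (assuming a second GPU, /gpu:1, which the question's logs don't show): placing the producer and the consumer on different GPUs forces a device-to-device copy, which the runtime can perform via DMA when peer access is available.

import tensorflow as tf  # TF 1.x API, matching this answer's era

with tf.device('/gpu:0'):
    a = tf.random_normal([1000, 1000])   # tensor produced on GPU 0
with tf.device('/gpu:1'):
    b = tf.matmul(a, a)                  # consumed on GPU 1, forcing a cross-GPU copy

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(b)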

As for the error, TensorFlow tries to allocate close to the GPU's maximum memory, so it sounds like some of your GPU memory has already been allocated to something else and the allocation failed.
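To see what is already holding GPU memory before TensorFlow starts (assuming the standard NVIDIA driver tools are installed), you can run:

nvidia-smi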

You can do something like the following to avoid allocating so much memory:

config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.per_process_gpu_memory_fraction = 0.3  # don't hog all vRAM
config.operation_timeout_in_ms = 15000  # terminate on long hangs
sess = tf.InteractiveSession("", config=config)
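Alternatively (my addition, not from the original answer), TF 1.x can also grow the allocation on demand instead of capping it at a fixed fraction:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory as needed rather than up front
sess = tf.InteractiveSession("", config=config)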
answered Oct 10 '22 by Yaroslav Bulatov


  • successfully opened CUDA library xxx locally means that the library was loaded, but it does not mean that it will be used.
  • successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero means that your kernel does not have NUMA support, so TensorFlow falls back to NUMA node zero.
  • Found device 0 with properties: you have one GPU which you can use; the message lists the properties of this GPU.
  • DMA is direct memory access. More information is on Wikipedia.
  • failed to allocate 11.15G: the error clearly explains why this happened, but it is hard to tell why you need so much memory without looking at the code.
  • pool allocator messages are explained in the answer linked in the question (https://stackoverflow.com/a/35166985/4233809).
answered Oct 10 '22 by Salvador Dali