TensorFlow tends to preallocate the entire available memory on its GPUs. For debugging, is there a way of telling how much of that memory is actually in use?
The total consumed GPU memory = GPU memory for parameters x 2 (one copy for the values, one for the gradients) + GPU memory for storing the forward and backward responses. So the manual calculation would be 4 MB (for the input) + 64 MB x 2 (for the forward and backward responses) + well under 1 MB (for the parameters).
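As a quick sanity check, that estimate can be scripted. This is only a sketch: the layer shape below is a hypothetical 3x3 convolution over a 64 x 128 x 128 x 1 input batch, chosen so the numbers roughly match the 4 MB / 64 MB figures above, and it assumes 4-byte floats.

BYTES_PER_FLOAT = 4

def mb(n_floats):
    return n_floats * BYTES_PER_FLOAT / 1024.0 / 1024.0

batch, h, w = 64, 128, 128
c_in, c_out, k = 1, 16, 3                 # hypothetical 3x3 conv, 1 -> 16 channels

inputs    = batch * h * w * c_in          # ~4 MB of input activations
responses = batch * h * w * c_out         # ~64 MB forward, stored again for backward
params    = k * k * c_in * c_out          # well under 1 MB

total_mb = mb(inputs) + 2 * mb(responses) + 2 * mb(params)   # params x2: value + gradient
print('approx. %.1f MB' % total_mb)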
By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method.
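For example, with the 1.x API used elsewhere in this thread, you can restrict which GPUs the process sees before creating a session (the device index 0 below is just an example):

import os
import tensorflow as tf

# Option 1: hide every GPU except device 0 from the process
# (must be set before TensorFlow initializes CUDA).
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# Option 2: restrict visibility through the session configuration.
config = tf.ConfigProto(gpu_options=tf.GPUOptions(visible_device_list='0'))
sess = tf.Session(config=config)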
TensorFlow may also decide to store the gradients, in which case you have to take their memory usage into account as well. The way I do it is by setting the GPU memory limit to a fixed value, e.g. 1 GB, and then testing the model's inference speed.
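A sketch of that approach with the 1.x API: cap how much memory TensorFlow may allocate and then check whether the model still runs, and how fast. The fraction is relative to total device memory, so the 0.125 below assumes a hypothetical 8 GB card to get roughly 1 GB.

import tensorflow as tf

# Cap TensorFlow at ~1 GB (0.125 of a hypothetical 8 GB GPU).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.125)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

# ... build the model here and time repeated sess.run(...) calls
#     to measure inference speed under the memory cap ...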
But the GPU memory usage cannot be fully broken down per loaded model, because part of it goes to things like the CUDA context, which is shared among the loaded models.
This is a common situation we see: system memory is heavily used and memory usage seems to be gradually increasing, and as memory usage goes up, GPU utilization goes down. We also often see the network being the bottleneck when people try to train on datasets that aren't available locally.
(1) There is some limited support in the Timeline API for logging memory allocations. Here is an example of its usage:
import tensorflow as tf
from tensorflow.python.client import timeline

# Collect full trace metadata for one training step.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
summary, _ = sess.run([merged, train_step],
                      feed_dict=feed_dict(True),
                      options=run_options,
                      run_metadata=run_metadata)
train_writer.add_run_metadata(run_metadata, 'step%03d' % i)
train_writer.add_summary(summary, i)
print('Adding run metadata for', i)

# Convert the step stats into a Chrome trace that includes memory allocation events.
tl = timeline.Timeline(run_metadata.step_stats)
print(tl.generate_chrome_trace_format(show_memory=True))
trace_file = tf.gfile.Open(name='timeline', mode='w')
trace_file.write(tl.generate_chrome_trace_format(show_memory=True))
You can give this code a try with the MNIST example (mnist_with_summaries.py).
This will generate a tracing file named timeline, which you can open with chrome://tracing. Note that it only gives approximate GPU memory usage statistics: it basically simulates a GPU execution but doesn't have access to the full graph metadata, and it can't know how many variables have been assigned to the GPU.
(2) For a very coarse measure of GPU memory usage, nvidia-smi will show the total device memory usage at the time you run the command.
nvprof can show the on-chip shared memory usage and register usage at the CUDA kernel level, but doesn't show the global/device memory usage.
Here is an example command: nvprof --print-gpu-trace matrixMul
And more details here: http://docs.nvidia.com/cuda/profiler-users-guide/#abstract
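If you want the same nvidia-smi reading from inside a Python script, a small wrapper is enough. This is just a sketch and assumes nvidia-smi is on the PATH.

import subprocess

def gpu_memory_used_mb():
    # Current memory.used (in MB) for each visible GPU, as reported by nvidia-smi.
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=memory.used', '--format=csv,noheader,nounits'])
    return [int(line) for line in out.decode().strip().splitlines()]

print(gpu_memory_used_mb())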
Here's a practical solution that worked well for me:
Disable GPU memory pre-allocation using TF session configuration:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
Run nvidia-smi -l (or some other utility) to monitor GPU memory consumption.
Step through your code with the debugger until you see the unexpected GPU memory consumption.
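Putting the three steps together, here is a minimal sketch; the large matmul is just a hypothetical stand-in for whatever part of your model you suspect.

import pdb
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True      # step 1: no pre-allocation
sess = tf.Session(config=config)

x = tf.random_normal([4096, 4096])          # hypothetical workload
y = tf.matmul(x, x)

pdb.set_trace()                             # step 3: step over the next line with 'n'
result = sess.run(y)                        # step 2: watch nvidia-smi -l in another terminal
print(result.shape)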