 

TensorFlow allocating large amounts of main memory at session startup time

Consider the following two-line Python/TensorFlow interactive session:

import tensorflow as tf
s = tf.Session()

If these commands are executed on an Ubuntu Linux 14.04 machine with 32G of physical memory and two GPUs (a GTX Titan X and a GTX 970), using Anaconda Python 2.7.13 and TensorFlow r1.3 (compiled from source), and with CUDA_VISIBLE_DEVICES not set (i.e. both GPUs visible), the resulting python process has 59.7G of memory allocated. Note that it only actually uses 754M.

If CUDA_VISIBLE_DEVICES=0 (i.e. only the Titan X is visible) then 55.2G is allocated and 137M is in use.

If CUDA_VISIBLE_DEVICES=1 (i.e. only the 970 is visible) then 47.0G is allocated and 325M is in use.

If CUDA_VISIBLE_DEVICES= (i.e. neither GPU is visible) then only 2.5G is allocated and only 131M is in use.
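For reference, the visibility variants above can also be reproduced from inside Python, as long as the variable is set before TensorFlow first initializes CUDA. A minimal sketch (the tensorflow import is commented out so the fragment stands alone):

```python
import os

# Must run before "import tensorflow"; once CUDA is initialized,
# changing this variable has no effect on the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # only the first GPU (the Titan X above)
# os.environ["CUDA_VISIBLE_DEVICES"] = ""  # hide both GPUs

# import tensorflow as tf
# s = tf.Session()
print(os.environ["CUDA_VISIBLE_DEVICES"])
```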

This is a problem in environments where the amount of allocated memory is constrained, e.g. inside a grid engine setup.

Is there any way to limit the amount of main memory that TensorFlow allocates when it is using CUDA?

Update 1

The amount of memory allocated is determined, in these trials, by looking at the VIRT column in htop.
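htop's VIRT column is the VmSize field of /proc/&lt;pid&gt;/status, so the same figure can be read from inside the process itself. A small Linux-only sketch:

```python
def virtual_memory_gb():
    """Return this process's VmSize (htop's VIRT column) in GB, or None."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmSize:"):
                    # VmSize is reported in kB; convert to GB.
                    return int(line.split()[1]) / (1024.0 ** 2)
    except IOError:
        # /proc is not available (non-Linux system).
        return None
    return None

print("VIRT: %s GB" % virtual_memory_gb())
```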

TensorFlow r1.3 is compiled with mostly default configure answers. The only variations are the paths to CUDA and cuDNN. As a result, jemalloc is being used.

Update 2

I've tried recompiling with jemalloc disabled and see the same behaviour.

Asked Sep 08 '17 by Daniel Renshaw

1 Answer

The default behavior of TensorFlow on GPU is to grab all the available memory. If you want to avoid this, you can tell the session to allocate GPU memory dynamically instead.

From the ConfigProto declaration:

// allow_growth
// If true, the allocator does not pre-allocate the entire specified
// GPU memory region, instead starting small and growing as needed.

In order to do this, pass a ConfigProto object to your session when creating it:

session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True
sess = tf.Session(config=session_config)

If you want to limit the amount of memory actually used, that comes down to your batch size and the number of parameters in your model.
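For a hard cap on GPU memory specifically, ConfigProto also exposes per_process_gpu_memory_fraction in TF 1.x. A sketch (note this bounds GPU memory only, not the host virtual address space the question measures, and the 0.4 fraction is an arbitrary example value):

```python
import tensorflow as tf

config = tf.ConfigProto()
# Let TensorFlow claim at most ~40% of each visible GPU's memory.
config.gpu_options.per_process_gpu_memory_fraction = 0.4
sess = tf.Session(config=config)
```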

Answered Oct 19 '22 by Lescurel