 

Why does this Keras model require over 6GB of memory?

This Keras model seems to require 6GB+ of RAM using the TensorFlow backend. My back-of-the-envelope math suggests that storing the weights shouldn't require more than 500MB. What's going on?

from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D

IMAGE_SIZE = 128
print('Build model...')
model = Sequential()
# three color channels, 128x128 input
# 16 conv filters, 3x3 kernel
model.add(Convolution2D(16, 3, 3, input_shape=(3, IMAGE_SIZE, IMAGE_SIZE)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(1))
model.add(Dense(3 * IMAGE_SIZE * IMAGE_SIZE))


model.compile(loss='mse', optimizer='sgd')

It's a convolution layer (16 3x3 filters) connected to a single neuron, and then that single neuron is connected to ~50k neurons.
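For reference, the back-of-the-envelope math does check out. Assuming the default 'valid' convolution padding, the parameter count can be worked out with plain arithmetic (no Keras needed) and comes to well under 2 MB as float32:

```python
# Back-of-the-envelope parameter count for the model above (float32 weights).
IMAGE_SIZE = 128

# Conv layer: 16 filters, each 3x3 over 3 input channels, plus one bias per filter
conv_params = 16 * (3 * 3 * 3 + 1)                       # 448

# 'valid' padding shrinks 128x128 to 126x126; Flatten yields 16 * 126 * 126 values
flat_size = 16 * (IMAGE_SIZE - 2) * (IMAGE_SIZE - 2)     # 254016

# Dense(1): one weight per flattened input, plus one bias
dense1_params = flat_size * 1 + 1                        # 254017

# Dense(3 * 128 * 128): one weight per output unit (single input neuron), plus biases
out_units = 3 * IMAGE_SIZE * IMAGE_SIZE                  # 49152
dense2_params = 1 * out_units + out_units                # 98304

total = conv_params + dense1_params + dense2_params
print(total, "params,", total * 4 / 1e6, "MB as float32")
# ~352k parameters, roughly 1.4 MB -- nowhere near 6 GB
```

So the weights themselves are tiny; the 6GB+ usage had to be coming from somewhere else, which the answer below confirms.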

I'm pretty new to Keras, so I imagine my misunderstanding is pretty fundamental, but I can't seem to figure it out.

asked Mar 02 '16 by Ryan Marcus



1 Answer

Turns out, my issue was including a path to CUDA 7.5 in my LD_LIBRARY_PATH, but a path to CUDA 7.0 in PATH. Apparently this awkward combination triggers undefined behavior, which in my case manifested as a memory leak.

After examining the process with valgrind, I found that the nvcc from 7.0 was essentially jumping into nonsensical areas of the CUDA 7.5 library, which is not unexpected. It's actually pretty amazing that it leaked memory instead of simply crashing, and that Theano exhibited the same error.

Hopefully no one else will be plagued by this particular issue in the future, but if you are, double check your version paths!
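As a quick sanity check for this kind of mismatch, you can compare the CUDA versions referenced by PATH and LD_LIBRARY_PATH. This is just a sketch: the `cuda_versions` helper is hypothetical, and it assumes the common `/usr/local/cuda-X.Y/...` install layout where the version appears in the directory name:

```python
import os
import re

def cuda_versions(var):
    """Extract CUDA version strings (e.g. '7.0', '7.5') from a path-list env var."""
    paths = os.environ.get(var, "").split(os.pathsep)
    return {m.group(1)
            for p in paths
            for m in [re.search(r"cuda-?(\d+\.\d+)", p)]
            if m}

# Hypothetical mismatched setup like the one described above:
os.environ["PATH"] = "/usr/local/cuda-7.0/bin:/usr/bin"
os.environ["LD_LIBRARY_PATH"] = "/usr/local/cuda-7.5/lib64"

path_vers = cuda_versions("PATH")              # {'7.0'}
lib_vers = cuda_versions("LD_LIBRARY_PATH")    # {'7.5'}
if path_vers and lib_vers and path_vers != lib_vers:
    print("CUDA version mismatch:", path_vers, "vs", lib_vers)
```

If the two sets disagree, nvcc from one toolkit will be paired with the runtime libraries of another, which is exactly the situation that caused the leak here.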

On my local machine, without a GPU-enabled TensorFlow install, I still got the memory leak, which appeared to be a bug in the previous TensorFlow release (0.7.0) that has been resolved in the 0.7.1 release. Again, I haven't figured out why my non-GPU Theano backend also produced the leak, but after upgrading TensorFlow, the Theano backend doesn't leak either. It's a very strange thing, but I believe the general solution to this problem is "upgrade" and "double-check your environment".

answered Oct 05 '22 by Ryan Marcus