I am trying to develop a model for denoising images. I've been reading up on how to calculate the memory usage of a neural network, and the standard approach for a convolutional layer seems to be:
params = depth_n * (kernel_width * kernel_height * depth_(n-1)) + depth_n
where the trailing depth_n accounts for one bias per filter.
Summing all parameters in my network, I get 1,038,097, which is roughly 4.2 MB at 4 bytes per float32 parameter. It seems I made a slight miscalculation in the last layer, since Keras reports 1,038,497 params; still, the difference is small. Those 4.2 MB are just the parameters, and I've seen somewhere that one should multiply by roughly 3 to account for gradients and other training state, which would bring it to about 13 MB.
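As a sanity check, the per-layer formula can be evaluated with a short stand-alone snippet (filter counts and kernel sizes mirror the model below; the output layer is counted as reading from conv8, which is what the code actually wires up):

```python
def conv_params(filters, k_h, k_w, in_channels):
    """Conv2D parameter count: weights plus one bias per filter."""
    return filters * k_h * k_w * in_channels + filters

# (filters, kernel size, input channels) for each layer of the model below
layers = [
    (1024, 3, 1),     # conv1
    (64,   3, 1024),  # conv2
    (64,   3, 64),    # conv3
    (64,   3, 64),    # conv4
    (64,   7, 64),    # conv5
    (64,   5, 64),    # conv6
    (32,   5, 64),    # conv7
    (32,   3, 32),    # conv8
    (1,    5, 32),    # decoded, fed by conv8 (32 channels)
]

total = sum(conv_params(f, k, k, c) for f, k, c in layers)
print(total)            # 1038497, matching the Keras summary
print(total * 4 / 1e6)  # parameter memory in MB at 4 bytes each
```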
I have approximately 11 GB of GPU memory to work with, yet this model exhausts it. Where does all the extra memory go? What am I missing? I know this post might be flagged as a duplicate, but none of the others seems to address what I am asking about.
My model:
from keras.initializers import RandomUniform
from keras.layers import Conv2D, Input

# (methods of the model class)
def network(self):
    weights = RandomUniform(minval=-0.05, maxval=0.05, seed=None)
    input_img = Input(shape=(self.img_rows, self.img_cols, self.channels))
    conv1 = Conv2D(1024, (3, 3), activation='tanh', kernel_initializer=weights,
                   padding='same', use_bias=True)(input_img)
    conv2 = Conv2D(64, (3, 3), activation='tanh', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv1)
    conv3 = Conv2D(64, (3, 3), activation='tanh', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv2)
    conv4 = Conv2D(64, (3, 3), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv3)
    conv5 = Conv2D(64, (7, 7), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv4)
    conv6 = Conv2D(64, (5, 5), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv5)
    conv7 = Conv2D(32, (5, 5), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv6)
    conv8 = Conv2D(32, (3, 3), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv7)
    conv9 = Conv2D(16, (3, 3), activation='relu', kernel_initializer=weights,
                   padding='same', use_bias=True)(conv8)
    # Note: decoded is fed by conv8, so conv9 never reaches the output --
    # which is why conv2d_9 is absent from the summary below.
    decoded = Conv2D(1, (5, 5), kernel_initializer=weights,
                     padding='same', activation='sigmoid', use_bias=True)(conv8)
    return input_img, decoded

def compiler(self):
    self.model.compile(optimizer='RMSprop', loss='mse')
    self.model.summary()
I assume my model is naive in many ways and that there are multiple things to improve (dropout, other filter sizes and counts, optimizers, etc.), and all suggestions are gladly received, but the actual question still remains: why does this model consume so much memory? Is it due to the extremely high depth (1024 filters) of conv1?
Model summary:
Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 1751, 480, 1)      0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 1751, 480, 1024)   10240
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 1751, 480, 64)     589888
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 1751, 480, 64)     36928
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 1751, 480, 64)     36928
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 1751, 480, 64)     200768
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 1751, 480, 64)     102464
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 1751, 480, 32)     51232
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 1751, 480, 32)     9248
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 1751, 480, 1)      801
=================================================================
Total params: 1,038,497
Trainable params: 1,038,497
Non-trainable params: 0
_________________________________________________________________
You are correct: this is due to the number of filters in conv1. What you must also compute is the memory required to store the activations.
As shown by your model.summary(), the output shape of that layer is (None, 1751, 480, 1024). For a single image, that is 1751 * 480 * 1024 activation values. Since the activations are most likely float32, each value takes 4 bytes to store, so the output of this layer requires 1751 * 480 * 1024 * 4 bytes, which is around 3.2 GB per image for this layer alone.
If you were to change the number of filters to, say, 64, that layer would only need around 200 MB per image.
Either reduce the number of filters or drop the batch size to 1.
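Those numbers can be checked with a small back-of-the-envelope sketch (shapes taken from the summary above; float32 activations assumed):

```python
def activation_bytes(height, width, channels, batch=1, bytes_per_value=4):
    """Bytes needed to store one layer's output tensor."""
    return batch * height * width * channels * bytes_per_value

h, w = 1751, 480

# conv1 with 1024 filters: ~3.2 GB for a single image
print(activation_bytes(h, w, 1024) / 1024**3)

# the same layer with 64 filters: ~205 MB per image
print(activation_bytes(h, w, 64) / 1024**2)
```

Note this counts only the stored forward activations of one layer; during training the gradients roughly double the cost, and every layer in the network adds its own output tensor on top.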