Memory usage of tensorflow conv2d with large filters

I have a tensorflow model with some relatively large 135 x 135 x 1 x 3 convolution filters. I find that tf.nn.conv2d becomes unusable for such large filters - it attempts to use well over 60GB of memory, at which point I need to kill it. Here is a minimal script that reproduces the problem:

import tensorflow as tf
import numpy as np

frames, height, width, channels = 200, 321, 481, 1
filter_h, filter_w, filter_out = 5, 5, 3  # With this, output has shape (200, 317, 477, 3)
# filter_h, filter_w, filter_out = 7, 7, 3  # With this, output has shape (200, 315, 475, 3)
# filter_h, filter_w, filter_out = 135, 135, 3  # With this, output will be smaller than the above with shape (200, 187, 347, 3), but memory usage explodes

images = np.random.randn(frames, height, width, channels).astype(np.float32)

filters = tf.Variable(np.random.randn(filter_h, filter_w, channels, filter_out).astype(np.float32))
images_input = tf.placeholder(tf.float32)
conv = tf.nn.conv2d(images_input, filters, strides=[1, 1, 1, 1], padding="VALID")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(conv, feed_dict={images_input: images})

print(result.shape)

First, can anyone explain this behavior? Why does memory usage blow up with filter size? (Note: I also tried changing my dimensions around to use a single conv3d instead of a batch of conv2ds, but this had the same problem)

Second, can anyone suggest a solution other than, say, breaking the operation up into 200 separate single-image convolutions?

Edit: After re-reading the docs on tf.nn.conv2d(), I noticed this in the explanation of how it works:

  1. Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
  2. Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
  3. For each patch, right-multiplies the filter matrix and the image patch vector.

I had originally taken this simply as a description of the process, but if tensorflow is actually extracting and storing separate filter-sized 'patches' from the image under the hood, then a back-of-the-envelope calculation shows that the intermediate computation involved requires ~130GB in my case, well over the limit that I could test. This might answer my first question, but if so, can anyone explain why TF would do this when I'm still only debugging on a CPU?
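
For reference, here is a rough sketch of that back-of-the-envelope calculation, assuming the 'virtual' patch tensor from step 2 above were fully materialized as float32 (the actual kernel may well process it in smaller pieces, so treat this as an upper bound rather than a measured figure):

# Rough estimate of the flattened patch tensor described in the conv2d docs,
# assuming it were materialized in full as float32. This is an upper bound;
# actual peak usage depends on how the kernel tiles the work.
frames, height, width, channels = 200, 321, 481, 1
filter_h, filter_w = 135, 135

out_h = height - filter_h + 1        # 187 with VALID padding and stride 1
out_w = width - filter_w + 1         # 347

patch_elems = filter_h * filter_w * channels   # elements in one flattened patch
num_patches = frames * out_h * out_w           # one patch per output pixel
patch_bytes = num_patches * patch_elems * 4    # float32 = 4 bytes

print("per-frame patch tensor: %.1f GB" % (patch_bytes / frames / 1e9))
print("full-batch patch tensor: %.1f GB" % (patch_bytes / 1e9))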

asked Aug 30 '17 by wrongu

1 Answer

I had originally taken this simply as a description of the process, but if tensorflow is actually extracting and storing separate filter-sized 'patches' from the image under the hood, then a back-of-the-envelope calculation shows that the intermediate computation involved requires ~130GB in my case, well over the limit that I could test.

As you figured out yourself, this is the reason for the large memory consumption. Tensorflow does this because filters are usually small, and a matrix multiplication over the extracted patches is a lot faster than computing the convolution directly.
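
To make the connection concrete, here is a small sketch (written in the same TF 1.x style as the question) showing that tf.nn.conv2d and the extract-patches-plus-matmul formulation from the docs compute the same result on a tiny input; the flattened patch tensor in the middle is the part that grows with the filter size. This is only an illustration of the idea, not the code path the TensorFlow kernels actually execute:

import tensorflow as tf
import numpy as np

# Tiny example: a VALID convolution expressed as patch extraction + matmul.
batch, h, w, c, fh, fw, out_c = 2, 8, 8, 1, 3, 3, 4
images = np.random.randn(batch, h, w, c).astype(np.float32)
filters = np.random.randn(fh, fw, c, out_c).astype(np.float32)

direct = tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding="VALID")

# Step 2 of the docs: one flattened patch per output pixel,
# shape (batch, out_h, out_w, fh * fw * c).
patches = tf.extract_image_patches(images,
                                   ksizes=[1, fh, fw, 1],
                                   strides=[1, 1, 1, 1],
                                   rates=[1, 1, 1, 1],
                                   padding="VALID")

# Steps 1 and 3: flatten the filter and right-multiply each patch.
as_matmul = tf.reshape(
    tf.matmul(tf.reshape(patches, [-1, fh * fw * c]),
              tf.reshape(filters, [fh * fw * c, out_c])),
    [batch, h - fh + 1, w - fw + 1, out_c])

with tf.Session() as sess:
    a, b = sess.run([direct, as_matmul])
    print(np.allclose(a, b, atol=1e-4))  # True: same math, very different memory profile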

can anyone explain why TF would do this when I'm still only debugging on a CPU?

You can also use tensorflow without a GPU, so the CPU implementations are not just there for debugging; they are also optimized for speed, and matrix multiplication is faster than a direct convolution on both CPU and GPU.

To make convolutions with large filters feasible, you would have to implement such a convolution in C++ and add it to tensorflow as a new op.

answered Nov 04 '22 by BlueSun