 

Conv1D(filters=N, kernel_size=K) versus Dense(output_dim=N) layer

I have an input tensor T of size [batch_size=B, sequence_length=L, dim=K]. Is applying a 1D convolution of N filters and kernel size K the same as applying a dense layer with output dimension of N?

For example in Keras:

Conv1D(filters=N, kernel_size=K)

vs

Dense(units=N)

Note that for Conv1D, I reshape the tensor T to [batch_size*sequence_length, dim=K, 1] to perform the convolution.

Both result in 20,480 learnable weights + 256 bias terms. Yet for me, Conv1D learns much faster initially. I don't see how Dense() is any different in this case, and I'd like to use Dense() to get the lower VRAM consumption and to avoid reshaping the tensor.
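For reference, a minimal Keras sketch of the two setups. The sizes K=80 and N=256 below are hypothetical, chosen only because 80·256 = 20,480 matches the parameter count above:

import tensorflow as tf

K, N = 80, 256  # hypothetical input dim and number of outputs

conv = tf.keras.Sequential([
    tf.keras.Input(shape=(K, 1)),                      # input reshaped to [B*L, K, 1]
    tf.keras.layers.Conv1D(filters=N, kernel_size=K),  # kernel spans the whole frame
])
dense = tf.keras.Sequential([
    tf.keras.Input(shape=(K,)),                        # Dense acts on the last axis of [B, L, K]
    tf.keras.layers.Dense(units=N),
])

print(conv.count_params(), dense.count_params())  # 20736 20736 (20,480 weights + 256 biases each)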


Follow up clarification:

The two answers provided two different ways to perform the 1D convolution. How are the following two methods different?

Method 1:

- Reshape input to [batch_size * frames, frame_len]
- convolve with Conv1D(filters=num_basis, kernel_size=frame_len)
- Reshape the output of the convolution layer to [batch_size, frames, num_basis]

Method 2:

- Convolve with Conv1D(filters=num_basis, kernel_size=1) on Input=[batch_size, frames, frame_len]. No input reshaping.
- No need to reshape output, it's already [batch_size, frames, num_basis]

My understanding is that it's the same operation (they have the same number of parameters). However, I'm getting faster convergence with Method 1.
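For concreteness, a rough sketch of the two wirings with hypothetical sizes (not the actual model), showing that both yield [batch_size, frames, num_basis] and have the same number of parameters:

import tensorflow as tf

batch_size, frames, frame_len, num_basis = 4, 16, 80, 256  # hypothetical sizes
x = tf.random.normal([batch_size, frames, frame_len])

# Method 1: fold the frames into the batch, kernel spanning the whole frame.
m1 = tf.reshape(x, [batch_size * frames, frame_len, 1])
m1 = tf.keras.layers.Conv1D(filters=num_basis, kernel_size=frame_len)(m1)
m1 = tf.reshape(m1, [batch_size, frames, num_basis])

# Method 2: no reshape, kernel of size 1 sliding over the frames axis.
m2 = tf.keras.layers.Conv1D(filters=num_basis, kernel_size=1)(x)

print(m1.shape, m2.shape)  # (4, 16, 256) (4, 16, 256)
# Both layers have frame_len*num_basis + num_basis parameters.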

asked Jan 22 '19 by Artash




1 Answer

To achieve the same behaviour as a Dense layer with a Conv1D layer, you need to make sure that every output neuron of the Conv1D is connected to every input neuron.

For an input of size [batch_size, L, K], your Conv1D needs a kernel of size L and as many filters as you want output neurons. To understand why, let's go back to the definition of a 1D convolution, or temporal convolution.

The Conv1D layer's parameters consist of a set of learnable filters. Every filter is usually small temporally and extends through the full depth of the input volume. For example, in your problem, a typical filter might have size 5xK (i.e. 5 steps of your sequence, and K because your input has depth K). During the forward pass, we slide (more precisely, convolve) each filter across the steps of the input sequence and compute dot products between the entries of the filter and the input at each position. As we slide the filter, we produce a 1-dimensional activation map that gives the responses of that filter at every position.
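To make the sliding dot product concrete, here is a tiny numpy illustration (the sizes are arbitrary, not taken from the question):

import numpy as np

L, K, ksize = 10, 2, 5
x = np.random.randn(L, K)        # one sequence of length L with depth K
w = np.random.randn(ksize, K)    # one filter of size 5xK

# "Valid" convolution (cross-correlation, as deep-learning libraries define it):
# one dot product per position the filter can occupy along the sequence.
activation_map = np.array([np.sum(x[t:t + ksize] * w)
                           for t in range(L - ksize + 1)])
print(activation_map.shape)      # (6,) i.e. L - kernel_size + 1 positions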

Now, if your filters are of size LxK, you can easily see that there is only one possible spatial position (since the filter is the same size as the sequence), and its value is the dot product between the full input volume and the LxK weights of each filter. The different filters composing your Conv1D then behave exactly like the units of a Dense layer: they are fully connected to your input.

You can verify this behaviour with the following code:

import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder / tf.layers)
import numpy as np

l = 10  # sequence length
k = 2   # input depth
n = 5   # number of filters / units

x = tf.placeholder(tf.float32, [None, l, k])
# Conv1D whose kernel spans the whole sequence (kernel_size=l).
c = tf.layers.conv1d(inputs=x, strides=1, filters=n, kernel_size=l, kernel_initializer=tf.ones_initializer())
# Dense layer on the flattened input [batch, l*k].
d = tf.layers.dense(inputs=tf.reshape(x, [-1, l*k]), units=n, kernel_initializer=tf.ones_initializer())

batch_size = 10

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    r_conv, r_dense = sess.run([c, d], {x: np.random.normal(size=[batch_size, l, k])})

print(r_conv.shape, r_dense.shape)
# (10, 1, 5) (10, 5)

print(np.allclose(r_conv.reshape([batch_size, -1]), r_dense.reshape([batch_size, -1])))
# True

For the same initialization, the outputs are indeed equal.
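For readers on TensorFlow 2.x (where tf.placeholder and tf.layers are gone), a rough sketch of the same check using tf.keras and eager execution might look like this:

import numpy as np
import tensorflow as tf  # 2.x, eager mode

l, k, n, batch_size = 10, 2, 5, 10

conv = tf.keras.layers.Conv1D(filters=n, kernel_size=l, kernel_initializer="ones")
dense = tf.keras.layers.Dense(units=n, kernel_initializer="ones")

x = np.random.normal(size=[batch_size, l, k]).astype(np.float32)

r_conv = conv(x)                                     # shape (10, 1, 5)
r_dense = dense(tf.reshape(x, [batch_size, l * k]))  # shape (10, 5)

print(np.allclose(r_conv.numpy().reshape([batch_size, -1]), r_dense.numpy()))
# True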

Regarding speed, I suppose one of the main reasons the Conv1D version was faster and used more VRAM is your reshape: you were virtually increasing your batch size, improving parallelization at the cost of memory.


Edit after follow up clarification:

Maybe I misunderstood your question. Method 1 and Method 2 are the same, but they are not the same as applying a Dense layer to the input [B, LxK].

Here, each output is connected to the full dimension K of a single frame, and the same weights are then reused for every time step of the sequence, meaning that both methods are fully connected to the frame but not to the sequence. This is indeed equivalent to a Dense layer applied to [BxL, K].

You can verify this behaviour with the following code:

l = 10  # sequence length (frames)
k = 2   # frame length / depth
n = 5   # number of filters (num_basis)

x = tf.placeholder(tf.float32, [None, l, k])
# Method 2: kernel_size=1 on the un-reshaped input [batch, l, k].
c2 = tf.layers.conv1d(inputs=x, strides=1, filters=n, kernel_size=1, kernel_initializer=tf.ones_initializer())
# Method 1: fold the sequence into the batch and use a kernel spanning the whole frame.
c3 = tf.layers.conv1d(inputs=tf.reshape(x, [-1, k, 1]), strides=1, filters=n, kernel_size=k, kernel_initializer=tf.ones_initializer())
# Dense layer applied frame-wise on [batch*l, k].
d2 = tf.layers.dense(inputs=tf.reshape(x, [-1, k]), units=n, kernel_initializer=tf.ones_initializer())

batch_size = 10

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    r_d2, r_c2, r_c3 = sess.run([d2, c2, c3], {x: np.random.normal(size=[batch_size, l, k])})
    r_d2 = r_d2.reshape([batch_size, l, n])
    r_c3 = r_c3.reshape([batch_size, l, n])

print(r_d2.shape, r_c2.shape, r_c3.shape)
# (10, 10, 5) (10, 10, 5) (10, 10, 5)

print(np.allclose(r_d2, r_c2))
# True
print(np.allclose(r_d2, r_c3))
# True
print(np.allclose(r_c2, r_c3))
# True

Concerning speed, it must be because Method 1 needs only one dot product per (reshaped) sample to compute its result, whereas Method 2 needs L of them per sample, plus other operations.

answered Sep 30 '22 by Olivier Dehaene