
How to update the weights of a Deconvolutional Layer?

I'm trying to develop a deconvolutional layer (or a transposed convolutional layer to be precise).

In the forward pass, I do a full convolution (convolution with zero padding). In the backward pass, I do a valid convolution (convolution without padding) to pass the errors to the previous layer.
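If it helps to see the shapes, here is a minimal single-channel NumPy/SciPy sketch of that pairing (the array names and sizes are made up for illustration, and whether the kernel needs an extra flip depends on your convolution convention):

import numpy as np
from scipy.signal import convolve2d, correlate2d

x = np.random.randn(2, 2)              # small input feature map
k = np.random.randn(3, 3)              # 3x3 filter

# forward: "full" convolution grows the 2x2 input to 4x4
y = convolve2d(x, k, mode='full')

# backward: a "valid" correlation shrinks the 4x4 error back to 2x2
# (whether the kernel is flipped here depends on your convolution convention)
d_y = np.ones_like(y)                  # stand-in for the upstream error
d_x = correlate2d(d_y, k, mode='valid')
assert d_x.shape == x.shape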

The gradients of the biases are easy to compute, simply a matter of averaging over the superfluous dimensions.

The problem is I don't know how to update the weights of the convolutional filters. What are the gradients? I'm sure it is a convolution operation, but I don't see how. I tried a valid convolution of the inputs with the errors, but to no avail.

asked Jan 17 '17 by Baptiste Wicht


People also ask

How are weights updated in CNN?

In a plain neural network, the first layer takes the data together with randomly initialized weights and a bias term. The data then passes through the hidden layers and out of the output layer, where the error is computed; based on that error the weights are updated again. This loop is repeated until we get a satisfactory error rate.
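As a rough sketch of that loop (a single linear layer trained with squared error; all names and numbers here are invented for illustration):

import numpy as np

X = np.random.randn(100, 3)            # data
y_true = np.random.randn(100, 1)       # targets
W = np.random.randn(3, 1)              # random initial weights
b = np.zeros(1)                        # bias term
lr = 0.01                              # learning rate

for step in range(1000):
    y_pred = X @ W + b                 # forward pass through the layer
    err = y_pred - y_true              # error at the output
    dW = X.T @ err / len(X)            # gradient of the squared error w.r.t. weights
    db = err.mean()                    # gradient w.r.t. the bias
    W -= lr * dW                       # update the weights...
    b -= lr * db                       # ...and loop until the error is acceptable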

How is upsampling done in CNN?

In the Downsampling network, simple CNN architectures are used and abstract representations of the input image are produced. In the Upsampling network, the abstract image representations are upsampled using various techniques to make their spatial dimensions equal to the input image.

What is upsample layer?

The Upsampling layer is a simple layer with no weights that will double the dimensions of input and can be used in a generative model when followed by a traditional convolutional layer.
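For example, a weight-free upsampling that doubles the spatial dimensions can be written as a simple nearest-neighbour repeat (a minimal NumPy sketch, not tied to any particular framework):

import numpy as np

x = np.arange(4).reshape(2, 2)                        # 2x2 input
up = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)    # 4x4 output, no weights
# [[0 0 1 1]
#  [0 0 1 1]
#  [2 2 3 3]
#  [2 2 3 3]]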

What is Upconv?

To convert one set of values to a higher set of values. For example, HDTV sets upconvert broadcast TV (480i) and DVD content (480i or 480p) to the highest format the set supports (720p, 1080i or 1080p). A/V receivers also provide upconversion.


1 Answer

Deconvolution explained

First of all, deconvolution is a convolutional layer, only used for a different purpose, namely upsampling (why it's useful is explained in this paper).

For example, here a 2x2 input image (bottom image in blue) is upsampled to 4x4 (top image in green):

[figure: animation of a 3x3 transposed convolution upsampling a 2x2 input to a 4x4 output]

To make it a valid convolution, the input is first padded to 6x6, after which a 3x3 filter is applied with stride 1. Just like in an ordinary convolutional layer, you can choose different padding/striding strategies to produce the image size you want.
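To make the sizes concrete, here's a small sketch of that exact setting (arbitrary values, with scipy.signal used only for the sliding window): the 2x2 input is padded to 6x6, the 3x3 filter slides with stride 1, and a 4x4 output comes out.

import numpy as np
from scipy.signal import correlate2d

x = np.random.randn(2, 2)                   # 2x2 input (blue)
w = np.random.randn(3, 3)                   # 3x3 filter
x_pad = np.pad(x, 2, mode='constant')       # pad by 2 on each side -> 6x6
out = correlate2d(x_pad, w, mode='valid')   # slide the filter with stride 1
print(out.shape)                            # (4, 4), the upsampled (green) output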

Backward pass

Now it should be clear that the backward pass for deconvolution is a special case of the backward pass for a convolutional layer, with a particular stride and padding. I think you've done it already, but here's a naive (and not very efficient) implementation for any stride and padding:

# input: x, w, b, stride, pad, d_out
# output: dx, dw, db <- gradients with respect to x, w, and b

import numpy as np

N, C, H, W = x.shape                # batch size, channels, input height/width
F, C, HH, WW = w.shape              # number of filters, filter height/width
N, F, H_out, W_out = d_out.shape    # upstream gradient has F output channels

x_pad = np.pad(x, pad_width=((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)

# bias gradient: sum the upstream gradient over the batch and spatial dimensions
db = np.sum(d_out, axis=(0, 2, 3))

dw = np.zeros_like(w)
dx = np.zeros_like(x_pad)
for n in range(N):
  for f in range(F):
    filter_w = w[f, :, :, :]
    for out_i in range(H_out):
      for out_j in range(W_out):
        i, j = out_i * stride, out_j * stride
        # weight gradient: upstream gradient times the input patch the filter saw
        dw[f, :, :, :] += d_out[n, f, out_i, out_j] * x_pad[n, :, i:i+HH, j:j+WW]
        # data gradient: upstream gradient spread back through the filter
        dx[n, :, i:i+HH, j:j+WW] += filter_w * d_out[n, f, out_i, out_j]
# crop the padding to get the gradient with respect to the original input
dx = dx[:, :, pad:pad+H, pad:pad+W]

The same can be done more efficiently using im2col and col2im, but that's just an implementation detail. Another fun fact: the backward pass for a convolution operation (for both the data and the weights) is again a convolution, but with spatially-flipped filters.
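To see the flipped-filter fact in the simplest single-channel case (a sketch assuming stride 1 and pad 1; scipy.signal is used only for the demonstration):

import numpy as np
from scipy.signal import correlate2d

x = np.random.randn(5, 5)                        # single-channel input
w = np.random.randn(3, 3)                        # 3x3 filter
x_pad = np.pad(x, 1, mode='constant')            # pad=1

out = correlate2d(x_pad, w, mode='valid')        # forward "convolution" -> 5x5
d_out = np.random.randn(*out.shape)              # pretend upstream gradient

# backward w.r.t. the data: full correlation with the spatially-flipped filter
dx_pad = correlate2d(d_out, w[::-1, ::-1], mode='full')
dx = dx_pad[1:-1, 1:-1]                          # crop the padding back off -> 5x5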

Here's how it's applied (plain and simple SGD):

# backward_msg is the gradient coming from the next layer, usually a ReLU
# conv_cache holds (x, w, b, conv_params), i.e. the info from the forward pass
backward_msg, dW, db = conv_backward(backward_msg, conv_cache)
w = w - learning_rate * dW
b = b - learning_rate * db

As you can see, it's pretty straightforward; you just need to understand that you're applying the same old convolution.

answered Oct 22 '22 by Maxim