I'm trying to develop a deconvolutional layer (or a transposed convolutional layer to be precise).
In the forward pass, I do a full convolution (convolution with zero padding). In the backward pass, I do a valid convolution (convolution without padding) to pass the errors to the previous layer.
The gradients of the biases are easy to compute, simply a matter of averaging over the superfluous dimensions.
The problem is that I don't know how to update the weights of the convolutional filters. What are the gradients? I'm sure it's a convolution operation, but I don't see how. I tried a valid convolution of the inputs with the errors, but to no avail.
In a normal neural network, the first layer takes the data along with random weights and a bias term. The data then passes through the hidden layers to the output layer, where we compute the error, and finally, based on that error, we update the weights again. This loop runs until we reach a satisfactory error rate.
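To make that loop concrete, here's a minimal numpy sketch with a toy linear layer standing in for the whole network (all names and numbers are illustrative, not from the original post):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # data
y = X @ np.array([1.0, -2.0, 0.5])      # targets from some "true" weights
w = rng.normal(size=3)                  # random initial weights
b = 0.0                                 # bias term
lr = 0.1
for step in range(500):                 # loop until the error is satisfactory
    pred = X @ w + b                    # forward pass
    err = pred - y                      # error at the output
    w -= lr * X.T @ err / len(X)        # update weights based on the error
    b -= lr * err.mean()                # update bias based on the error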
In the Downsampling network, simple CNN architectures produce abstract representations of the input image. In the Upsampling network, these abstract representations are upsampled using various techniques until their spatial dimensions match those of the input image.
The Upsampling layer is a simple layer with no weights that doubles the dimensions of its input; it can be used in a generative model when followed by a traditional convolutional layer.
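Such a weight-free doubling can be sketched in plain numpy with nearest-neighbour repetition (a minimal sketch; frameworks such as Keras expose this kind of layer as UpSampling2D):

import numpy as np

x = np.arange(4.0).reshape(2, 2)             # 2x2 input
up = x.repeat(2, axis=0).repeat(2, axis=1)   # 4x4: every pixel duplicated
print(up)
# [[0. 0. 1. 1.]
#  [0. 0. 1. 1.]
#  [2. 2. 3. 3.]
#  [2. 2. 3. 3.]]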
To convert one set of values to a higher set of values. For example, HDTV sets upconvert broadcast TV (480i) and DVD content (480i or 480p) to the highest format the set supports (720p, 1080i or 1080p). A/V receivers also provide upconversion.
First of all, deconvolution is a convolutional layer, just used for a different purpose, namely upsampling (why this is useful is explained in this paper).
For example, here a 2x2 input image (bottom image in blue) is upsampled to 4x4 (top image in green). To make it a valid convolution, the input is first padded to make it 6x6, after which a 3x3 filter is applied without striding. Just like in an ordinary convolutional layer, you can choose different padding/striding strategies to produce the image size you want.
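In code, the example above corresponds to something like the following single-channel sketch (using scipy for the sliding-window correlation; the filter values are arbitrary):

import numpy as np
from scipy.signal import correlate2d

x = np.arange(4.0).reshape(2, 2)           # the 2x2 input
w = np.ones((3, 3))                        # an arbitrary 3x3 filter
x_pad = np.pad(x, 2, mode='constant')      # pad by 2 on each side -> 6x6
out = correlate2d(x_pad, w, mode='valid')  # 6 - 3 + 1 = 4 -> a 4x4 output
print(out.shape)                           # (4, 4)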
Now it should be clear that the backward pass for deconvolution is a special case of the backward pass for a convolutional layer, with a particular stride and padding. I think you've done it already, but here's a naive (and not very efficient) implementation for any stride and padding:
import numpy as np

# input:  x, w, b, stride, pad, d_out
# output: dx, dw, db <- gradients with respect to x, w, and b
N, C, H, W = x.shape
F, C, HH, WW = w.shape
N, F, H_out, W_out = d_out.shape
x_pad = np.pad(x, pad_width=((0, 0), (0, 0), (pad, pad), (pad, pad)),
               mode='constant', constant_values=0)
db = np.sum(d_out, axis=(0, 2, 3))  # bias gradient: sum over batch and spatial dims
dw = np.zeros_like(w)
dx = np.zeros_like(x_pad)
for n in range(N):
    for f in range(F):
        filter_w = w[f, :, :, :]
        for out_i, i in enumerate(range(0, H + 2 * pad - HH + 1, stride)):
            for out_j, j in enumerate(range(0, W + 2 * pad - WW + 1, stride)):
                # filter gradient: input patch weighted by the upstream gradient
                dw[f, :, :, :] += d_out[n, f, out_i, out_j] * x_pad[n, :, i:i+HH, j:j+WW]
                # input gradient: filter weighted by the upstream gradient
                dx[n, :, i:i+HH, j:j+WW] += filter_w * d_out[n, f, out_i, out_j]
dx = dx[:, :, pad:pad+H, pad:pad+W]  # strip the zero padding
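To sanity-check these gradients, a finite-difference comparison works well. The sketch below continues from the snippet above and assumes a conv_forward(x, w, b, stride, pad) helper (hypothetical, not defined here) that returns the forward-pass output; taking the loss to be the sum of the outputs makes d_out all ones:

eps = 1e-6
num_dw = np.zeros_like(w)
for idx in np.ndindex(*w.shape):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[idx] += eps
    w_minus[idx] -= eps
    # conv_forward is an assumed helper implementing the forward pass
    num_dw[idx] = (np.sum(conv_forward(x, w_plus, b, stride, pad)) -
                   np.sum(conv_forward(x, w_minus, b, stride, pad))) / (2 * eps)
print(np.max(np.abs(num_dw - dw)))  # should be tiny (~1e-8) if dw is correct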
The same can be done more efficiently using im2col and col2im, but that's just an implementation detail. Another fun fact: the backward pass for a convolution operation (for both the data and the weights) is again a convolution, but with spatially flipped filters.
Here's how it's applied (plain and simple SGD):
# backward_msg is the gradient arriving from the next layer (usually a ReLU)
# conv_cache holds (x, w, b, conv_params), i.e. the info from the forward pass
backward_msg, dW, db = conv_backward(backward_msg, conv_cache)
w = w - learning_rate * dW
b = b - learning_rate * db
As you can see, it's pretty straightforward; you just need to understand that you're applying the same old convolution.