How do Convolutional Layers (CNNs) work in keras?

I notice that in the keras documentation there are many different types of Conv layers, i.e. Conv1D, Conv2D, Conv3D.

All of them have parameters like filters, kernel_size, strides, and padding, which aren't present in other keras layers.

I have seen images like this which "visualize" Conv layers,

[Image: a typical visualization of stacked Conv layers]

but I don't understand what's going on in the transition from one layer to the next.

How do changing the above parameters and the dimensions of our Conv layer affect what happens in the model?

asked Feb 16 '19 by Primusa

1 Answer

Convolutions - Language Agnostic Basics

To understand how convolutions work in keras we need a basic understanding of how convolutions work in a language-agnostic setting.

[Animated diagram: a dark 3 x 3 kernel sliding across a zero-padded blue input, producing a blue-green output activation map]

Convolutional layers slide across an input to build an activation map (also called a feature map). The above is an example of a 2D convolution. Notice how at each step, the dark 3 x 3 square slides across the input (blue), and for each new 3 x 3 patch of the input it analyzes, it outputs one value in our output activation map (the blue-green boxes at the top).

Kernels and filters

The dark square is our kernel. The kernel is a matrix of weights that is multiplied element-wise with each patch of our input, with the products summed into a single value. All of these values put together form our activation map.
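
To make this concrete, here is a minimal sketch of that sliding multiply-and-sum in plain numpy (strictly speaking this is cross-correlation, which is what deep learning libraries actually compute; the function name here is made up for illustration):

    import numpy as np

    def conv2d(inp, kernel, stride=1):
        # Slide the kernel across the input; each stop contributes one value
        kh, kw = kernel.shape
        out_h = (inp.shape[0] - kh) // stride + 1
        out_w = (inp.shape[1] - kw) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                patch = inp[i*stride:i*stride+kh, j*stride:j*stride+kw]
                out[i, j] = (patch * kernel).sum()  # element-wise multiply, then sum
        return out

    inp = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 input
    kernel = np.ones((3, 3)) / 9.0                  # a 3x3 averaging kernel
    print(conv2d(inp, kernel).shape)                # (3, 3) activation map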

Intuitively, our kernel lets us reuse parameters - a weights matrix that detects an eye in this part of the image will work to detect it elsewhere; there's no point in training different parameters for each part of our input when one kernel can sweep across and work everywhere. We can consider each kernel as a feature-detector for one feature, and its output activation map as a map of how likely that feature is present in each part of your input.

A synonym for kernel is filter. The parameter filters is asking for the number of kernels (feature-detectors) in that Conv layer. This number will also be the size of the last dimension in your output, i.e. filters=10 will result in an output shape of (???, 10). This is because the output of each Conv layer is a set of activation maps, and there will be filters number of activation maps.
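
As a quick sanity check (assuming TensorFlow 2.x, where keras is available as tf.keras):

    import tensorflow as tf

    x = tf.zeros((1, 28, 28, 3))  # one 28x28 RGB image: (batch, rows, cols, channels)
    y = tf.keras.layers.Conv2D(filters=10, kernel_size=(3, 3))(x)
    print(y.shape)  # (1, 26, 26, 10) - the last dimension equals filters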

Kernel Size

The kernel_size is, well, the size of each kernel. We discussed earlier that each kernel consists of a weights matrix that is tuned to detect certain features better and better. kernel_size dictates the size of the filter mask - in plain English, how much of the input is processed during each convolution. For example, our above diagram processes a 3 x 3 chunk of the input each time; thus, it has a kernel_size of (3, 3). We can also call the above operation a "3x3 convolution".

Larger kernel sizes are almost unconstrained in the features they can represent, while smaller ones are restricted to specific low-level features. Note, though, that multiple layers with small kernel sizes can emulate the effect of a larger kernel size.
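
For instance, two stacked 3 x 3 convolutions "see" a 5 x 5 region of the input - the same receptive field as a single 5 x 5 convolution - with fewer weights (2 * 3 * 3 = 18 vs 5 * 5 = 25 per channel, ignoring biases). A sketch, again assuming TensorFlow 2.x:

    import tensorflow as tf

    inp = tf.keras.Input(shape=(32, 32, 1))
    two_3x3 = tf.keras.layers.Conv2D(1, (3, 3))(tf.keras.layers.Conv2D(1, (3, 3))(inp))
    one_5x5 = tf.keras.layers.Conv2D(1, (5, 5))(inp)
    print(two_3x3.shape, one_5x5.shape)  # both (None, 28, 28, 1): same receptive field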

Strides

Notice how our above kernel shifts two units each time. The amount that the kernel "shifts" for each computation is called strides, so in keras speak our strides=2. Generally speaking, as we increase the strides, our model loses more information from one layer to the next, because the kernel is evaluated at fewer positions, effectively downsampling the activation map.
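
You can see the shrinking output directly (TensorFlow 2.x assumed):

    import tensorflow as tf

    x = tf.zeros((1, 8, 8, 1))
    for s in (1, 2, 4):
        y = tf.keras.layers.Conv2D(1, (3, 3), strides=s)(x)
        print(s, y.shape)
    # strides=1 -> (1, 6, 6, 1); strides=2 -> (1, 3, 3, 1); strides=4 -> (1, 2, 2, 1)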

Padding

Going back to the above diagram, notice the ring of white squares surrounding our input. This is our padding. Without padding, each time we pass our input through a Conv layer, the shape of our result gets smaller and smaller. As a result we pad our input with a ring of zeros, which serves a few purposes:

  1. Preserve information around the edges. From our diagram notice how each corner white square only goes through the convolution once, while center squares go through four times. Adding padding alleviates this problem - squares originally on the edge get convolved more times.

  2. padding is a way to control the shape of our output. We can make shapes easier to work with by having the output of each Conv layer keep the same shape as its input, and we can build deeper models when our shape doesn't get smaller each time we use a Conv layer.

keras provides three different types of padding. The explanations in the docs are very straightforward, so they are copied / paraphrased here, with a quick shape comparison after the list. These are passed in with padding=..., e.g. padding="valid".

valid: no padding

same: padding the input such that the output has the same length as the original input

causal: results in causal (dilated) convolutions. Normally in the above diagram the "center" of the kernel maps to the value in the output activation map; with causal convolutions, the right edge is used instead. This is useful for temporal data, where you don't want to use future data to model present data.
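
Here's that comparison, using Conv1D since causal padding is only available there (TensorFlow 2.x assumed):

    import tensorflow as tf

    x = tf.zeros((1, 10, 1))  # (batch, steps, channels)
    for pad in ("valid", "same", "causal"):
        y = tf.keras.layers.Conv1D(1, 3, padding=pad)(x)
        print(pad, y.shape)
    # valid  -> (1, 8, 1)   no padding, the output shrinks
    # same   -> (1, 10, 1)  zero-padded so the output length matches the input
    # causal -> (1, 10, 1)  padded on the left only, so output[t] ignores inputs after t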

Conv1D, Conv2D, and Conv3D

Intuitively the operations that occur in these layers remain the same. Each kernel still slides across your input, each filter still outputs an activation map for its own feature, and the padding is still applied.

The difference is the number of dimensions that are convolved. For example, in Conv1D a 1D kernel slides across one axis. In Conv2D a 2D kernel slides across two axes.

It is very important to note that the D in an X-D Conv layer doesn't denote the number of dimensions of the input, but rather the number of axes that the kernel slides across.

[Image: a 2D convolution over a 3D RGB input - the kernel spans all three channels but slides only along the rows and cols]

For example, in the above diagram, even though the input is 3D (image with RGB channels), this is an example of a Conv2D layer. This is because there are two spatial dimensions - (rows, cols), and the filter only slides along those two dimensions. You can consider this as being convolutional in the spatial dimensions and fully connected in the channels dimension.

The output for each filter is also two dimensional. This is because each filter slides in two dimensions, creating a two dimensional output. As a result, you can also think of an N-D Conv as each filter outputting an N-D activation map.

[Image: a 1D convolution - the kernel spans the input's second dimension but slides along only one axis]

You can see the same thing with Conv1D (pictured above). While the input is two dimensional, the filter only slides along one axis, making this a 1D convolution.

In keras, this means that ConvND requires each sample to have N+1 dimensions - N dimensions for the filter to slide across and one additional channels dimension.
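
A sketch of the N+1 rule (TensorFlow 2.x assumed; the batch dimension comes first):

    import tensorflow as tf

    print(tf.keras.layers.Conv1D(4, 3)(tf.zeros((1, 100, 8))).shape)         # (1, 98, 4)
    print(tf.keras.layers.Conv2D(4, 3)(tf.zeros((1, 32, 32, 3))).shape)      # (1, 30, 30, 4)
    print(tf.keras.layers.Conv3D(4, 3)(tf.zeros((1, 16, 16, 16, 1))).shape)  # (1, 14, 14, 14, 4)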

TLDR - Keras wrap up

filters: The number of different kernels in the layer. Each kernel detects and outputs an activation map for a specific feature, making this the last value in the output shape, i.e. Conv1D outputs (batch, steps, filters).

kernel_size: Determines the dimensions of each kernel / filter / feature-detector. Also determines how much of the input is used to calculate each value in the output. A larger size means detecting more complex features with fewer constraints; however, it's more prone to overfitting.

strides: How many units you move to take the next convolution. Bigger strides = more information loss.

padding: Either "valid", "causal", or "same". Determines if and how to pad the input with zeros.

1D vs 2D vs 3D: Denotes the number of axes that the kernel slides across. An N-D Conv layer will output an N-D output for each filter, but will require an N+1 dimensional input for each sample. This is composed of the N dimensions to slide across plus one additional channels dimension.
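
Putting all four parameters together in one (hypothetical) model, again assuming TensorFlow 2.x:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3), strides=1,
                               padding="same"),   # -> (28, 28, 16)
        tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=2,
                               padding="valid"),  # -> (13, 13, 32)
    ])
    model.summary()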

References:

Intuitive understanding of 1D, 2D, and 3D Convolutions in Convolutional Neural Networks

https://keras.io/layers/convolutional/

http://cs231n.github.io/convolutional-networks/

answered Oct 17 '22 by Primusa