Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to choose the window size of CNN in deep learning?

In Convolutional Neural Network (CNN), a filter is select for weights sharing. For example, in the following pictures, a 3x3 window with the stride (distance between adjacent neurons) 1 is chosen.

So my question is: How to choose the window size? If I use 4x4 with the stride being 2, how much difference will it cause? Thanks a lot in advance!

like image 704
Fitz999 Avatar asked Oct 31 '17 06:10

Fitz999


People also ask

What is window size in CNN?

In CNN architectures, pooling is typically performed with 2x2 windows, stride 2 and no padding. While convolution is done with 3x3 windows, stride 1 and with padding.

What is window size in deep learning?

Window size, as I know it, is the length of a (sliding) cutout of a time sequence of data. E.g., if you have data x(t) that you want to model, you could use a k-size window x(n), x(n+1), ..., x(n+k). This is a method commonly used in non-recursive approximators.

How is CNN size calculated?

Machine Learning (ML) cnn In short, the answer is as follows: Output height = (Input height + padding height top + padding height bottom - kernel height) / (stride height) + 1. Output width = (Output width + padding width right + padding width left - kernel width) / (stride width) + 1.


1 Answers

There's no definite answer to this: filter size is one of hyperparameters you generally need to tune. However, there're some useful observations, that may help you. It's often preferred to choose smaller filters, but have greater number of those.

Example: four 5x5 filters have 100 parameters (ignoring bias), while 10 3x3 filters have 90 parameters. Through the larger of filters you still can capture the variety of features in the image, but with fewer parameters. More on this here.

Modern CNNs go even further with this idea and choose consecutive 3x1 and 1x3 convolutional layers. This reduces the number of parameters even more, but doesn't affect the performance. See the evolution of inception network.

The choice of stride is also important, but it affects the tensor shape after the convolution, hence the whole network. The general rule is to use stride=1 in usual convolutions and preserve the spatial size with padding, and use stride=2 when you want to downsample the image.

like image 117
Maxim Avatar answered Oct 09 '22 14:10

Maxim