Why does LeNet5 use a 32×32 image as input?

I know that the handwritten digit images in the MNIST dataset are 28×28, but why is the input to LeNet5 32×32?

asked Feb 15 '15 by xiaofei


People also ask

What is the size of the input image for a standard lenet5 architecture?

The input to this model is a 32x32 grayscale image, hence the number of channels is one. We then apply the first convolution operation with filter size 5x5, and we have 6 such filters.
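
As a quick sanity check, here is a minimal PyTorch sketch (my own illustration, not from the page) confirming that this first convolution maps a 32x32 single-channel input to 6 feature maps of size 28x28:

```python
import torch
import torch.nn as nn

c1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)  # first LeNet-5 convolution
x = torch.zeros(1, 1, 32, 32)                                 # one 32x32 grayscale image
print(c1(x).shape)                                            # torch.Size([1, 6, 28, 28])
```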

What is meant by LeNet-5?

LeNet was proposed by Yann LeCun et al. in 1998. In general, LeNet refers to LeNet-5 and is a simple convolutional neural network. Convolutional neural networks are a kind of feed-forward neural network whose artificial neurons respond to part of the surrounding cells in their coverage range, and they perform well in large-scale image processing.

How many layers does LeNet-5 have?

The LeNet-5 CNN architecture is made up of 7 layers: 3 convolutional layers, 2 subsampling layers, and 2 fully connected layers.
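
A minimal PyTorch sketch of that 7-layer composition (my own illustration; it assumes average pooling for the subsampling layers and tanh activations, and replaces the paper's Gaussian RBF output with a plain linear layer):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # C1: 32x32 -> 6 maps of 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                   # S2: 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # C3: 14x14 -> 16 maps of 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                   # S4: 10x10 -> 5x5
            nn.Conv2d(16, 120, kernel_size=5), # C5: 5x5 -> 120 maps of 1x1
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(120, 84),                # F6: fully connected
            nn.Tanh(),
            nn.Linear(84, 10),                 # output: one unit per digit
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
print(model(torch.zeros(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```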

Which neural network model used LeNet-5 as a base model and helped the boom of deep learning?

LeNet-5 was one of the earliest convolutional neural networks and promoted the development of deep learning. After many years of research and many successful iterations, the result was named LeNet-5 in 1998.


1 Answer

Your question is answered in the original paper.
The convolution step always produces feature maps that are smaller than its input (and this holds true for the 1st layer, whose input is the image itself, as well):

Layer C1 is a convolutional layer with 6 feature maps. Each unit in each feature map is connected to a 5x5 neighborhood in the input. The size of the feature maps is 28x28 which prevents connection from the input from falling off the boundary.

This means that sliding a 5x5 neighborhood over a 32x32 input gives you 6 feature maps of size 28x28, because the window cannot extend past the image boundary: along each dimension there are only 32 − 5 + 1 = 28 valid positions.
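
In general, a valid convolution along one dimension has (input + 2·padding − kernel) / stride + 1 output positions. A tiny Python helper (my own illustration, not from the answer) makes the arithmetic explicit:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, 5))  # 28: the C1 feature maps in LeNet-5
print(conv_output_size(28, 5))  # 24: what a raw 28x28 MNIST image would give
```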

Of course they could have made an exception for the first layer. The reason they still use 32x32 images is:

The input is a 32x32 pixel image. This is significantly larger than the largest character in the database (at most 20x20 pixels centered in a 28x28 field). The reason is that it is desirable that potential distinctive features such as stroke end-points or corner can appear in the center of the receptive field of the highest-level feature detectors.
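
In practice, this is why the 28x28 MNIST digits are typically padded with a 2-pixel background border on each side before being fed to LeNet-5. A minimal NumPy sketch (my own illustration):

```python
import numpy as np

digit = np.zeros((28, 28), dtype=np.float32)  # a raw MNIST image
padded = np.pad(digit, pad_width=2)           # 2 background pixels on each side
print(padded.shape)                           # (32, 32)
```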

answered Oct 05 '22 by runDOSrun