In AlexNet, the input image is 3*224*224.
The first convolutional layer filters the image with 96 kernels of size 11*11*3 with a stride of 4 pixels.
I have a doubt about the number of output neurons in the first layer.
As I understand it, the input is 224*224*3 = 150528, so the output should be 55*55*96 = 290400.
But the paper states that the output is 253440.
How do I calculate the number of neurons in this layer?
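The neuron counts in the question are simply the products of each volume's height, width and depth. A minimal sketch of that arithmetic follows; the helper name `volume_size` is hypothetical, and the 55x55x96 output shape is the one assumed in the question.

```python
def volume_size(height: int, width: int, depth: int) -> int:
    """Number of units (neurons) in a height x width x depth volume."""
    return height * width * depth


# Input image as described in the question: 224 x 224 x 3.
input_units = volume_size(224, 224, 3)
# First conv layer output as assumed in the question: 55 x 55 x 96.
conv1_units = volume_size(55, 55, 96)

print(input_units, conv1_units)  # 150528 290400
```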
It seems that the input size is actually 227x227, without padding. I also think that the figure stated in the paper is a mistake. Have a look at this link:
http://cs231n.github.io/convolutional-networks/
It mentions:
The Krizhevsky et al. architecture that won the ImageNet challenge in 2012 accepted images of size [227x227x3]. On the first Convolutional Layer, it used neurons with receptive field size F=11, stride S=4 and no zero padding P=0. Since (227 - 11)/4 + 1 = 55, and since the Conv layer had a depth of K=96, the Conv layer output volume had size [55x55x96]. Each of the 55*55*96 neurons in this volume was connected to a region of size [11x11x3] in the input volume. Moreover, all 96 neurons in each depth column are connected to the same [11x11x3] region of the input, but of course with different weights. As a fun aside, if you read the actual paper it claims that the input images were 224x224, which is surely incorrect because (224 - 11)/4 + 1 is quite clearly not an integer. This has confused many people in the history of ConvNets and little is known about what happened. My own best guess is that Alex used zero-padding of 3 extra pixels that he does not mention in the paper.
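The quoted formula can be checked with a few lines of Python. This is a sketch: the helper `conv_output_size` is hypothetical, and `p_total` is the total zero padding along one dimension (2P for symmetric padding). Padding 224 with 3 extra pixels only reflects the guess in the quote above, not anything confirmed by the paper.

```python
def conv_output_size(w: int, f: int, s: int, p_total: int = 0) -> float:
    """Spatial output size (W - F + p_total) / S + 1.

    p_total is the total zero padding added along one dimension
    (equal to 2P when P pixels are added on each side).
    A non-integer result means the filter does not tile the input evenly.
    """
    return (w - f + p_total) / s + 1


print(conv_output_size(227, 11, 4))             # 55.0  (227x227, no padding)
print(conv_output_size(224, 11, 4))             # 54.25 (224x224 does not fit)
print(conv_output_size(224, 11, 4, p_total=3))  # 55.0  (the guessed 3 extra pixels)
```

With a 55x55 spatial output and 96 filters, the layer has 55*55*96 = 290400 neurons, matching the count in the question rather than the 253440 printed in the paper.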