 

How to calculate the number of parameters of AlexNet?

I haven't found a calculation of the parameters (weights + biases) of AlexNet, so I tried to calculate it myself, but I'm not sure if it's correct:

conv1: (11*11)*3*96 + 96 = 34944

conv2: (5*5)*96*256 + 256 = 614656

conv3: (3*3)*256*384 + 384 = 885120

conv4: (3*3)*384*384 + 384 = 1327488

conv5: (3*3)*384*256 + 256 = 884992

fc1: (6*6)*256*4096 + 4096 = 37752832

fc2: 4096*4096 + 4096 = 16781312

fc3: 4096*1000 + 1000 = 4097000

This gives a total of 62,378,344 parameters. Is that calculation right?
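As a sanity check, the per-layer sums above can be reproduced with a short script (a minimal sketch; layer names and shapes are taken from the list above):

```python
# Parameter count per layer: kernel_area * in_channels * out_channels + biases.
# FC layers follow the same pattern with in_features * out_features + biases.
layers = {
    "conv1": (11 * 11) * 3 * 96 + 96,
    "conv2": (5 * 5) * 96 * 256 + 256,
    "conv3": (3 * 3) * 256 * 384 + 384,
    "conv4": (3 * 3) * 384 * 384 + 384,
    "conv5": (3 * 3) * 384 * 256 + 256,
    "fc1": (6 * 6) * 256 * 4096 + 4096,
    "fc2": 4096 * 4096 + 4096,
    "fc3": 4096 * 1000 + 1000,
}
total = sum(layers.values())
print(total)  # 62378344
```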

Tobi asked Oct 15 '16


People also ask

How many parameters does AlexNet have?

Overall, AlexNet has about 660K units, 61M parameters, and over 600M connections. Notice: the convolutional layers comprise most of the units and connections, but the fully connected layers are responsible for most of the weights.

How do you calculate parameters?

To calculate the learnable parameters, multiply the kernel width m, the kernel height n, and the previous layer's channel count d, then account for all k filters in the current layer. Don't forget the bias term for each filter.
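That rule can be written as a small helper (`conv_params` is a hypothetical function for illustration, not a library API):

```python
def conv_params(m, n, d, k):
    """Learnable parameters of a conv layer: one m x n kernel over d input
    channels for each of the k filters, plus one bias per filter."""
    return m * n * d * k + k

# AlexNet's first layer: 11x11 kernels, 3 input channels, 96 filters.
print(conv_params(11, 11, 3, 96))  # 34944
```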

How many trainable parameters are in AlexNet?

AlexNet has eight layers with learnable parameters: five convolutional layers (some followed by max pooling) and three fully connected layers, with ReLU activation in every layer except the output layer.


3 Answers

Your calculations are correct. We came up with the exact same number independently while writing this blog post. I have also added the final table from the post.

[final parameter table from the blog post]

Satya Mallick answered Sep 21 '22


Slide 8 in this presentation states it has 60M parameters, so I think you're at least in the ballpark: http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf

Alex Klibisz answered Sep 22 '22


According to the diagram in their paper, some of the layers use grouping, so not all features of one layer connect to the next. For conv2, for example, each filter sees only 48 of the 96 input channels, giving (5*5)*48*256 + 256 = 307,456 parameters.

I'm not sure if all newer implementations include the grouping. It was an optimization they used to let the network train in parallel on two GPUs, but modern GPUs have more resources for training and fit the network comfortably without grouping.
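With grouping, each filter only convolves in_channels / groups input channels, so the weight count shrinks by the group factor. A sketch of the grouped count (`grouped_conv_params` is a hypothetical helper, not a library function):

```python
def grouped_conv_params(kh, kw, in_ch, out_ch, groups=1):
    """Parameters of a conv layer split into `groups` parallel paths:
    each filter convolves only in_ch // groups input channels."""
    return kh * kw * (in_ch // groups) * out_ch + out_ch

# conv2 of AlexNet: 5x5 kernels, 96 -> 256 channels.
print(grouped_conv_params(5, 5, 96, 256))            # 614656 (no grouping)
print(grouped_conv_params(5, 5, 96, 256, groups=2))  # 307456 (two-GPU split)
```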

Sven Zwei answered Sep 22 '22