When benchmarking CNNs I found that most of the time is spent in the fully-connected layers. But when I calculate the computational complexity, I get:
O(conv) = N * (D * (W+P) * (H+P) * h * w) / S
O(fully_connected) = D * W * H * N
where
D = input depth (number of channels)
W, w = input width, filter width
H, h = input height, filter height
S = stride
P = padding
N = number of outputs (feature maps or nodes)
As an example, take a 1024x11x11 input feature map (DxWxH), a 5x5 filter (h, w), no padding (P = 0), a stride S of 1, and N = 512 outputs.
This results in the following calculation for the convolution:
O(conv) = 512*(1024*11*11*5*5)/1 = 1 585 971 200
If the same input is used for a fully-connected layer and the desired number of outputs is still 512, then:
O(fully_connected) = 512*1024*11*11 = 63 438 848
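For reference, here is a small Python sketch that simply evaluates the two formulas above with these numbers (the function names are made up for this example):

```python
# Operation counts following the formulas above (a sanity check, not an exact FLOP model).
def op_count_conv(D, W, H, h, w, S, P, N):
    return N * (D * (W + P) * (H + P) * h * w) // S

def op_count_fc(D, W, H, N):
    return D * W * H * N

# Example from above: 1024x11x11 input, 5x5 filter, stride 1, no padding, 512 outputs
print(op_count_conv(D=1024, W=11, H=11, h=5, w=5, S=1, P=0, N=512))  # 1585971200
print(op_count_fc(D=1024, W=11, H=11, N=512))                        # 63438848
```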
Is this because convolutional layers are parallelized more effectively on the GPU, so the conv layer has more operations but takes less computation time thanks to parallelism? Or is my way of calculating the complexity of each layer simply wrong?
You can check whether it is only the implementation by converting the fully-connected layer into an equivalent convolution. For every fully-connected layer there is an equivalent convolutional layer (see my question for details and examples).

Suppose you have a feature map of c channels of size w × h (hence the shape c × w × h), followed by a fully-connected layer with n nodes. Reshape the feature map to (c ⋅ w ⋅ h) × 1 × 1 and replace the fully-connected layer with n filters of size 1 × 1. Now check the time. If the convolutional version is faster than the fully-connected layer, the difference is simply due to a better implementation of convolution.
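A minimal sketch of that check, assuming PyTorch and the layer sizes from the question (the variable names and the simple CPU timing loop are my own; on a GPU you would additionally need to synchronize before reading the clock):

```python
import time
import torch
import torch.nn as nn

c, h, w, n = 1024, 11, 11, 512                  # shapes from the question
x = torch.randn(1, c, h, w)                     # one c x h x w feature map

fc = nn.Linear(c * h * w, n)                    # fully-connected layer with n nodes
conv = nn.Conv2d(c * h * w, n, kernel_size=1)   # equivalent: n filters of size 1 x 1

# Copy the weights so both layers compute exactly the same function
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(n, c * h * w, 1, 1))
    conv.bias.copy_(fc.bias)

def timeit(f, reps=100):
    start = time.time()
    for _ in range(reps):
        f()
    return (time.time() - start) / reps

t_fc = timeit(lambda: fc(x.view(1, -1)))                    # FC on the flattened input
t_conv = timeit(lambda: conv(x.view(1, c * h * w, 1, 1)))   # 1x1 conv on the reshaped input
print(f"fc: {t_fc:.6f}s  conv(1x1): {t_conv:.6f}s")
```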