
Convolution Neural Network classifier without Fully Connected Layers

I am working on a project to detect the following classes {cars, trucks, buses} and then extract the respective license plates.

This question is about the detection of the respective classes. I have used the traditional method of HOG features with a linear SVM, and it works but with low accuracy. I am now looking into CNNs for deep-learning-based detection, which has shown higher accuracy. Methods like R-CNN are extremely slow, and I completely understand how they work.

Recently, the YOLO model has shown very fast detection, which is quite interesting. If I guess correctly, YOLO is roughly similar to DPM.

Generally, YOLO has 24 convolutional layers and 2 fully connected layers. NVIDIA DIGITS implements DetectNet based on this YOLO paper. What confuses me is that NVIDIA's DetectNet does not have any fully connected layers (Caffe model file). Instead, the output of the last convolutional layer is passed through dimension-reducing convolutional layers, which I think output some confidence of an object being present.
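If I understand it correctly, a minimal sketch of that idea (my own illustration, not DetectNet's actual layers or sizes) would look like this: strided convolutions shrink the image to a coarse grid, and a final 1x1 convolution maps each grid cell's features to an objectness confidence, with no fully connected layer anywhere:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, InputLayer

# hypothetical fully convolutional detection head (layer sizes are made up)
head = tf.keras.models.Sequential([
    InputLayer([256, 256, 3]),
    Conv2D(32, 3, strides=2, padding='same', activation='relu'),
    Conv2D(64, 3, strides=2, padding='same', activation='relu'),
    Conv2D(128, 3, strides=2, padding='same', activation='relu'),
    Conv2D(256, 3, strides=2, padding='same', activation='relu'),
    # dimension-reducing 1x1 convolution: 256 features -> 1 confidence per cell
    Conv2D(1, 1, activation='sigmoid'),
    ])

head.summary()  # output shape (None, 16, 16, 1): a 16x16 map of confidences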

Question 1

But I don't understand how convolutional layers replace FC layers and learn to predict the object. A detailed explanation of this would be very helpful.

asked Nov 08 '22 by kcc__


1 Answer

Question: can you do convolution on an image for classification without a fully connected layer?

The simple answer: yes. We don't need to use the Dense layer in tensorflow or keras. But what does that really mean? How important is that? Let's look at some code that does MNIST classification without using the Dense layer.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPool2D, InputLayer, Reshape

# get some image data for classification
(xtrain,ytrain),(xtest,ytest) = tf.keras.datasets.mnist.load_data()
xtrain = np.reshape(xtrain,[-1,28,28,1]) / 255.0
ytrain = np.eye(10)[ytrain]
xtest = np.reshape(xtest,[-1,28,28,1]) / 255.0
ytest = np.eye(10)[ytest]

# make a convolutional model without any dense or fully connected layers
model = tf.keras.models.Sequential([
    InputLayer([28,28,1]),
    Conv2D(filters=16, kernel_size=3, activation='tanh', padding='valid', kernel_initializer='he_normal'),
    Conv2D(filters=16, kernel_size=3, activation='tanh', padding='valid', kernel_initializer='he_normal'),
    MaxPool2D(pool_size=2),
    Conv2D(filters=24, kernel_size=3, activation='tanh', padding='valid', kernel_initializer='he_normal'),
    Conv2D(filters=24, kernel_size=3, activation='tanh', padding='valid', kernel_initializer='he_normal'),
    MaxPool2D(pool_size=2),
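    # the 4x4 kernel below spans the entire 4x4 feature map, so it acts like Flatten() + Dense(32)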
    Conv2D(filters=32, kernel_size=4, activation='tanh', padding='valid', kernel_initializer='he_normal'),
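    # a 1x1 convolution on the resulting 1x1 map acts like Dense(10) with softmax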
    Conv2D(filters=10, kernel_size=1, activation='softmax', padding='valid', kernel_initializer='he_normal'),
    Reshape([10])
    ])

model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
_ = model.fit(x=xtrain,y=ytrain, validation_data=(xtest,ytest))

It will classify MNIST after 1 epoch with this result

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 16)        160       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 16)        2320      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 16)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 10, 10, 24)        3480      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 8, 8, 24)          5208      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 4, 24)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 1, 1, 32)          12320     
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 1, 1, 10)          330       
_________________________________________________________________
reshape (Reshape)            (None, 10)                0         
=================================================================
Total params: 23,818
Trainable params: 23,818
Non-trainable params: 0
_________________________________________________________________

60000/60000 [==============================] - 28s 467us/sample - loss: 0.1709 - acc: 0.9543 - val_loss: 0.0553 - val_acc: 0.9838

The accuracy isn't great, but it's certainly well above random. We can see from the model definition that not a single fully connected layer (tf.keras.layers.Dense) was used.

BUT, the layer conv2d_4, i.e. Conv2D(filters=32, kernel_size=4, ...), is effectively doing the same operation that Flatten() followed by Dense(32, ...) would do.

Then conv2d_5, i.e. Conv2D(filters=10, kernel_size=1, ...), is effectively doing the same operation as Dense(10, ...) would. The key difference is that in the above model these operations use the convolution framework. It looks cool, but under the covers, when the kernel_size equals the whole height x width of the input, the computation is identical to a fully connected layer.
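To make that equivalence concrete, here is a small check (my addition, not part of the model above): build a full-size-kernel convolution and a Flatten() + Dense() pair on a random 4x4x24 feature map (the shape conv2d_4 receives), copy the convolution kernel into the dense layer, and compare the outputs:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, Flatten, InputLayer

x = np.random.rand(1, 4, 4, 24).astype('float32')

# convolutional version: the 4x4 kernel covers the whole feature map,
# so each of the 32 filters produces exactly one output value
conv = tf.keras.models.Sequential([
    InputLayer([4, 4, 24]),
    Conv2D(filters=32, kernel_size=4, padding='valid'),
    ])

# fully connected version of the same operation
dense = tf.keras.models.Sequential([
    InputLayer([4, 4, 24]),
    Flatten(),
    Dense(32),
    ])

# the conv kernel has shape (4, 4, 24, 32); flattening its first three
# axes gives exactly the (384, 32) weight matrix Dense expects
kernel, bias = conv.layers[-1].get_weights()
dense.layers[-1].set_weights([kernel.reshape(-1, 32), bias])

# both paths compute the same 32 numbers
print(np.allclose(conv(x).numpy().reshape(1, 32), dense(x).numpy(), atol=1e-5))
# True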

Technically the answer is no: no Dense layer was used. But in the spirit of acknowledging the underlying computation, yes: the final layers act like fully connected layers.

answered Dec 10 '22 by Anton Codes