Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does Convolutional Neural Network possess localization abilities on images?

As far as I know, CNN rely on sliding window techniques and can only indicate if a certain pattern is present or not anywhere in given bounding boxes. Is that true?

Can one achieve localization with CNN without any help of such techniques?

like image 476
Barış Geçer Avatar asked Jan 27 '15 19:01

Barış Geçer


People also ask

What is convolutional neural network (CNN)?

The Convolutional Neural Network (CNN or ConvNet) is a subtype of the Neural Networks that is mainly used for applications in image and speech recognition. Its built-in convolutional layer reduces the high dimensionality of images without losing its information.

What is a convolution layer in neural networks?

The convolutional layer is the first layer of a convolutional network. While convolutional layers can be followed by additional convolutional layers or pooling layers, the fully-connected layer is the final layer. With each layer, the CNN increases in its complexity, identifying greater portions of the image.

How to introduce non-linearity in convolutional neural network?

To introduce non-linearity into the system and improve the learning capacity, the output from the convolution operation is passed through a non-saturating activation function like sigmoid or rectified linear unit (ReLU). Check out this excellent article about these and several other commonly used activation functions.

How does convolution work in image processing?

The reading of the matrix then begins, for which the software selects a smaller image, known as the ‘filter’ (or kernel). The depth of the filter is the same as the depth of the input. The filter then produces a convolution movement along with the input image, moving right along the image by 1 unit.


2 Answers

There are some recent techniques to localize the objects in CNN's. See this paper http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf

It uses a layer called Global Average Pooling (GAP), and with no additional work, the CNN can localize the object it recognizes.

Also checkout this really good blog post: https://alexisbcook.github.io/2017/global-average-pooling-layers-for-object-localization/

like image 36
Hossein Avatar answered Sep 22 '22 05:09

Hossein


Thats an open problem in image recognition. Besides sliding windows, existing approaches include predicting object location in image as CNN output, predicting borders (classifiyng pixels as belonging to image boundary or not) and so on. See for example this paper and references therein.

Also note that with CNN using max-pooling, one can identify positions of feature detectors that contributed to object recognition, and use that to suggest possible object location region.

like image 195
Denis Tarasov Avatar answered Sep 20 '22 05:09

Denis Tarasov