 

Object detection with R-CNN?

What does R-CNN actually do? Is it like using features extracted by a CNN to detect classes within a specified window area? Is there any TensorFlow implementation of this?

asked Apr 13 '17 by Shamane Siriwardhana


2 Answers

R-CNN uses the following algorithm:

  1. Get region proposals for object detection (using selective search).
  2. For each region, crop the area from the image and run it through a CNN, which classifies the object (a rough sketch follows this list).
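As a minimal sketch of this per-region idea, assuming the proposals already exist (in the real pipeline they come from selective search) and using an arbitrary Keras classifier with made-up class counts:

```python
import tensorflow as tf

# Hypothetical sketch: proposals are hard-coded (y1, x1, y2, x2) pixel boxes on a dummy image;
# in R-CNN they would come from selective search.
image = tf.random.uniform([480, 640, 3])                # dummy input image
proposals = [(50, 60, 200, 220), (100, 300, 400, 600)]  # pretend selective-search output

# Any image classifier works here; an untrained Keras model stands in for the real one.
classifier = tf.keras.applications.MobileNetV2(weights=None, classes=21)  # 20 classes + background (assumed)

crops = []
for y1, x1, y2, x2 in proposals:
    region = image[y1:y2, x1:x2, :]                     # crop the proposal from the image
    region = tf.image.resize(region, (224, 224))        # warp to the CNN's fixed input size
    crops.append(region)

# One forward pass per proposal; this is exactly why the original R-CNN is slow.
class_probs = classifier(tf.stack(crops))
print(class_probs.shape)                                # (num_proposals, 21)
```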

There are more advanced algorithms built on top of this, such as Fast R-CNN and Faster R-CNN.

Fast R-CNN:

  1. Run the entire image through the CNN.
  2. For each region from the region proposals, extract the area using an "RoI pooling" layer and then classify the object (see the sketch below).
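A rough sketch of that second step, assuming a tiny stand-in backbone and using `tf.image.crop_and_resize` as a common substitute for the original RoI pooling layer:

```python
import tensorflow as tf

image = tf.random.uniform([1, 480, 640, 3])             # dummy image, batch of 1

backbone = tf.keras.Sequential([                        # tiny stand-in for the real CNN
    tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(128, 3, strides=2, padding="same", activation="relu"),
])
features = backbone(image)                              # single forward pass over the whole image

# Proposals in normalized (y1, x1, y2, x2) coordinates, e.g. from selective search.
boxes = tf.constant([[0.1, 0.1, 0.5, 0.4],
                     [0.3, 0.5, 0.9, 0.95]])
box_indices = tf.zeros(2, dtype=tf.int32)               # all boxes belong to image 0

# Crop each proposal from the shared feature map and resize to a fixed grid.
rois = tf.image.crop_and_resize(features, boxes, box_indices, crop_size=(7, 7))
print(rois.shape)                                       # (2, 7, 7, 128): one fixed-size feature per proposal
```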

Faster R-CNN:

  1. Run the entire image through the CNN.
  2. From the features computed by the CNN, find region proposals using a region proposal network (RPN).
  3. For each region proposal, extract the area using an "RoI pooling" layer and then classify the object.

There are a lot of TensorFlow implementations, especially for Faster R-CNN, which is the most recent variant; just google "faster R-CNN tensorflow".

Good luck

answered Sep 21 '22 by Amitay Nachmani


R-CNN is the daddy algorithm of all the ones mentioned here; it really paved the way for researchers to build more complex and better algorithms on top of it. I will try to explain R-CNN and its variants.

R-CNN, or Region-based Convolutional Neural Network

R-CNN consists of 3 simple steps:

  • Scan the input image for possible objects using an algorithm called Selective Search, generating ~2000 region proposals.
  • Run a convolutional neural network (CNN) on top of each of these region proposals.
  • Take the output of each CNN and feed it into a) an SVM to classify the region and b) a linear regressor to tighten the bounding box of the object, if such an object exists (a toy sketch of this step follows the figure below).

A pictorial description of R-CNN
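Here is a toy sketch of that third step, with randomly generated stand-ins for the CNN features, class labels, and box targets (the 4096-d feature size and 21 classes are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LinearRegression

# Toy data: in the real pipeline, features come from the CNN and targets from ground truth.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4096))        # pretend 4096-d CNN features for 200 regions
labels = rng.integers(0, 21, size=200)         # 20 object classes + background
box_deltas = rng.normal(size=(200, 4))         # (dx, dy, dw, dh) targets for box tightening

svm = LinearSVC().fit(features, labels)                  # a) classify each region
bbox_reg = LinearRegression().fit(features, box_deltas)  # b) tighten the bounding box

new_feature = rng.normal(size=(1, 4096))
print(svm.predict(new_feature), bbox_reg.predict(new_feature))
```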

Fast R-CNN:

Fast R-CNN immediately followed R-CNN. Fast R-CNN is faster and better by virtue of the following points:

  • Performing feature extraction over the image before proposing regions, thus running only one CNN over the entire image instead of 2000 CNNs over 2000 overlapping regions.
  • Replacing the SVM with a softmax layer, thus extending the neural network for predictions instead of creating a new model (sketched below).

A pictorial description of Fast R-CNN

Intuitively it makes a lot of sense to avoid 2000 separate CNN forward passes and instead run the convolution once over the whole image and take the proposal boxes from that shared feature map.
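As a sketch of the second bullet above, here is what a Fast R-CNN style "head" could look like: a softmax classifier plus a box regressor sitting on top of an RoI-pooled feature, so everything is one network. The layer sizes and class count are illustrative assumptions, not the paper's exact configuration.

```python
import tensorflow as tf

num_classes = 21                                        # 20 classes + background (assumed)
roi_input = tf.keras.Input(shape=(7, 7, 128))           # one RoI-pooled feature per proposal

x = tf.keras.layers.Flatten()(roi_input)
x = tf.keras.layers.Dense(1024, activation="relu")(x)
class_scores = tf.keras.layers.Dense(num_classes, activation="softmax", name="cls")(x)  # replaces the SVM
box_deltas = tf.keras.layers.Dense(4 * num_classes, name="bbox")(x)                     # per-class box refinement

head = tf.keras.Model(roi_input, [class_scores, box_deltas])
head.summary()
```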

Faster R-CNN:

One of the drawbacks of Fast R-CNN was the slow selective search algorithm, so Faster R-CNN introduced something called a Region Proposal Network (RPN).

Here is how the RPN works:

At the last layer of an initial CNN, a 3x3 sliding window moves across the feature map and maps it to a lower dimension (e.g. 256-d). For each sliding-window location, it generates multiple possible regions based on k fixed-ratio anchor boxes (default bounding boxes).
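One way those k anchor boxes per location could be generated is sketched below; the stride, scales, and aspect ratios are illustrative values, not necessarily the paper's exact setup.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchor boxes per feature-map location."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride    # window center in image coordinates
            for s in scales:
                for r in ratios:                               # r = height / width
                    w, h = s / np.sqrt(r), s * np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

anchors = make_anchors(feat_h=30, feat_w=40)
print(anchors.shape)   # (30 * 40 * 9, 4): k = 9 anchors per location
```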

Each region proposal consists of:

  • An “objectness” score for that region, and
  • 4 coordinates representing the bounding box of the region.

In other words, we look at each location in our last feature map and consider k different boxes centered around it: a tall box, a wide box, a large box, etc.

For each of those boxes, we output whether or not we think it contains an object, and what the coordinates for that box are. This is what it looks like at one sliding window location:

Region Proposal Network

The 2k scores represent the softmax probability of each of the k bounding boxes being an “object.” Notice that although the RPN outputs bounding box coordinates, it does not try to classify any potential objects: its sole job is still proposing object regions. If an anchor box has an “objectness” score above a certain threshold, that box’s coordinates get passed forward as a region proposal.
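A rough sketch of such an RPN head, assuming k = 9 anchors and a 512-channel shared feature map (both illustrative): a 3x3 convolution plays the role of the sliding window, and two 1x1 convolutions emit the 2k objectness scores and 4k box coordinates per location.

```python
import tensorflow as tf

k = 9                                                    # anchors per location (assumed)
feature_map = tf.keras.Input(shape=(None, None, 512))    # output of the shared backbone CNN

x = tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu")(feature_map)  # 3x3 sliding window
objectness = tf.keras.layers.Conv2D(2 * k, 1, name="rpn_cls")(x)   # object / not-object per anchor (2k scores)
box_coords = tf.keras.layers.Conv2D(4 * k, 1, name="rpn_reg")(x)   # box deltas per anchor (4k coordinates)

rpn = tf.keras.Model(feature_map, [objectness, box_coords])
rpn.summary()
```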

Once we have our region proposals, we feed them straight into what is essentially a Fast R-CNN. We add a pooling layer, some fully-connected layers, and finally a softmax classification layer and bounding box regressor. In a sense, Faster R-CNN = RPN + Fast R-CNN.

Faster R-CNN

Linking some TensorFlow implementations:

https://github.com/smallcorgi/Faster-RCNN_TF

https://github.com/CharlesShang/FastMaskRCNN

You can find a lot of implementations on GitHub.

P.S. I borrowed a lot of material from Joyce Xu's Medium blog.

answered Sep 19 '22 by Abhishek Kumar