What does R-CNN actually do? Does it use features extracted by a CNN to detect classes within a specified window area? Is there a TensorFlow implementation of it?
Object detection is the process of finding and classifying objects in an image. One deep learning approach, regions with convolutional neural networks (R-CNN), combines rectangular region proposals with convolutional neural network features. R-CNN is a two-stage detection algorithm.
Although R-CNN is good at detecting objects, it has shortcomings. It is slow: object detection takes about 47 seconds per image. Training is also not done in a single step; separate models handle different parts of the pipeline, which makes the training process time-consuming.
The first stage of the R-CNN pipeline is the generation of 'region proposals': regions in an image that could belong to a particular object. The authors use the selective search algorithm.
This is the basic difference between Fast R-CNN and Faster R-CNN. Fast R-CNN relies on an external region proposal method (selective search) to create the set of regions. Faster R-CNN adds an extra CNN that generates the region proposals, which we call the region proposal network (RPN).
R-CNN uses the following algorithm:
There are more advanced algorithms built upon this, such as Fast R-CNN and Faster R-CNN.
Fast R-CNN:
Faster R-CNN:
There are many TensorFlow implementations, especially for Faster R-CNN, which is the most recent variant; just search for "Faster R-CNN TensorFlow".
Good luck
R-CNN is the daddy-algorithm for all the mentioned algos; it paved the way for researchers to build more complex and better algorithms on top of it. I will try to explain R-CNN and its variants.
R-CNN consists of 3 simple steps:
Fast R-CNN immediately followed R-CNN. Fast R-CNN is faster and better by virtue of the following points:
Intuitively, it makes a lot of sense to replace roughly 2000 separate CNN forward passes (one per region proposal) with a single convolutional pass over the whole image, and then cut the boxes out of the shared feature map.
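To make the "one convolution, many boxes" idea concrete, here is a minimal sketch of RoI max pooling in plain Python. It assumes a single-channel feature map stored as a list of rows and RoIs already expressed in feature-map cells; the function name `roi_max_pool` and the 2x2 output size are illustrative choices, not taken from any particular implementation.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region `roi` of a 2-D feature map to a fixed out_h x out_w grid.

    feature_map: list of rows (one channel only, for brevity).
    roi: (x0, y0, x1, y1) in feature-map cells; x1 and y1 are exclusive.
    Assumes the RoI covers at least one cell in each direction.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Split the RoI rows into out_h bins; every bin keeps at least one row.
        r_start = y0 + (i * roi_h) // out_h
        r_end = y0 + max(((i + 1) * roi_h) // out_h, (i * roi_h) // out_h + 1)
        row = []
        for j in range(out_w):
            # Same split for the columns.
            c_start = x0 + (j * roi_w) // out_w
            c_end = x0 + max(((j + 1) * roi_w) // out_w, (j * roi_w) // out_w + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


# One shared feature map, computed once for the whole image.
fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(roi_max_pool(fmap, (0, 0, 4, 4)))  # [[6, 8], [14, 16]]
print(roi_max_pool(fmap, (1, 1, 4, 3)))  # [[6, 8], [10, 12]]
```

Both RoIs have different sizes but come out as 2x2 grids, which is what lets the later fully-connected layers accept any proposal without re-running the convolutions.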
One of the drawbacks of Fast R-CNN was the slow selective search algorithm, so Faster R-CNN introduced something called the Region Proposal Network (RPN).
Here is how the RPN works:
At the last layer of an initial CNN, a 3x3 sliding window moves across the feature map and maps it to a lower dimension (e.g. 256-d). For each sliding-window location, it generates multiple possible regions based on k fixed-ratio anchor boxes (default bounding boxes).
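As a rough illustration of the anchor boxes, the sketch below generates k boxes centred on one sliding-window location. The three scales and three aspect ratios (giving k = 9) are the defaults commonly quoted from the Faster R-CNN paper; the function name `make_anchors` and the box layout `(x0, y0, x1, y1)` are assumptions made for this example.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchor boxes centred at (cx, cy).

    Each anchor is (x0, y0, x1, y1) in image pixels.
    `ratio` is height / width; every anchor of a given scale has area scale**2.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)  # wider when ratio < 1
            h = s * math.sqrt(r)  # taller when ratio > 1
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(cx=300, cy=200)
print(len(anchors))  # 9
for box in anchors[:3]:
    print([round(v, 1) for v in box])
```

The same k shapes are reused at every sliding-window position, so the network only has to learn how to score and adjust them rather than invent boxes from scratch.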
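Each region proposal consists of an "objectness" score for that region and 4 coordinates representing the bounding box of the region.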
For each of those boxes, we output whether or not we think it contains an object, and what the coordinates for that box are. With k anchor boxes, one sliding-window location therefore yields 2k objectness scores and 4k box coordinates.
The 2k scores represent the softmax probability of each of the k bounding boxes being an "object" or not. Notice that although the RPN outputs bounding box coordinates, it does not try to classify any potential objects: its sole job is still proposing object regions. If an anchor box has an "objectness" score above a certain threshold, that box's coordinates get passed forward as a region proposal.
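The scoring-and-thresholding step can be sketched as follows. This is only an illustration: the names `objectness` and `select_proposals`, the (background, object) ordering of the two logits, and the threshold of 0.7 are all assumptions for the example rather than values from the post.

```python
import math


def objectness(logit_bg, logit_obj):
    """Two-class softmax: probability that an anchor contains an object."""
    m = max(logit_bg, logit_obj)  # subtract the max for numerical stability
    e_bg = math.exp(logit_bg - m)
    e_obj = math.exp(logit_obj - m)
    return e_obj / (e_bg + e_obj)


def select_proposals(anchors, logits, threshold=0.7):
    """Keep the anchors whose objectness probability exceeds `threshold`.

    anchors: list of boxes (x0, y0, x1, y1).
    logits:  list of (background, object) pairs, one pair per anchor,
             i.e. the 2k raw scores for k anchors.
    """
    proposals = []
    for box, (bg, obj) in zip(anchors, logits):
        p = objectness(bg, obj)
        if p > threshold:
            proposals.append((box, p))
    return proposals


anchors = [(0, 0, 10, 10), (5, 5, 40, 40), (20, 0, 60, 30)]
logits = [(0.0, 0.0), (0.0, 3.0), (2.0, 0.0)]
for box, p in select_proposals(anchors, logits):
    print(box, round(p, 3))  # (5, 5, 40, 40) 0.953
```

Note that nothing here says what the object is; the selected boxes are passed on and classified by the later stage.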
Once we have our region proposals, we feed them straight into what is essentially a Fast R-CNN. We add a pooling layer, some fully-connected layers, and finally a softmax classification layer and bounding box regressor. In a sense, Faster R-CNN = RPN + Fast R-CNN.
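For the bounding box regressor, the post does not spell out the maths, so the following is a sketch under an assumption: that the regressor predicts four offsets (dx, dy, dw, dh) relative to the proposal, using the centre-shift and log-scale parameterization usually described for the R-CNN family. The function name `apply_box_deltas` is hypothetical.

```python
import math


def apply_box_deltas(box, deltas):
    """Refine a proposal with predicted offsets.

    box:    (x0, y0, x1, y1) proposal.
    deltas: (dx, dy, dw, dh), where dx and dy shift the centre in units of
            the box width and height, and dw and dh scale the size
            logarithmically.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + w / 2, y0 + h / 2

    dx, dy, dw, dh = deltas
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    return (new_cx - new_w / 2, new_cy - new_h / 2,
            new_cx + new_w / 2, new_cy + new_h / 2)


proposal = (10, 20, 50, 100)
print(apply_box_deltas(proposal, (0.0, 0.0, 0.0, 0.0)))  # (10.0, 20.0, 50.0, 100.0)
print(apply_box_deltas(proposal, (0.5, 0.0, 0.0, 0.0)))  # (30.0, 20.0, 70.0, 100.0)
```

Zero offsets leave the proposal unchanged, and dx = 0.5 moves it right by half its width, so the regressor only has to learn small corrections to boxes that are already roughly right.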
Here are some TensorFlow implementations:
https://github.com/smallcorgi/Faster-RCNN_TF
https://github.com/CharlesShang/FastMaskRCNN
You can find many more implementations on GitHub.
P.S. I borrowed a lot of material from Joyce Xu's Medium blog.