I am building an R-CNN detection network using TensorFlow's Object Detection API.
My goal is to detect bounding boxes for animals in outdoor videos. Most frames contain no animals, only a dynamic background.
Most tutorials focus on training custom labels, but make no mention of negative training samples. How does this class of detectors deal with images that contain no objects of interest? Does it just output low probabilities, or will it be forced to draw a bounding box somewhere in the image?
My current plan is to use traditional background subtraction in OpenCV to flag candidate frames and pass those to the trained network. Should I also include a class of 'background' bounding boxes as 'negative data'?
The final option would be to use OpenCV for background subtraction, the R-CNN to generate bounding boxes, and then a classification model on the resulting crops to separate animals from background.
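For reference, here is a minimal sketch of the background-subtraction pre-filter described above, assuming OpenCV's MOG2 subtractor; the video path and the 1% motion threshold are placeholders to tune, not a definitive implementation:

```python
import cv2

# Hypothetical pre-filter: keep only frames with enough foreground motion
# before handing them to the detector. Path and threshold are assumptions.
cap = cv2.VideoCapture("outdoor_clip.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

candidate_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)
    # MOG2 marks shadows as 127 and foreground as 255; keep only foreground.
    moving = cv2.countNonZero(cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)[1])
    if moving > 0.01 * fg_mask.size:  # >1% of pixels moving: a candidate frame
        candidate_frames.append(frame)
cap.release()
```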
You don't need them, because your dataset already contains enough negative examples. This was pointed out by the well-known paper Focal Loss for Dense Object Detection. The paper's framing is that each densely sampled region (anchor) of a training image is a training signal in itself.
Every region of your images that does not correspond to a ground-truth bounding box is a negative sample. Explicitly defining 'negative samples' by drawing bounding boxes around them would just create a new class named 'none', leaving you with three classes (animal, 'none', and the implicit background). So keep it simple and focus on your positive examples.
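To make the "implicit negatives" point concrete, here is a rough sketch of the kind of anchor labelling a Faster R-CNN region proposal network performs (the 0.7/0.3 IoU thresholds follow the Faster R-CNN paper; the helper functions are illustrative, not the API's actual code). Note that when a frame has no ground-truth boxes at all, every anchor becomes a negative, which is exactly how empty regions still contribute training signal:

```python
import numpy as np

def pairwise_iou(anchors, gt_boxes):
    """IoU between each anchor and each ground-truth box.
    Boxes are [ymin, xmin, ymax, xmax] arrays of shape (N, 4) and (M, 4)."""
    ymin = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    xmin = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    ymax = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    xmax = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative), or -1 (ignored) per anchor."""
    if len(gt_boxes) == 0:
        # A frame with no animals: every anchor is a negative sample.
        return np.zeros(len(anchors), dtype=int)
    best_iou = pairwise_iou(anchors, gt_boxes).max(axis=1)
    labels = np.full(len(anchors), -1, dtype=int)
    labels[best_iou >= pos_thresh] = 1
    labels[best_iou < neg_thresh] = 0
    return labels
```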
I am currently using object detection on my own dataset. For some of my classes I get a lot of false positives with high scores (> 0.99), so raising the score threshold won't help.
In general it's not necessary to explicitly include "negative images". These detection models automatically use the parts of the image that don't belong to any annotated object as negatives.
If you expect your model to differentiate between "found a figure" and "no figure", then you will almost certainly need to train it on negative examples. Label these as "no figure". In the "no figure" case, yes, use the entire image as the bounding box; don't ask the model to recognize anything smaller.
In "no figure" cases you may still get a smaller bounding box, but that doesn't matter: at inference time you simply ignore whatever box is returned for the "no figure" class.
Of course, the critical thing is to try it out and see how well it works for you.
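As a sketch of that inference-time filtering, assuming a model exported with the Object Detection API (the saved-model path, score threshold, and the "no figure" class id are all placeholders to adapt to your label map):

```python
import tensorflow as tf

detect_fn = tf.saved_model.load("exported_model/saved_model")  # assumed path

def animal_boxes(image, score_thresh=0.5, no_figure_id=2):
    """Run the detector and drop low-score boxes and any 'no figure' box.
    score_thresh and no_figure_id are assumptions; tune them for your setup."""
    # image: uint8 numpy array of shape (height, width, 3)
    input_tensor = tf.convert_to_tensor(image)[tf.newaxis, ...]
    detections = detect_fn(input_tensor)
    boxes = detections['detection_boxes'][0].numpy()
    scores = detections['detection_scores'][0].numpy()
    classes = detections['detection_classes'][0].numpy().astype(int)
    keep = (scores >= score_thresh) & (classes != no_figure_id)
    return boxes[keep], scores[keep], classes[keep]
```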