I'm attempting to train a Faster R-CNN model for small digit detection. I'm using the newly released TensorFlow Object Detection API and so far have been fine-tuning a pre-trained faster_rcnn_resnet101_coco from the model zoo. All my training attempts have resulted in models with high precision but low recall. Out of the ~120 objects (digits) on each image, only ~20 are ever detected, but when detected the classification is accurate. (I am also able to train a simple convnet from scratch on my cropped digits with high accuracy, so the problem lies in the detection part of the model.)

Each digit is on average 60x30 pixels in the original images (and probably about half that size after the image is resized before being fed into the model). Here is an example image with detected boxes showing what I'm seeing:
What is odd to me is how it is able to correctly detect neighboring digits but completely miss the rest that are very similar in terms of pixel dimensions.
I have tried adjusting the hyperparameters around anchor box generation and first_stage_max_proposals, but nothing has improved the results so far. Here is an example config file I have used. What other hyperparameters should I try adjusting? Any other suggestions on how to diagnose the problem? Should I be looking into other architectures, or does my task look doable with Faster R-CNN and/or SSD?
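For reference, the parts of that config I have been tweaking follow the stock faster_rcnn_resnet101 pipeline layout. Roughly this (the specific values here are only examples of the kind of thing I have tried, not a recommendation):

    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600    # inputs are scaled into this range,
        max_dimension: 1024   # which is what shrinks each digit further
      }
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]   # smaller scales target smaller objects
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16               # anchor grid stride on the feature map
        width_stride: 16
      }
    }
    first_stage_max_proposals: 300       # cap on RPN proposals kept per image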
Faster R-CNN is a two-stage detector, and two-stage models of this kind do have known limitations when it comes to detecting small objects, so some loss of recall on tiny digits would not be surprising.
In the end the immediate problem was that I was not using the visualizer correctly. By updating the parameters of visualize_boxes_and_labels_on_image_array as described by Johnathan in the comments, I was able to see that I am at least detecting more boxes than I had thought.
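For anyone hitting the same thing: visualize_boxes_and_labels_on_image_array only draws up to max_boxes_to_draw boxes (20 by default) with scores above min_score_thresh (0.5 by default), which lines up with the ~20 detections I was seeing. A minimal sketch of the updated call, assuming the usual detection-notebook variables (image_np, boxes, scores, classes, category_index) are already populated from running the detection graph:

    import numpy as np
    from object_detection.utils import visualization_utils as vis_util

    # boxes/scores/classes come from running the detection graph on image_np,
    # e.g. sess.run([detection_boxes, detection_scores, detection_classes], ...)
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        max_boxes_to_draw=200,   # default is 20, which hid most of my detections
        min_score_thresh=0.3,    # default is 0.5; lower it to see weaker boxes
        line_thickness=2)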