This question may already have been answered, but I couldn't find a simple answer. I created a convnet using Keras to classify The Simpsons characters (dataset here).
I have 20 classes: given an image as input, the model returns the character's name. It's pretty simple. Each picture in my dataset shows one main character and has only that character's name as a label.
Now I would like to add an object detection task, i.e., draw a bounding box around each character in the picture and predict which character it is. I don't want to use a sliding window because it's really slow. So I thought about using Faster R-CNN (github repo) or YOLO (github repo). Do I have to add the bounding box coordinates for each picture in my training set? Is there a way to do object detection (and get bounding boxes at test time) without providing coordinates for the training set?
In short, I would like to create a simple object detection model, but I don't know whether it's possible to build a simpler version of YOLO or Faster R-CNN.
Thank you very much for any help.
The goal of YOLO or Faster R-CNN is to predict bounding boxes. So in short, yes, you will need to label your training data with bounding box coordinates to train either of them.
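To give an idea of what such a label looks like, here is a minimal sketch that converts a pixel bounding box into a Darknet-style YOLO label line (class id followed by the box center and size, all normalized to the image dimensions). The function name `to_yolo_label` is hypothetical, and this text format is only one common convention; Faster R-CNN implementations often expect other formats, so check what your chosen repo requires.

```python
def to_yolo_label(class_id, box, image_width, image_height):
    """Convert a pixel box to one Darknet-style YOLO label line.

    box is (x_min, y_min, x_max, y_max) in pixels. The output holds the
    class id, then the box center and size as fractions of the image size.
    The helper name and signature are illustrative, not from any library.
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

You would write one such line per character in each training image, which is exactly the labeling effort the question is trying to avoid.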
Take a shortcut:
You may already have a suitable architecture in mind: "Now I would like to add an object detection task, i.e., draw a bounding box around characters in the picture and predict which character it is."
So you just split the task into two parts:
1. Add an object detector for person detection to return bounding boxes
2. Classify bounding boxes using the convnet you already trained
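The two parts above can be sketched roughly as follows. This is only an outline under assumptions: `detect_boxes` and `classify_crop` are hypothetical callables standing in for your person detector and your already-trained convnet, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` in pixels.

```python
import numpy as np


def detect_and_classify(image, detect_boxes, classify_crop):
    """Two-stage pipeline: detect boxes, then classify each cropped region.

    detect_boxes(image) is assumed to return pixel boxes as
    (x_min, y_min, x_max, y_max); classify_crop(crop) is assumed to
    return a label. Both are placeholders for real models.
    """
    image = np.asarray(image)
    height, width = image.shape[:2]
    results = []
    for x_min, y_min, x_max, y_max in detect_boxes(image):
        # Clip the box to the image so the crop is always valid.
        x_min, x_max = max(0, int(x_min)), min(width, int(x_max))
        y_min, y_max = max(0, int(y_min)), min(height, int(y_max))
        if x_max <= x_min or y_max <= y_min:
            continue  # skip empty boxes
        crop = image[y_min:y_max, x_min:x_max]
        results.append(((x_min, y_min, x_max, y_max), classify_crop(crop)))
    return results
```

In practice `classify_crop` would resize the crop to the input size of your classifier and return the class with the highest predicted probability.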
For part 1, you should be good to go with a feature extractor (for example a convnet pretrained on COCO or ImageNet) and an object detector such as YOLO or Faster R-CNN on top of it to detect people. However, you may find that people in cartoons (let's say the Simpsons count as people) are not detected reliably, because the feature extractor was trained on real photographs rather than cartoon images. In that case, you could try re-training a few layers of the feature extractor on cartoon pictures so that it learns cartoon features, following the usual transfer learning approach.
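Re-training only a few layers usually means freezing the rest. Below is a minimal sketch of that idea, assuming Keras-style layer objects that expose a settable `trainable` attribute (as the layers in `model.layers` do). The helper name `freeze_all_but_last` is hypothetical, and how many layers to leave trainable is something you would have to tune.

```python
def freeze_all_but_last(layers, num_trainable):
    """Freeze every layer except the last `num_trainable` ones.

    Works on any sequence of objects with a settable `trainable` flag,
    such as a Keras model's `layers` list. Returns the layers that
    remain trainable.
    """
    if num_trainable < 0:
        raise ValueError("num_trainable must be non-negative")
    cutoff = max(0, len(layers) - num_trainable)
    for index, layer in enumerate(layers):
        # Earlier layers keep their pretrained weights fixed.
        layer.trainable = index >= cutoff
    return [layer for layer in layers if layer.trainable]
```

With Keras you would call something like `freeze_all_but_last(model.layers, 3)` and then compile the model again before training, since changes to `trainable` only take effect at compile time.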