I'm working with object detection methods (YOLOv3, Faster R-CNN, RetinaNet, ...) and I need to train on VOC2007 and VOC2012 (starting from pretrained models, of course). However, the relevant papers do not say whether the authors trained with early stopping or for a fixed number of iterations. If they used early stopping, how many steps were set before stopping? When I tried 100 steps before stopping, I got really poor results. Please help me, thank you very much.
YOLO predicts the coordinates of bounding boxes directly using fully connected layers on top of the convolutional feature extractor. Predicting offsets instead of coordinates simplifies the problem and makes it easier for the network to learn.
I found an implementation trained on the PASCAL VOC2012 dataset for semantic segmentation that uses the following early stopping parameters:
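from keras.callbacks import EarlyStopping
# Stop once val_loss has not improved for 30 consecutive epochs
earlyStopping = EarlyStopping(
    monitor='val_loss', patience=30, verbose=2, mode='auto')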
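To make the role of `patience` concrete, here is a minimal, framework-free sketch of patience-based early stopping. It is not the Keras implementation; `PatienceStopper`, `run`, and the loss values are hypothetical names and numbers chosen for illustration, and the sketch assumes the validation loss is checked once per epoch.

```python
class PatienceStopper:
    """Tracks a validation loss and signals when training should stop."""

    def __init__(self, patience=30, min_delta=0.0):
        self.patience = patience      # checks to wait without improvement
        self.min_delta = min_delta    # smallest change that counts as improvement
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def update(self, epoch, val_loss):
        """Record one validation result; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            # New best value: remember it and reset the counter.
            self.best = val_loss
            self.best_epoch = epoch
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience


def run(val_losses, patience):
    """Replay a list of per-epoch validation losses (hypothetical values).

    Returns (last_epoch_run, best_epoch, best_loss).
    """
    stopper = PatienceStopper(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.update(epoch, loss):
            return epoch, stopper.best_epoch, stopper.best
    return len(val_losses) - 1, stopper.best_epoch, stopper.best


losses = [1.0, 0.9, 0.95, 0.92, 0.8, 0.85]
print(run(losses, patience=2))  # stops before the later improvement is seen
print(run(losses, patience=3))  # waits long enough to reach the lower loss
```

In this toy run the smaller patience stops at a worse loss because the curve is noisy. Note also that patience here counts validation checks (epochs in the snippet above), so a patience measured in individual training steps covers far less training than the same number of epochs.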