Tensorflow Object Detection API with Mobilenets overfits custom multiclass dataset

The model overfits the training set and fails to generalise to the test set.

  • How could I add dropout to the feature extractor part of the model? (The .config file only provides a key to enable dropout in the box predictor.)

  • What other measures could I take to minimize overfitting?

More details below:

I am attempting to retrain the model checkpoint "ssd_mobilenet_v1_coco_11_06_2017" on a dataset of toy animals. There are 14 classes, with 400-600 images in each. The network learns the training set in less than 30k steps (TensorBoard screenshot of the loss attached). The loss still seems quite volatile after the initial training phase, although I am not experienced enough to judge that.
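For reference, the loss curves were viewed in TensorBoard pointed at the training directory (the same one passed to train.py below), roughly:

tensorboard --logdir=/home/X/TrainDir/Process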

I am testing the model by applying the exported graph to images and inspecting the results manually (I just haven't had time to implement validation properly). The model works very well on pictures taken in conditions very similar to those in the training set. I call these the "bad" test set: images randomly set aside from the training data, which was acquired by taking many images in succession with slight changes in camera angle. The training set also covers various lighting conditions, backgrounds, distortions and camera locations. I would estimate the model gets the classes and locations right in roughly 95% of the bad test set images. From this I conclude that it fits the training set very well and can generalise a little.

However, the model performs very poorly on pictures taken separately, with a different camera and at a different time (i.e. there should be much less correlation between this "good" test set and the training set). I would estimate the performance on this good test set at roughly 25%. From this I conclude that the model is overfitting and failing to generalise.

I have tried making the following changes in the .config file:

  • Increasing the l2_regularizer weight from 0.00004 to 0.0001 for both the feature extractor and box predictor.

  • Setting box predictor use_dropout to true in order to enable the 20% dropout.

I am using TensorFlow 1.4 (installed via pip) and the models repository cloned from GitHub about 3 weeks ago.

I call train.py in object_detection with the following arguments:

python train.py --logtostderr --train_dir=/home/X/TrainDir/Process --pipeline_config_path=/home/X/ssd_mobilenet_v1_coco.config

My configuration file is the following:

# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 14
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: true
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.0001
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.0001
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
          anchorwise_output: true
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          anchorwise_output: true
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 8
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/home/X/tensorflow/models/research/object_detection/ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/X/TrainDir/train.record"
  }
  label_map_path: "/home/X/TrainDir/data_label_map.pbtxt"
}

eval_config: {
  num_examples: 1200
  # Note: The below line limits the evaluation process to 30 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 30
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/X/TrainDir/test.record"
  }
  label_map_path: "/home/X/TrainDir/data_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  num_epochs: 1
}

2 Answers

After a few tricks, the network learned well and started to generalise to the good test set:

  • I included learning rate decay of about 10% every 5000 steps; this alone already helped quite a bit (a sketch of the corresponding config block follows this list).
  • I added about 10% extra images of real animals of the same kinds to the training set of toy animals. This improved generalisation drastically.
  • Training for longer improved results further.
  • I left the regularisation and box_predictor dropout at their original values.
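As a rough sketch of the first bullet, the learning-rate decay goes into the rms_prop_optimizer block of the train_config. The values below are approximate; the exact numbers I ended up with are in the full config further down:

optimizer {
  rms_prop_optimizer: {
    learning_rate: {
      exponential_decay_learning_rate {
        initial_learning_rate: 0.004
        decay_steps: 5000   # decay roughly every 5000 steps
        decay_factor: 0.9   # multiply the learning rate by 0.9 (~10% decay)
      }
    }
    momentum_optimizer_value: 0.9
    decay: 0.9
    epsilon: 1.0
  }
}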

The trained network performed well in a real-world scenario, doing online detection of these animals in pictures taken in a completely new scene and under new lighting conditions.

The following was added on 06/03/2020:

In response to the request in the comments, I dug out the configuration file I had stored with this project (> 2 years ago). It is most likely the final configuration I ended up using, which worked well.

# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 14
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: true
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
          anchorwise_output: true
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          anchorwise_output: true
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 8
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 7000
          decay_factor: 0.75
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/home/sander/tensorflow/models/research/object_detection/ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/sander/ROBOT/TrainDir/train.record"
  }
  label_map_path: "/home/sander/ROBOT/TrainDir/data_label_map.pbtxt"
}

eval_config: {
  num_examples: 1200
  # Note: The below line limits the evaluation process to 30 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 30
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/sander/ROBOT/TrainDir/test.record"
  }
  label_map_path: "/home/sander/ROBOT/TrainDir/data_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  num_epochs: 1
}

Have you tried setting up a validation set and automatically running evaluation in parallel with training, using the partially trained checkpoints? If the validation accuracy has not converged yet, you probably just need to train your model longer.
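For example, with the legacy scripts shipped in object_detection, evaluation can run alongside train.py with something like the command below (the eval_dir path is a placeholder chosen to match the asker's layout); the resulting metrics then show up in TensorBoard:

python eval.py --logtostderr \
    --checkpoint_dir=/home/X/TrainDir/Process \
    --eval_dir=/home/X/TrainDir/Eval \
    --pipeline_config_path=/home/X/ssd_mobilenet_v1_coco.config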

Other things to look at: are there any other differences between the "good" and "bad" test sets, e.g. resolution or aspect ratio? Are you applying all the pre-processing steps at test time exactly the same way as during training, e.g. data standardization, resizing with the same algorithm, etc.?
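As an illustration of the pre-processing point (a sketch only, not part of the original answer): the config uses a 300x300 fixed_shape_resizer, which resizes bilinearly by default, so any manual resizing done on the test images should match that, e.g.:

import numpy as np
from PIL import Image

def load_test_image(path, size=(300, 300)):
    # Resize with bilinear interpolation to mirror the fixed_shape_resizer
    # used during training, and keep uint8 RGB.
    img = Image.open(path).convert("RGB").resize(size, Image.BILINEAR)
    return np.expand_dims(np.asarray(img, dtype=np.uint8), axis=0)  # shape [1, 300, 300, 3]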

EDIT: I checked your TensorBoard screenshot. Why do you think the net has learned your training set? The loss does not appear to have really converged. One other thing you should definitely do is set up a schedule that reduces your learning rate, say dividing it by 10 every 40K steps. After learning some features, gradient descent may have trouble converging because the learning rate never changes from its starting value, which can be too large for that point in training.
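A sketch of that suggestion, assuming the manual_step_learning_rate option in the Object Detection API's optimizer config, which could replace the exponential decay in the train_config:

optimizer {
  rms_prop_optimizer: {
    learning_rate: {
      manual_step_learning_rate {
        initial_learning_rate: 0.004
        schedule {
          step: 40000
          learning_rate: 0.0004   # divide by 10 after 40K steps
        }
        schedule {
          step: 80000
          learning_rate: 0.00004  # and again after 80K steps
        }
      }
    }
    momentum_optimizer_value: 0.9
    decay: 0.9
    epsilon: 1.0
  }
}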
