
What is the difference between SSD and SSD Lite? (TensorFlow)

I've read the MobileNetV2 paper (arXiv:1801.04381)

and ran the models from the TensorFlow model zoo.

I noticed that SSD Lite MobileNetV2 has a faster inference time than SSD MobileNetV2.

In the MobileNetV2 paper, there is only a short explanation of SSD Lite, in the following sentence:

'We replace all the regular convolutions with separable convolutions (depthwise followed by 1 × 1 projection) in SSD prediction layers'.

So my question is, what is the difference between SSD and SSD Lite?

I don't understand the difference, because when MobileNetV1 (arXiv:1704.04861v1) was published and applied to SSD, it had already replaced all the convolutional layers with the depthwise separable convolutions mentioned above.
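For reference, here is a minimal Keras sketch of that replacement (illustrative shapes only; this is not the model-zoo code):

import tensorflow as tf

# Regular 3x3 convolution, as used in the original SSD prediction layers.
def regular_conv(x, filters):
    return tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

# SSD Lite-style replacement: a 3x3 depthwise convolution followed by a
# 1x1 pointwise projection ("depthwise followed by 1 x 1 projection").
def separable_conv(x, filters):
    x = tf.keras.layers.DepthwiseConv2D(3, padding="same", activation="relu")(x)
    return tf.keras.layers.Conv2D(filters, 1, padding="same", activation="relu")(x)

# Illustrative feature map: batch 1, 19x19 spatial, 256 channels.
x = tf.random.normal([1, 19, 19, 256])
print(regular_conv(x, 256).shape)    # (1, 19, 19, 256)
print(separable_conv(x, 256).shape)  # same output shape, far fewer parameters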

asked Jun 04 '18 by Seongkyun Han

People also ask

What is SSDlite?

SSDlite is an adaptation of SSD that was first briefly introduced in the MobileNetV2 paper and later reused in the MobileNetV3 paper. Because the main focus of those two papers was to introduce novel CNN architectures, most of the implementation details of SSDlite were not clarified.

What is mobilenet SSD?

The mobilenet-ssd model is a Single Shot MultiBox Detector (SSD) network intended to perform object detection. This model is implemented using the Caffe framework.


2 Answers

It is frustrating, since every search for SSDLite turns up "a novel framework we call SSDLite", so I was expecting something more substantial. However, I suspect that SSDLite is implemented with just one modification (kernel_size) and two additions (use_depthwise) relative to the common SSD model file.

Comparing the model files ssd_mobilenet_v1_coco.config and ssdlite_mobilenet_v2_coco.config produces the following differences:

model {
  ssd {
    box_predictor {
      convolutional_box_predictor {
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      use_depthwise: true
    }
  }
}
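If you want to verify this yourself, a rough sketch using the TF Object Detection API's config utilities (assuming the API is installed and both .config files are available locally) would be:

# Compare the relevant fields of the two pipeline configs.
# The paths are assumed to point at local copies of the configs.
from object_detection.utils import config_util

for path in ["ssd_mobilenet_v1_coco.config", "ssdlite_mobilenet_v2_coco.config"]:
    model = config_util.get_configs_from_pipeline_file(path)["model"]
    box = model.ssd.box_predictor.convolutional_box_predictor
    print(path,
          "kernel_size:", box.kernel_size,
          "box use_depthwise:", box.use_depthwise,
          "extractor use_depthwise:", model.ssd.feature_extractor.use_depthwise)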

I'll have to try it out.

answered Nov 14 '22 by Alaric Dobson


As one of the answers already pointed out, the main differences in the configs are the two use_depthwise options, for both box_predictor and feature_extractor. The underlying changes had already been implemented in the codebase; they essentially replace all regular convolutions in the SSD layers, and in the last box and class prediction layer, with depthwise + pointwise separable convolutions. The theoretical parameter and FLOPs savings are described in our MobileNetV2 paper.
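To make the savings concrete, here is a back-of-the-envelope parameter count for a single 3x3 layer (the channel sizes are illustrative, not taken from the paper):

# Parameters of one 3x3 conv layer with c_in input and c_out output channels
# (biases ignored). The channel sizes below are illustrative only.
c_in, c_out, k = 256, 256, 3

regular   = k * k * c_in * c_out           # 589,824
separable = k * k * c_in + c_in * c_out    # 2,304 + 65,536 = 67,840

print(regular / separable)                 # ~8.7x fewer parameters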

Also, to answer the question of @Seongkyun Han: we did not replace all the convolutions in the SSD layers in our v1 paper (only the layers belonging to MobileNet itself used separable convolutions).

answered Nov 14 '22 by Menglong Zhu