 

How is MobileNet V3 faster than V2?

Here's the link to the paper regarding MobileNet V3.

MobileNet V3

According to the paper, the h-swish activation and the Squeeze-and-Excitation (SE) module are used in MobileNet V3, but they aim to enhance accuracy rather than boost speed.

h-swish is faster than swish and helps enhance accuracy, but it is still much slower than ReLU, if I'm not mistaken.
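For reference, the paper defines swish as x·sigmoid(x) and h-swish as x·ReLU6(x+3)/6, which replaces the per-element exponential with a cheap clip. A minimal NumPy sketch of the two activations:

```python
import numpy as np

def swish(x):
    # swish(x) = x * sigmoid(x): needs an exponential per element
    return x / (1.0 + np.exp(-x))

def h_swish(x):
    # h-swish(x) = x * ReLU6(x + 3) / 6: only a clip, an add, and a
    # multiply/divide, which is why it is cheaper than swish on mobile CPUs
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

x = np.linspace(-6, 6, 7)
print(h_swish(x))  # ~0 for x <= -3, exactly x for x >= 3
```

Note that h-swish matches the identity for x ≥ 3 and is exactly zero for x ≤ −3, so it approximates swish while staying piecewise-simple.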

SE also helps enhance the accuracy, but it increases the number of parameters of the network.
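For context, an SE block squeezes each channel to a scalar by global average pooling, passes it through a small two-layer bottleneck, and uses the result to gate the channels; the two bottleneck layers are where the extra parameters come from. A minimal NumPy sketch (the channel count and reduction ratio below are illustrative, not the paper's exact configuration; the hard-sigmoid gate is the variant MobileNet V3 uses):

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    # x: feature map of shape (C, H, W)
    # squeeze: global average pooling over spatial dims -> (C,)
    s = x.mean(axis=(1, 2))
    # excitation: bottleneck FC -> ReLU -> FC -> hard sigmoid gate in [0, 1]
    z = np.maximum(w1 @ s + b1, 0.0)
    g = np.clip((w2 @ z + b2) + 3.0, 0.0, 6.0) / 6.0
    # re-weight each channel by its gate value
    return x * g[:, None, None]

C, r = 8, 4  # illustrative: 8 channels, reduction ratio 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, 6, 6))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y = se_block(x, w1, b1, w2, b2)

# the parameter overhead is just the two small FC layers
extra_params = w1.size + b1.size + w2.size + b2.size
print(y.shape, extra_params)  # (8, 6, 6) 42
```

The output keeps the input's shape, so SE slots into an existing block at the cost of roughly 2·C²/r extra weights.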

Am I missing something? I still have no idea how MobileNet V3 can be faster than V2 with what's said above implemented in V3.

I didn't mention that they also modify the last part of the network, because I plan to use MobileNet V3 as the backbone and combine it with SSD layers for detection, so the last part of the network won't be used anyway.

The following table, from the paper mentioned above, shows that V3 is still faster than V2.

Object detection results for comparison

asked Jul 09 '19 by Jefferson Chiu

1 Answer

MobileNetV3 is faster and more accurate than MobileNetV2 on the classification task, but this is not necessarily true on other tasks, such as object detection. As you mention yourself, the optimizations they made at the deepest end of the network are mostly relevant to the classification variant, and as can be seen in the table you referenced, the mAP is no better.

A few things to consider, though:

  • It's true that SE and h-swish both slow down the network a bit. SE adds some FLOPs and parameters, h-swish adds complexity, and both cause some latency. However, both are added such that the accuracy-latency trade-off improves: either the added latency is worth the accuracy gain, or you can hold accuracy constant while shrinking other parts of the network, reducing overall latency. Specifically regarding h-swish, note that they mostly use it in the deeper layers, where the tensors are smaller. Those tensors are thicker (more channels), but because resolution drops quadratically (height × width), they contain fewer elements overall, so h-swish costs less latency there.
  • The architecture itself (without h-swish, and even without considering SE) was found via architecture search, meaning it is better suited to the task than the "vanilla", more hand-engineered MobileNetV2. You can see, for example, that as in MNASNet, some kernels grew to 5x5 (rather than 3x3), not all expansion rates are x6, etc.
  • One change they made to the deepest end of the network is also relevant to object detection. Oddly, when building SSDLite-MobileNetV2, the original authors chose to keep the last 1x1 convolution, which expands the depth from 320 to 1280. While that many features makes sense for 1000-class classification, for 80-class detection it's probably redundant, as the MNv3 authors themselves note in the middle of page 7 (bottom of the first column, top of the second).
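The resolution point in the first bullet can be checked with simple arithmetic. The stage shapes below are illustrative rather than taken from the paper, but they follow the typical MobileNet pattern: spatial area shrinks quadratically while channel count grows only linearly, so deeper tensors hold fewer elements and a pricier activation like h-swish costs less there.

```python
# Per-element activation cost scales with the number of tensor elements.
# Illustrative stage shapes: channels * height * width
early = 24 * 112 * 112   # early layer: thin but high resolution
deeper = 96 * 28 * 28    # deeper layer: 4x the channels, 1/16 the spatial area

print(early, deeper)     # 301056 75264
print(early / deeper)    # 4.0 -> the early tensor has 4x more elements
```

So even with four times as many channels, the deeper tensor is four times smaller, which is why restricting h-swish to deep layers keeps its latency cost modest.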
answered Oct 17 '22 by netanel-sam