Referring to the TensorFlow MobileNetV1 model: https://github.com/tensorflow/models/blob/9f7a5fa353df0ee2010f8e7a5494ca6b188af8bc/research/slim/nets/mobilenet_v1.py#L171
The parameter depth_multiplier is documented as:
depth_multiplier: Float multiplier for the depth (number of channels) for all convolution ops. The value must be greater than zero. Typical usage will be to set this value in (0, 1) to reduce the number of parameters or computation cost of the model
But the paper mentions two types of multipliers, a width multiplier and a resolution multiplier. Which one corresponds to depth_multiplier?
In Keras, the documentation says:
depth_multiplier: depth multiplier for depthwise convolution (also called the resolution multiplier)
I'm so confused!
MobileNetV2 is a very effective feature extractor for object detection and segmentation. For example, for detection, when paired with the newly introduced SSDLite [2], the new model is about 35% faster than MobileNetV1 at the same accuracy. We have open-sourced the model under the TensorFlow Object Detection API [4].
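The width multiplier α is introduced to control the number of channels (the channel depth): it turns M input channels into αM. The cost of a depthwise separable convolution then becomes

$$D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$$

where D_K is the kernel size, D_F is the spatial width and height of the feature map, M and N are the numbers of input and output channels, and α ∈ (0, 1], with typical settings of 1, 0.75, 0.5 and 0.25.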
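To see what the width multiplier does to a single layer, here is a minimal Python sketch of the per-layer cost D_K·D_K·αM·D_F·D_F + αM·αN·D_F·D_F (D_K kernel size, D_F feature-map side, M and N input and output channels). The layer sizes are purely illustrative and the function name separable_cost is hypothetical, not part of any library.

```python
def separable_cost(d_k: int, d_f: int, m: int, n: int, alpha: float = 1.0) -> float:
    """Mult-adds of one depthwise separable conv layer under width multiplier alpha.

    depthwise term: D_K * D_K * (alpha * M) * D_F * D_F
    pointwise term: (alpha * M) * (alpha * N) * D_F * D_F
    """
    m_a, n_a = alpha * m, alpha * n
    depthwise = d_k * d_k * m_a * d_f * d_f
    pointwise = m_a * n_a * d_f * d_f
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 14x14 feature map, 512 -> 512 channels.
base = separable_cost(3, 14, 512, 512)
for alpha in (1.0, 0.75, 0.5, 0.25):
    cost = separable_cost(3, 14, 512, 512, alpha)
    # The ratio lands close to alpha**2 because the pointwise term dominates.
    print(f"alpha={alpha}: {cost:,.0f} mult-adds, ratio {cost / base:.3f}, "
          f"alpha^2 {alpha ** 2:.3f}")
```

The depthwise term shrinks linearly in α and the pointwise term quadratically, so the overall saving is roughly α², which matches how the paper describes the effect of the width multiplier.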
MobileNetV2 is very similar to the original MobileNet, except that it uses inverted residual blocks with bottlenecking features. It has a drastically lower parameter count than the original MobileNet. MobileNets support any input size greater than 32 x 32, with larger image sizes offering better performance.
The baseline MobileNet (1.0 MobileNet-224) has about 4.2 million parameters, roughly 1 million of which are in the final fully connected layer, and about 569 million mult-adds.
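The parameter and mult-add counts can be tallied directly from the layer table in the MobileNet paper (Table 1). This is a sketch under stated assumptions: a 224×224×3 input, "same" padding so each stride-2 layer halves the feature map, only convolution and fully connected weights counted (no batch-norm variables, no conv biases), and a final FC layer to 1000 classes with bias. The names BLOCKS and mobilenet_v1_counts are hypothetical.

```python
# Depthwise separable blocks of MobileNetV1 (paper, Table 1): (stride, output channels).
BLOCKS = [(1, 64), (2, 128), (1, 128), (2, 256), (1, 256), (2, 512),
          (1, 512), (1, 512), (1, 512), (1, 512), (1, 512),
          (2, 1024), (1, 1024)]


def mobilenet_v1_counts(alpha: float = 1.0) -> tuple[int, int]:
    """Return (weights, mult-adds) for MobileNetV1 on a 224x224x3 input.

    Counts convolution and fully connected weights only: no batch-norm
    parameters, and the only bias counted is that of the final FC layer.
    """
    def scale(channels: int) -> int:
        # Width multiplier applied uniformly to every layer.
        return int(channels * alpha)

    size = 224 // 2                   # first 3x3 conv has stride 2 -> 112x112
    in_ch = scale(32)
    params = 3 * 3 * 3 * in_ch        # first standard conv, 3 input channels
    madds = params * size * size
    for stride, out in BLOCKS:
        out_ch = scale(out)
        size //= stride               # the depthwise conv carries the stride
        dw = 3 * 3 * in_ch            # one 3x3 filter per input channel
        pw = in_ch * out_ch           # 1x1 conv mixing channels
        params += dw + pw
        madds += (dw + pw) * size * size
        in_ch = out_ch
    # Global average pooling (no weights), then FC to 1000 ImageNet classes.
    params += in_ch * 1000 + 1000
    madds += in_ch * 1000
    return params, madds


for alpha in (1.0, 0.75, 0.5, 0.25):
    p, m = mobilenet_v1_counts(alpha)
    print(f"alpha={alpha}: {p / 1e6:.2f}M weights, {m / 1e6:.0f}M mult-adds")
```

With alpha=1.0 the tally comes to about 4.2 million weights (about 1 million of them in the FC layer) and about 569 million mult-adds. Framework model summaries usually report slightly more parameters because they also include batch-norm variables.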
As described in the paper:
The role of the width multiplier α is to thin a network uniformly at each layer. For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN.
The resolution multiplier ρ is applied to the input image and the internal representation of every layer is subsequently reduced by the same multiplier. In practice we implicitly set ρ by setting the input resolution.
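For contrast with the width multiplier, here is a minimal sketch of a single layer under both multipliers, using the paper's form D_K·D_K·αM·ρD_F·ρD_F + αM·αN·ρD_F·ρD_F. It assumes 224×224 as the reference input size, so ρ is implied by whatever input resolution you pick; the layer sizes and function names are hypothetical.

```python
def separable_cost(d_k: int, d_f: int, m: int, n: int,
                   alpha: float = 1.0, rho: float = 1.0) -> float:
    """Mult-adds of one depthwise separable layer with width multiplier alpha
    and resolution multiplier rho (the feature map becomes rho * D_F per side)."""
    m_a, n_a = alpha * m, alpha * n
    f = rho * d_f
    depthwise = d_k * d_k * m_a * f * f
    pointwise = m_a * n_a * f * f
    return depthwise + pointwise


def separable_params(d_k: int, m: int, n: int, alpha: float = 1.0) -> float:
    """Weights of the same layer. rho does not appear: resolution changes
    computation, not the number of parameters."""
    m_a, n_a = alpha * m, alpha * n
    return d_k * d_k * m_a + m_a * n_a


BASE_RESOLUTION = 224  # assumed reference input size
# Illustrative layer: 3x3 kernel, 14x14 feature map at 224x224 input, 512 -> 512 channels.
base = separable_cost(3, 14, 512, 512)
for resolution in (224, 192, 160, 128):
    rho = resolution / BASE_RESOLUTION  # rho is implied by the chosen input size
    cost = separable_cost(3, 14, 512, 512, rho=rho)
    print(f"input {resolution}x{resolution}: rho={rho:.3f}, "
          f"cost ratio={cost / base:.3f}, rho^2={rho * rho:.3f}, "
          f"params={separable_params(3, 512, 512):,.0f}")
```

Shrinking the input scales computation by ρ² while the weight count stays fixed, which is why ρ never appears as an argument of the model-building function: it is just a consequence of the input size.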
In the code, depth_multiplier is used to reduce the number of channels in each layer, so depth_multiplier corresponds to the width multiplier α, not to the resolution multiplier ρ.
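Here is a sketch of how the slim code turns depth_multiplier into per-layer channel counts. It mirrors the helper `depth = lambda d: max(int(d * depth_multiplier), min_depth)` in mobilenet_v1.py, assuming that file's default min_depth=8. The base channel list comes from the paper's Table 1, and the names BASE_DEPTHS and scaled_depths are hypothetical.

```python
# Channel counts of the 14 conv layers of MobileNetV1 (paper, Table 1):
# the first standard 3x3 conv followed by the 13 pointwise convs.
BASE_DEPTHS = [32, 64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512, 1024, 1024]


def scaled_depths(depth_multiplier: float, min_depth: int = 8) -> list[int]:
    """Channel count of each layer after applying depth_multiplier.

    Modeled on the slim helper max(int(d * depth_multiplier), min_depth);
    min_depth=8 is assumed to match the default in mobilenet_v1.py.
    """
    if depth_multiplier <= 0:
        raise ValueError("depth_multiplier must be greater than zero")
    return [max(int(d * depth_multiplier), min_depth) for d in BASE_DEPTHS]


for dm in (1.0, 0.75, 0.5, 0.25):
    print(dm, scaled_depths(dm))
```

The resulting counts are exactly αM and αN from the paper (clamped below at min_depth), which is why depth_multiplier plays the role of the width multiplier. The confusion comes from the name being reused with a different meaning: in the same file, the depthwise step is built with slim.separable_conv2d(..., depth_multiplier=1, ...), where the argument means "output channels per input channel". Keras' DepthwiseConv2D layer uses depth_multiplier in that same per-channel sense. In tf.keras.applications.MobileNet, as far as I can tell, the paper's α is the alpha argument, while depth_multiplier is forwarded to the DepthwiseConv2D layers. Neither one changes the input resolution, which is set through input_shape.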