
Quantization aware training in TensorFlow version 2 and BatchNorm folding

I'm wondering what options are currently available for simulating BatchNorm folding during quantization aware training in TensorFlow 2. TensorFlow 1 has the tf.contrib.quantize.create_training_graph function, which inserts FakeQuantization layers into the graph and takes care of simulating batch normalization folding (according to this white paper).
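For readers unfamiliar with the term, the effect of a fake-quantization node can be sketched in plain Python: a float is snapped to an integer grid and mapped straight back to float, so training sees the rounding error. This is only an illustration of the arithmetic, assuming an asymmetric 8-bit scheme; `fake_quantize` and its parameters are hypothetical names, not part of the TensorFlow API.

```python
def fake_quantize(x, x_min, x_max, num_bits=8):
    """Quantize x to an integer grid over [x_min, x_max], then dequantize.

    Illustrative only: assumes an asymmetric scheme with an unsigned
    integer range and x_min <= 0 <= x_max.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Size of one quantization step in float units.
    scale = (x_max - x_min) / (qmax - qmin)
    # Integer code that represents the real value 0.
    zero_point = round(qmin - x_min / scale)
    # Quantize and clamp to the representable integer range.
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))
    # Dequantize: the result is a float that lies on the quantization grid.
    return (q - zero_point) * scale


if __name__ == "__main__":
    for value in (0.0, 1.234, 10.0):
        print(value, "->", fake_quantize(value, 0.0, 2.55))
```

Values outside the range are clamped to its edge, and values inside it move by at most half a quantization step.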

TensorFlow 2 has a tutorial on how to use quantization with the recently adopted tf.keras API, but it doesn't mention anything about batch normalization. I tried the following simple example with a BatchNorm layer:

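import tensorflow as tf
import tensorflow_model_optimization as tfmo

l = tf.keras.layers            # assumed alias; the original post uses `l` without defining it
input_shape = (28, 28, 1)      # assumed example shape; not given in the original post

model = tf.keras.Sequential([
      l.Conv2D(32, 5, padding='same', activation='relu', input_shape=input_shape),
      l.MaxPooling2D((2, 2), (2, 2), padding='same'),
      l.Conv2D(64, 5, padding='same', activation='relu'),
      l.BatchNormalization(),    # BN! (follows the activation, not the bare Conv2D)
      l.MaxPooling2D((2, 2), (2, 2), padding='same'),
      l.Dense(1024, activation='relu'),
])
model = tfmo.quantization.keras.quantize_model(model)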

It however gives the following exception:

RuntimeError: Layer batch_normalization:<class 'tensorflow.python.keras.layers.normalization.BatchNormalization'> is not supported. You can quantize this layer by passing a `tfmot.quantization.keras.QuantizeConfig` instance to the `quantize_annotate_layer` API.

which indicates that TF does not know what to do with it.

I also saw this related topic where they apply tf.contrib.quantize.create_training_graph to a Keras-constructed model. However, they don't use BatchNorm layers, so I'm not sure this will work.

So what are the options for using this BatchNorm folding feature in TF2? Can this be done from the Keras API, or should I switch back to the TensorFlow 1 API and define a graph the old way?

MaartenVds asked Mar 27 '20 10:03


1 Answer

If you place BatchNormalization before the activation, quantization works without issues. Note: BatchNormalization is supported for quantization only if the layer comes directly after a Conv2D layer. https://www.tensorflow.org/model_optimization/guide/quantization/training

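# Change
l.Conv2D(64, 5, padding='same', activation='relu'),
l.BatchNormalization(),    # BN!
# to this
l.Conv2D(64, 5, padding='same'),    # no activation inside the conv layer
l.BatchNormalization(),             # BN directly after Conv2D
l.Activation('relu'),               # activation applied after BN

# Another way of declaring the same ordering (functional style)
o = (Conv2D(512, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
o = (BatchNormalization())(o)
o = Activation('relu')(o)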
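
One way to see why the ordering matters: BatchNorm at inference time is an affine map, so it can be absorbed exactly into the weights and bias of a linear layer that directly precedes it, but not once a ReLU sits in between. The sketch below shows that arithmetic in plain Python, using a single scalar weight as a stand-in for one convolution output channel. All names here are illustrative assumptions, and this is not how tensorflow_model_optimization implements folding internally.

```python
import math


def batch_norm(y, gamma, beta, mean, var, eps=1e-3):
    """Inference-time BatchNorm using fixed moving statistics."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-3):
    """Return (w_fold, b_fold) with w_fold * x + b_fold == BN(w * x + b)."""
    k = gamma / math.sqrt(var + eps)
    return w * k, (b - mean) * k + beta


def relu(v):
    return max(0.0, v)


# Hypothetical parameters for one channel (values chosen arbitrarily).
w, b = 0.8, -0.1
gamma, beta, mean, var = 1.5, 0.2, 0.3, 4.0
w_fold, b_fold = fold_batch_norm(w, b, gamma, beta, mean, var)


def conv_bn_relu(x):
    """Ordering suggested in the answer: Conv -> BN -> ReLU."""
    return relu(batch_norm(w * x + b, gamma, beta, mean, var))


def folded_conv_relu(x):
    """Same computation with BN folded into the weight and bias."""
    return relu(w_fold * x + b_fold)


def conv_relu_bn(x):
    """Ordering from the question: Conv -> ReLU -> BN."""
    return batch_norm(relu(w * x + b), gamma, beta, mean, var)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.0, 3.0):
        print(x, conv_bn_relu(x), folded_conv_relu(x), conv_relu_bn(x))
```

With Conv -> BN -> ReLU the folded and unfolded versions agree for every input. With Conv -> ReLU -> BN the output is constant for all negative pre-activations, which no single folded weight and bias can reproduce.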
Mohit Arvind khakharia answered Sep 21 '22 16:09