According to the blog post https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/, TensorFlow quantizes values before they enter a layer and dequantizes them after the layer has processed them. TensorFlow quantizes by linearly rescaling values to the range 0 to 255, so it must keep the "min" and "max" of the original float range in order to dequantize the values later.
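For concreteness, here is a minimal sketch of that affine scheme as I understand it (the function names and details are my own, not TensorFlow's actual ops):

```python
import numpy as np

def quantize(x, x_min, x_max):
    """Map float values in [x_min, x_max] linearly onto 0..255."""
    scale = (x_max - x_min) / 255.0
    q = np.round((x - x_min) / scale)
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, x_min, x_max):
    """Recover approximate floats from uint8 codes and the stored min/max."""
    scale = (x_max - x_min) / 255.0
    return q.astype(np.float32) * scale + x_min

x = np.array([-1.0, 0.0, 0.5, 3.0], dtype=np.float32)
q = quantize(x, x.min(), x.max())
print(q, dequantize(q, x.min(), x.max()))
```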
I would like to ask:

1. How are the "min" and "max" in the outputs of a "quantization" op determined? If we simply take the minimum and maximum of the data and map them to 0 and 255, the accumulated values will overflow or underflow during convolution.
2. How are the "min" and "max" in the outputs of a "convolution" op determined? Both the weights and the activations are quantized, so there are two sets of "min" and "max". How does a convolution op combine them into a single set of "min" and "max" for its output?
TensorFlow uses, among other libraries, gemmlowp for low-precision matrix multiplication. Although the inputs are 8-bit values, intermediate results are accumulated in 32 bits. These 32-bit values are converted back to 8 bits before the results are returned.
From https://github.com/google/gemmlowp/blob/master/doc/low-precision.md:
To avoid overflow, we internally accumulate results on more than 8 bits, and at the end we keep only some significant 8 bits.
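To illustrate that idea, here is a plain NumPy sketch (not gemmlowp's actual code): products of two uint8 matrices are accumulated in int32, and only at the end is the 32-bit result rescaled back to 8 bits. The output range here is simply taken from the data, purely as an assumption; TensorFlow instead carries explicit min/max tensors alongside the result.

```python
import numpy as np

# Two uint8 matrices, e.g. quantized weights and activations.
a = np.random.randint(0, 256, size=(4, 8), dtype=np.uint8)
b = np.random.randint(0, 256, size=(8, 3), dtype=np.uint8)

# Accumulate in int32: a single dot product of length 8 can reach
# 8 * 255 * 255 = 520,200, far beyond the uint8 range.
acc = a.astype(np.int32) @ b.astype(np.int32)

# Requantize: map the 32-bit accumulator range back onto 0..255,
# keeping min/max so the caller can dequantize the output later.
acc_min, acc_max = acc.min(), acc.max()
scale = (acc_max - acc_min) / 255.0
out = np.clip(np.round((acc - acc_min) / scale), 0, 255).astype(np.uint8)
print(out, acc_min, acc_max)
```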