I'm looking at the TensorFlow tf.nn.quantized_conv2d function and I'm wondering what exactly the qint8, etc. datatypes are, in particular whether they are the datatypes used for the "fake quantization nodes" in tf.contrib.quantize or are actually stored using 8 bits (for qint8) in memory.
I know that they are defined in tf.dtypes.DType, but that doesn't have any information about what they actually are.
These are the data types of the output Tensor of the function tf.quantization.quantize(). They correspond to the argument T of that function.
Shown below is the underlying computation, which converts/quantizes a Tensor from a floating-point data type (e.g. float32) to a quantized one (tf.qint8, tf.quint8, tf.qint32, tf.qint16, tf.quint16):
out[i] = (in[i] - min_range) * range(T) / (max_range - min_range)
if T == qint8: out[i] -= (range(T) + 1) / 2.0
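As a quick sanity check, here is a minimal sketch (TF 2.x eager style; the exact rounding behaviour can vary slightly between versions, and the values are purely illustrative) showing that tf.quantization.quantize really produces a tensor whose elements are 8-bit integers:

import tensorflow as tf

x = tf.constant([-1.0, 0.0, 0.5, 1.0], dtype=tf.float32)

# Quantize the float32 values into 8-bit signed integers (tf.qint8).
# The op returns the quantized tensor plus the min/max actually used,
# since the requested range may be adjusted internally.
q = tf.quantization.quantize(x, min_range=-1.0, max_range=1.0, T=tf.qint8)

print(q.output.dtype)   # tf.qint8 -> one byte per element in memory
print(q.output)         # integer values in [-128, 127]
print(q.output_min.numpy(), q.output_max.numpy())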
Then, they can be passed to functions like tf.nn.quantized_conv2d, whose inputs are quantized Tensors as explained above.
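For completeness, here is a rough sketch (assuming a TF 1.x-style environment, where tf.nn.quantized_conv2d is available; in TF 2 the same op is exposed as tf.raw_ops.QuantizedConv2D) of wiring two quantized tensors into that function. The shapes and ranges are illustrative only:

import tensorflow as tf

inp  = tf.random.uniform([1, 4, 4, 1], minval=0.0, maxval=1.0)
filt = tf.random.uniform([2, 2, 1, 1], minval=0.0, maxval=1.0)

# Quantize both the activations and the weights to 8-bit unsigned integers.
q_inp  = tf.quantization.quantize(inp, 0.0, 1.0, tf.quint8)
q_filt = tf.quantization.quantize(filt, 0.0, 1.0, tf.quint8)

# The conv op takes the quantized tensors together with their float ranges
# and accumulates into 32-bit integers (out_type defaults to tf.qint32).
conv = tf.nn.quantized_conv2d(
    q_inp.output, q_filt.output,
    min_input=q_inp.output_min, max_input=q_inp.output_max,
    min_filter=q_filt.output_min, max_filter=q_filt.output_max,
    strides=[1, 1, 1, 1], padding="SAME")

print(conv.output.dtype)  # tf.qint32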
TL;DR: to answer your question, yes, they are actually stored using 8 bits (for qint8) in memory.
You can find more information about this topic in the links below:
https://www.tensorflow.org/api_docs/python/tf/quantization/quantize
https://www.tensorflow.org/api_docs/python/tf/nn/quantized_conv2d
https://www.tensorflow.org/lite/performance/post_training_quantization