I am trying to convert a trained model from a checkpoint file to TFLite using tf.lite.TFLiteConverter. The float conversion went fine, with reasonable inference speed, but the inference speed of the INT8 conversion is very slow. I tried to debug by feeding in a very small network and found that the INT8 model's inference is generally slower than the float model's.
In the INT8 tflite file, I found some tensors called ReadVariableOp, which don't exist in TensorFlow's official MobileNet tflite model.
What causes the slowness of INT8 inference?
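For reference, this is roughly the conversion path being described: a float conversion and a full-integer (INT8) conversion of the same model. It is only a sketch, assuming the checkpoint has already been exported to a SavedModel; the paths, input shape, and representative data are placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder path: assumes the checkpoint was exported to a SavedModel.
saved_model_dir = "/tmp/saved_model"

# Float conversion (the path that worked fine).
float_converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
float_tflite = float_converter.convert()

# Full-integer (INT8) conversion: needs a representative dataset so the
# converter can calibrate activation ranges.
def representative_dataset():
    for _ in range(100):
        # Dummy samples; replace with real inputs matching the model's shape.
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

int8_converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
int8_converter.optimizations = [tf.lite.Optimize.DEFAULT]
int8_converter.representative_dataset = representative_dataset
int8_converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
int8_converter.inference_input_type = tf.int8
int8_converter.inference_output_type = tf.int8
int8_tflite = int8_converter.convert()

with open("/tmp/model_int8.tflite", "wb") as f:
    f.write(int8_tflite)
```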
TFLite models are also quantized, which is why they are not as accurate as the original models. To address this, quantization-aware training can be used. It simulates INT8 weights during training while keeping the underlying values in 32-bit float, so the quantization error acts like noise that the model is forced to learn to compensate for.
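A minimal sketch of quantization-aware training with the TensorFlow Model Optimization toolkit, assuming a small Keras classifier for illustration (the architecture and training data are placeholders, not from the question):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model; swap in your own architecture.
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Wrap the model so fake-quantization nodes are inserted during training.
q_aware_model = tfmot.quantization.keras.quantize_model(model)

q_aware_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# q_aware_model.fit(train_images, train_labels, epochs=1)  # train as usual

# After training, convert to a quantized TFLite model.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite = converter.convert()
```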
CPU utilization hits 270% with the .ckpt file but stays at around 100% with the .tflite file. One hypothesis is that TensorFlow Lite is not configured for multithreading; another is that TensorFlow Lite is optimized for ARM processors rather than the Intel one my computer runs on, and is therefore slower.
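To test the multithreading hypothesis, the number of interpreter threads can be set explicitly. A short sketch, assuming a placeholder model path and an arbitrary thread count of 4:

```python
import numpy as np
import tensorflow as tf

# num_threads controls how many CPU threads the interpreter may use.
interpreter = tf.lite.Interpreter(model_path="/tmp/model_int8.tflite",
                                  num_threads=4)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
```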
To convert the frozen graph to TensorFlow Lite, we need to run it through the TensorFlow Lite Converter. It converts the model into an optimized FlatBuffer format that runs efficiently on TensorFlow Lite. If things ran successfully, you should now see a third file in the /tmp/tflite directory called detect.tflite.
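A sketch of that conversion step using the TF1-compatible converter; the graph file name, input/output tensor names, and input shape below are placeholders taken from a typical SSD detection export and should be replaced with those of your own frozen graph:

```python
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="/tmp/tflite/tflite_graph.pb",          # placeholder
    input_arrays=["normalized_input_image_tensor"],         # placeholder
    output_arrays=["TFLite_Detection_PostProcess"],         # placeholder
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]},
)
converter.allow_custom_ops = True  # detection post-processing is a custom op
tflite_model = converter.convert()

with open("/tmp/tflite/detect.tflite", "wb") as f:
    f.write(tflite_model)
```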
You are possibly running on an x86 CPU instead of one with ARM instructions. You can refer to this issue: https://github.com/tensorflow/tensorflow/issues/21698#issuecomment-414764709