I am trying to convert a trained model from a checkpoint file to TFLite using tf.lite.TFLiteConverter. The float conversion went fine, with reasonable inference speed, but the inference speed of the INT8 conversion is very slow. I tried to debug by feeding in a very small network and found that the INT8 model's inference is generally slower than the float model's.
In the INT8 tflite file, I found some tensors called ReadVariableOp, which don't exist in TensorFlow's official MobileNet tflite model.
What causes the slowness of INT8 inference?
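For reference, this is roughly the conversion path being described: a float conversion and a full-integer (INT8) conversion of the same model. It is only a sketch, assuming the checkpoint has already been exported to a SavedModel; the paths, input shape, and representative data are placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder path: assumes the checkpoint was exported to a SavedModel.
saved_model_dir = "/tmp/saved_model"

# Float conversion (the path that worked fine).
float_converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
float_tflite = float_converter.convert()

# Full-integer (INT8) conversion: needs a representative dataset so the
# converter can calibrate activation ranges.
def representative_dataset():
    for _ in range(100):
        # Dummy samples; replace with real inputs matching the model's shape.
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

int8_converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
int8_converter.optimizations = [tf.lite.Optimize.DEFAULT]
int8_converter.representative_dataset = representative_dataset
int8_converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
int8_converter.inference_input_type = tf.int8
int8_converter.inference_output_type = tf.int8
int8_tflite = int8_converter.convert()

with open("/tmp/model_int8.tflite", "wb") as f:
    f.write(int8_tflite)
```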
TFLite models are also quantized, which is why they are not as accurate as the original models. To address this, quantization-aware training can be used. It simulates INT8 weights during training while keeping the underlying values in 32-bit float, so the quantization error acts like noise that the model is forced to learn to compensate for.
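A minimal sketch of quantization-aware training with the TensorFlow Model Optimization toolkit, assuming a small Keras classifier for illustration (the architecture and training data are placeholders, not from the question):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model; swap in your own architecture.
model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Wrap the model so fake-quantization nodes are inserted during training.
q_aware_model = tfmot.quantization.keras.quantize_model(model)

q_aware_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# q_aware_model.fit(train_images, train_labels, epochs=1)  # train as usual

# After training, convert to a quantized TFLite model.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite = converter.convert()
```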
CPU utilization hits 270% with the .ckpt file but stays at around 100% with the .tflite file. One hypothesis is that TensorFlow Lite is not configured for multithreading; another is that TensorFlow Lite is optimized for ARM processors rather than the Intel one my computer runs on, and is therefore slower.
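To test the multithreading hypothesis, the number of interpreter threads can be set explicitly. A short sketch, assuming a placeholder model path and an arbitrary thread count of 4:

```python
import numpy as np
import tensorflow as tf

# num_threads controls how many CPU threads the interpreter may use.
interpreter = tf.lite.Interpreter(model_path="/tmp/model_int8.tflite",
                                  num_threads=4)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
```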
To convert the frozen graph to TensorFlow Lite, we need to run it through the TensorFlow Lite Converter. It converts the model into an optimized FlatBuffer format that runs efficiently on TensorFlow Lite. If things ran successfully, you should now see a third file in the /tmp/tflite directory called detect.tflite.
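A sketch of that conversion step using the TF1-compatible converter; the graph file name, input/output tensor names, and input shape below are placeholders taken from a typical SSD detection export and should be replaced with those of your own frozen graph:

```python
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="/tmp/tflite/tflite_graph.pb",          # placeholder
    input_arrays=["normalized_input_image_tensor"],         # placeholder
    output_arrays=["TFLite_Detection_PostProcess"],         # placeholder
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]},
)
converter.allow_custom_ops = True  # detection post-processing is a custom op
tflite_model = converter.convert()

with open("/tmp/tflite/detect.tflite", "wb") as f:
    f.write(tflite_model)
```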
You are possibly running on an x86 CPU instead of one with ARM instructions. You can refer to this issue: https://github.com/tensorflow/tensorflow/issues/21698#issuecomment-414764709