These days I am trying to track down an error concerning the deployment of a TF model with TPU support.
I can get a model without TPU support running, but as soon as I enable quantization, I get lost.
I am in the following situation:
For the last point, I used the TFLiteConverter's Python API. The script that produces a functional tflite model is
import tensorflow as tf

graph_def_file = 'frozen_model.pb'
inputs = ['dense_input']
outputs = ['dense/BiasAdd']

converter = tf.lite.TFLiteConverter.from_frozen_graph(graph_def_file, inputs, outputs)
converter.inference_type = tf.lite.constants.FLOAT
input_arrays = converter.get_input_arrays()
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]

tflite_model = converter.convert()
open('model.tflite', 'wb').write(tflite_model)
This tells me that my approach seems to be ok up to this point. Now, if I want to utilize the Coral TPU stick, I have to quantize my model (I took that into account during training). All I have to do is to modify my converter script. I figured that I have to change it to
import tensorflow as tf

graph_def_file = 'frozen_model.pb'
inputs = ['dense_input']
outputs = ['dense/BiasAdd']

converter = tf.lite.TFLiteConverter.from_frozen_graph(graph_def_file, inputs, outputs)
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8  ## Indicates TPU compatibility
input_arrays = converter.get_input_arrays()
converter.quantized_input_stats = {input_arrays[0]: (0., 1.)}  ## mean, std_dev
converter.default_ranges_stats = (-128, 127)  ## min, max values for quantization (?)
converter.allow_custom_ops = True  ## not sure if this is needed
## REMOVED THE OPTIMIZATIONS ALTOGETHER TO MAKE IT WORK

tflite_model = converter.convert()
open('model.tflite', 'wb').write(tflite_model)
This tflite model produces results when loaded with the interpreter's Python API, but I am not able to understand their meaning. Also, there is no (or, if there is, it is well hidden) documentation on how to choose mean, std_dev and the min/max ranges. Finally, after compiling this with the edgetpu_compiler and deploying it (loading it with the C++ API), I receive an error:
INFO: Initialized TensorFlow Lite runtime.
ERROR: Failed to prepare for TPU. generic::failed_precondition: Custom op already assigned to a different TPU.
ERROR: Node number 0 (edgetpu-custom-op) failed to prepare.
Segmentation fault
I suppose I missed a flag or something during the conversion process. But as the documentation is also lacking here, I can't say for sure.
In short:
I am grateful for any help or guidance!
EDIT: I have opened a github issue with the full test code. Feel free to play around with this.
The TensorFlow Lite converter takes a TensorFlow model and generates a TensorFlow Lite model (an optimized FlatBuffer format identified by the .tflite file extension). You can load a SavedModel or directly convert a model you create in code. You can convert your model using the Python API or the command-line tool.
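For instance, a minimal conversion of a SavedModel with the TF 2.x Python API could look like this (the directory name is a placeholder):

import tensorflow as tf

saved_model_dir = 'saved_model/'  # placeholder: directory of an exported SavedModel

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)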
Integer quantization is an optimization strategy that converts 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This results in a smaller model and increased inferencing speed, which is valuable for low-power devices such as microcontrollers.
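Concretely, each quantized tensor carries a scale and a zero point, and the 8-bit values map back to real numbers through an affine transform. A small illustrative sketch with made-up parameters:

# TensorFlow Lite's quantization scheme:
#   real_value = (quantized_value - zero_point) * scale
scale, zero_point = 0.05, 128  # example parameters, not taken from a real model

def dequantize(q):
    return (q - zero_point) * scale

def quantize(r):
    return int(round(r / scale)) + zero_point

print(dequantize(200))  # 3.6
print(quantize(3.6))    # 200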
Post-training quantization includes general techniques to reduce CPU and hardware accelerator latency, processing, power, and model size with little degradation in model accuracy. These techniques can be performed on an already-trained float TensorFlow model and applied during TensorFlow Lite conversion.
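The simplest variant, dynamic-range quantization, only requires enabling the default optimizations on the converter. A minimal sketch, assuming a TF 2.x SavedModel at a placeholder path (note that the Edge TPU itself requires full integer quantization, covered further below):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/')  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantizes the weights to 8 bits
tflite_model = converter.convert()

with open('model_dynamic_range.tflite', 'wb') as f:
    f.write(tflite_model)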
Quantization aware training emulates inference-time quantization, creating a model that downstream tools will use to produce actually quantized models. The quantized models use lower precision (e.g. 8-bit integers instead of 32-bit floats), which brings benefits during deployment.
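A rough sketch of quantization aware training with the tensorflow_model_optimization package (the Keras model below is just a placeholder):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder float model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# Insert fake-quantization nodes that emulate 8-bit inference during training.
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(optimizer='adam', loss='mse')
# q_aware_model.fit(...)  # fine-tune, then convert with tf.lite.TFLiteConverter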
You should never need to manually set the quantization stats.
Have you tried the post-training-quantization tutorials?
https://www.tensorflow.org/lite/performance/post_training_integer_quant
Basically they set the quantization options:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
Then they pass a "representative dataset" to the converter, so that the converter can run the model for a few batches to gather the necessary statistics:
def representative_data_gen():
    for input_value in mnist_ds.take(100):
        yield [input_value]

converter.representative_dataset = representative_data_gen
While there are options for quantized training, it's always easier to do post-training quantization.
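Putting those pieces together for a frozen graph like the one in the question might look roughly like this (the input shape and the random calibration data are placeholders, and with TF 2.x the converter lives under tf.compat.v1.lite):

import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'frozen_model.pb', ['dense_input'], ['dense/BiasAdd'])

# Calibration data: replace the random tensors with real samples from the
# training distribution. The (1, 10) shape is only a guess for dense_input.
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 10).astype(np.float32)]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open('model_full_int.tflite', 'wb') as f:
    f.write(tflite_model)

A fully integer-quantized model of this kind is what the edgetpu_compiler expects as its input.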