I am trying to run inference with tinyYOLO-V2 using INT8 weights and activations. I can convert the weights to INT8 with TFLiteConverter. For INT8 activations, I have to provide a representative dataset so the converter can estimate the scaling factors, but my way of creating that dataset seems wrong.
What is the correct procedure?
def rep_data_gen():
    a = []
    for i in range(160):
        inst = anns[i]
        file_name = inst['filename']
        img = cv2.imread(img_dir + file_name)
        img = cv2.resize(img, (NORM_H, NORM_W))
        img = img / 255.0
        img = img.astype('float32')
        a.append(img)
    a = np.array(a)
    print(a.shape)  # a is a np array of 160 3D images
    img = tf.data.Dataset.from_tensor_slices(a).batch(1)
    for i in img.take(BATCH_SIZE):
        print(i)
        yield [i]
# https://www.tensorflow.org/lite/performance/post_training_quantization
converter = tf.lite.TFLiteConverter.from_keras_model_file("./yolo.h5")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = [tf.int8]
converter.inference_output_type = [tf.int8]
converter.representative_dataset = rep_data_gen
tflite_quant_model = converter.convert()
ValueError: Cannot set tensor: Got tensor of type STRING but expected type FLOAT32 for input 27, name: input_1
RepresentativeDataset(input_gen). This is a generator function that provides a small dataset to calibrate or estimate the range, i.e., (min, max), of all floating-point arrays in the model (such as the model input, activation outputs of intermediate layers, and the model output) for quantization.
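In practice the generator only needs to yield lists of float32 NumPy arrays whose shapes match the model inputs (including the batch dimension). A minimal sketch, reusing the question's anns, img_dir, NORM_H and NORM_W names (assumptions taken from the question), could look like this:

import cv2
import numpy as np

def representative_dataset():
    # Calibration generator: yields one preprocessed image per step,
    # as a list containing one float32 array per model input.
    for inst in anns[:160]:
        img = cv2.imread(img_dir + inst['filename'])
        img = cv2.resize(img, (NORM_W, NORM_H))       # cv2.resize expects (width, height)
        img = (img / 255.0).astype(np.float32)
        yield [np.expand_dims(img, axis=0)]           # shape (1, NORM_H, NORM_W, 3)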
Overview. Integer quantization is an optimization strategy that converts 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This results in a smaller model and increased inferencing speed, which is valuable for low-power devices such as microcontrollers.
The TensorFlow Lite converter takes a TensorFlow model and generates a TensorFlow Lite model (an optimized FlatBuffer format identified by the .tflite file extension). You can load a SavedModel or directly convert a model you create in code. You can convert your model using the Python API or the command-line tool.
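For example, the two common Python entry points look roughly like this (a sketch; the SavedModel path is a placeholder):

import tensorflow as tf

saved_model_dir = "path/to/saved_model"  # placeholder path
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# or, for an in-memory Keras model (TF 2.x):
# converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)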
Dynamic range quantization. To further improve latency, "dynamic-range" operators dynamically quantize activations based on their range to 8 bits and perform computations with 8-bit weights and activations. This optimization provides latencies close to fully fixed-point inference.
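Dynamic-range quantization needs no representative dataset at all; setting only the optimization flag is enough. A minimal sketch (again with a placeholder SavedModel path):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")  # placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_dynamic_model = converter.convert()  # weights stored as INT8, activations quantized at runtime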
I used your code for reading in a dataset and found the error:

img = img.astype('float32')

should be

img = img.astype(np.float32)

Hope this helps.
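For completeness, a hedged end-to-end sketch of the full integer quantization flow with the TF 2.x converter API (loading the same ./yolo.h5 from the question is an assumption, and representative_dataset refers to the calibration generator sketched above):

import tensorflow as tf

model = tf.keras.models.load_model("./yolo.h5")  # assumes the .h5 loads under TF 2.x
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # calibration generator from above
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # a single dtype, not a list
converter.inference_output_type = tf.int8
tflite_quant_model = converter.convert()

with open("yolo_int8.tflite", "wb") as f:
    f.write(tflite_quant_model)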