How do you convert a TensorFlow graph from using float32 to float16? Currently there are graph optimizations for quantization and conversion to eight-bit ints.
Trying to load float32 weights into a float16 graph fails with:
DataLossError (see above for traceback): Invalid size in bundle entry: key model/conv5_1/biases; stored size 1536; expected size 768
[[Node: save/RestoreV2_16 = RestoreV2[dtypes=[DT_HALF], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_16/tensor_names, save/RestoreV2_16/shape_and_slices)]]
[[Node: save/RestoreV2_3/_39 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_107_save/RestoreV2_3", tensor_type=DT_HALF, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
TFLite models are also quantized, which is why they are not as accurate as the original models. To address this, quantization-aware training can be used: it simulates quantizing the weights to int8 during training (casting back to 32-bit float for the forward pass), so the quantization error acts like noise that the model learns to compensate for.
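As a rough sketch of how this looks in code (assuming the tensorflow-model-optimization package, a compiled Keras model named model, and placeholder training data train_images/train_labels):

import tensorflow_model_optimization as tfmot

# Wrap the model with fake-quantization nodes; the weights remain float32
# during training, but the forward pass simulates int8 rounding.
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
q_aware_model.fit(train_images, train_labels, epochs=1)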
Quantization works by reducing the precision of the numbers used to represent a model's parameters, which by default are 32-bit floating point numbers. This results in a smaller model size and faster computation.
Post-training float16 quantization reduces TensorFlow Lite model sizes (up to 50%), while sacrificing very little accuracy. It quantizes model constants (like weights and bias values) from full precision floating point (32-bit) to a reduced precision floating point data type (IEEE FP16).
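A minimal sketch of post-training float16 quantization with the TFLite converter (saved_model_dir is a placeholder path to your SavedModel):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as FP16
tflite_fp16_model = converter.convert()

with open('model_fp16.tflite', 'wb') as f:
    f.write(tflite_fp16_model)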
Dynamic range quantization: to further reduce latency during inference, "dynamic-range" operators quantize activations to 8 bits on the fly based on their range and perform computations with 8-bit weights and activations. This optimization provides latencies close to fully fixed-point inference.
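The converter call for this variant differs only in that no float16 target type is set (again assuming a SavedModel at a placeholder saved_model_dir):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # 8-bit weights; activations quantized at runtime
tflite_dynamic_model = converter.convert()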
I don't think my solution is the best or the most straightforward, but since nobody else has posted anything:
What I did was train the network at full precision and save the weights to a checkpoint. Then I built a copy of the network with every variable I wanted converted set to dtype tf.float16 and all training nodes removed. Finally, I loaded and cast the variables the following way:
import numpy as np
import tensorflow as tf

# Assumes an active session `sess` in which the float16 graph has been built.
# Names of all variables stored in the float32 checkpoint.
previous_variables = [
    var_name for var_name, _
    in tf.contrib.framework.list_variables('path-to-checkpoint-file')]

sess.run(tf.global_variables_initializer())
for variable in tf.global_variables():
    if variable.op.name in previous_variables:
        var = tf.contrib.framework.load_variable(
            'path-to-checkpoint-file', variable.op.name)
        if var.dtype == np.float32:
            # Cast float32 checkpoint values down to float16 before assigning.
            tf.add_to_collection('assignOps', variable.assign(
                tf.cast(var, tf.float16)))
        else:
            # Variables that are not float32 are assigned unchanged.
            tf.add_to_collection('assignOps', variable.assign(var))
sess.run(tf.get_collection('assignOps'))
This obviously has issues if there are float32 tensors that you don't want to convert, which I luckily don't have, since I want to convert all my nodes to float16 precision. If you do have such tensors, you could filter them out with additional if statements, as in the sketch below. I hope this answers your question.
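For example, a hypothetical filter that leaves variables whose names contain 'batch_norm' in float32 (assuming those variables were also kept as float32 in the new graph) might look like this:

for variable in tf.global_variables():
    if variable.op.name not in previous_variables:
        continue
    var = tf.contrib.framework.load_variable(
        'path-to-checkpoint-file', variable.op.name)
    if var.dtype == np.float32 and 'batch_norm' not in variable.op.name:
        # Only cast the variables that were actually converted to float16.
        tf.add_to_collection('assignOps', variable.assign(
            tf.cast(var, tf.float16)))
    else:
        # Excluded (still-float32) and non-float variables are assigned as-is.
        tf.add_to_collection('assignOps', variable.assign(var))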