I want to optimize my frozen, trained TensorFlow model. However, I found out that the optimize_for_inference library is no longer available.
import tensorflow as tf
from tensorflow.python.tools import freeze_graph
from tensorflow.python.tools import optimize_for_inference_lib

# Load the frozen graph.
input_graph_def = tf.GraphDef()
with tf.gfile.Open("./inference_graph/frozen_model.pb", "rb") as f:
    data = f.read()
    input_graph_def.ParseFromString(data)

# Strip training-only nodes between the named input and output nodes.
output_graph_def = optimize_for_inference_lib.optimize_for_inference(
    input_graph_def,
    ["image_tensor"],  # input node
    ["detection_boxes", "detection_scores",
     "detection_classes", "num_detections"],  # output nodes
    tf.float32.as_datatype_enum)

# Write the optimized graph back to disk.
with tf.gfile.FastGFile("./optimized_model.pb", "wb") as f:
    f.write(output_graph_def.SerializeToString())
I found the transform_graph tool (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md#strip_unused_nodes) for optimizing my frozen model, and I was able to generate a working optimized version of my object detection model. The point of generating an optimized version of the model is to improve its inference speed. I ran this in bash (from the /tensorflow root directory):
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/Users/cvsanbuenaventura/Documents/tensorflow_fastlog/models/research/object_detection/inference_graph/frozen_inference_graph.pb \
--out_graph=/Users/cvsanbuenaventura/Documents/tensorflow_fastlog/models/research/object_detection/inference_graph/optimized_inference_graph-transform_graph-manyoutputs-planA2-v2.pb \
--inputs='image_tensor' \
--outputs='detection_boxes,detection_scores,detection_classes,num_detections' \
--transforms='fold_batch_norms
fold_old_batch_norms
fold_constants(ignore_errors=true)'
So my questions are:
1. What do the transforms I used (fold_batch_norms, fold_old_batch_norms, fold_constants(ignore_errors=true)) actually do?
2. I have also seen strip_unused_nodes(type=float, shape="1,299,299,3") used elsewhere. What does this do? And what shape should I put here?
3. Does the optimize_for_inference library not exist anymore?

TensorFlow Transform is a hybrid of Apache Beam and TensorFlow; it sits in between the two. Dataflow preprocessing only works in the context of a pipeline.
tf.Transform is useful for data that requires a full pass, such as: normalizing an input value by mean and standard deviation, or converting strings to integers by generating a vocabulary over all input values.
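For illustration of that point, a tf.Transform preprocessing_fn covering those two full-pass cases could look roughly like this (a sketch assuming the tensorflow_transform package; the feature names are made up):

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Full-pass preprocessing: both operations need statistics over the whole dataset."""
    return {
        # Normalize a numeric feature by its mean and standard deviation.
        "x_normalized": tft.scale_to_z_score(inputs["x"]),
        # Map each string to an integer id from a vocabulary computed over all values.
        "s_id": tft.compute_and_apply_vocabulary(inputs["s"]),
    }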
I'm looking for much the same thing as you.
As for explanations, I found this presentation, which goes into a bit too much detail; slides 14 and 15 seem to have what you want to know about SimplifyGraph(): https://web.stanford.edu/class/cs245/slides/TFGraphOptimizationsStanford.pdf
It seems that "1,299,299,3" corresponds to an SSD 300x300 model, so I guess it has to do with forcing the input data to be resized to that shape. I've read that the idea of the optimization is to remove nodes that are required for full training but not for inference. In my case, I'm using a 1920x1080 Faster R-CNN model, so I guess I'd have to use "1,1080,1920,3" instead.
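If it helps, here is a rough sketch of what using that shape would look like through the Python graph_transforms wrapper instead of the bazel binary (assuming TF 1.x, where tensorflow.tools.graph_transforms.TransformGraph is available; the file paths are placeholders and the shape is only my guess for a 1920x1080 model):

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph  # TF 1.x only

# Load the frozen graph (placeholder path).
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

transforms = [
    'strip_unused_nodes(type=float, shape="1,1080,1920,3")',  # guessed input shape
    "fold_constants(ignore_errors=true)",
    "fold_batch_norms",
    "fold_old_batch_norms",
]
optimized_graph_def = TransformGraph(
    graph_def,
    ["image_tensor"],                          # inputs
    ["detection_boxes", "detection_scores",
     "detection_classes", "num_detections"],   # outputs
    transforms)

with tf.gfile.GFile("optimized_inference_graph.pb", "wb") as f:
    f.write(optimized_graph_def.SerializeToString())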
As for optimize_for_inference: most likely it no longer exists... I would have to check the TensorFlow team's changelogs.
EDIT:
I finally ran my tests. It seems that with Faster R-CNN (and possibly R-FCN) I don't get any inference benefit on GPU from an 'optimized for inference' model (my reference card is a GTX Titan X Maxwell, but I also have an AGX Xavier to test on). I tried a 'quantized' model with this command:
~/build/tensorflow/tf_1.12.3-cpu/bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph='model.cas.f01-v2_aug_frcnn-1920-1080-dia.pb' \
--out_graph='opt-for-inf/opt_2q_model.cas.f01-v2_aug_frcnn-1920-1080-dia.pb' \
--inputs="image_tensor" \
--outputs="detection_boxes,detection_scores,detection_classes,num_detections" \
--transforms='add_default_attributes
strip_unused_nodes(type=float, shape="1,1080,1920,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
merge_duplicate_nodes
quantize_weights
sort_by_execution_order'
And it did not make the inference times any better (nothing like going from, say, 1.2 seconds per inference on the Xavier down to 0.8). Adding 'quantize_nodes' gave me a mismatch in the model's layers, which made it unusable. Maybe it works differently for this topology, and I would need to explore more to see how to optimize this model for inference. It does seem to work for SSDs, though; I'll test my own and publish the results.
What I do know is that if for training you have access to at least a Volta-architecture GPU (Titan V or Tesla V100) or an RTX card, you can use an environment variable and train the model with mixed datatypes (FP16 where possible, some in FP32). That gives you a model that is better for inference, if you don't really need the precision. It depends on the use case: for medical images, you want the highest precision possible; for object detection of vehicles and the like, I guess you can trade precision for speed. Mixed-precision training with NVIDIA CUDA: https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#tensorflow-amp
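For reference, a rough sketch of how that can be turned on (assuming TF 1.14+ with NVIDIA's automatic mixed precision; the environment variable is the one described in the NVIDIA page linked above, and the optimizer wrapper is the in-code alternative):

import os
import tensorflow as tf

# Option 1: environment variable for automatic mixed precision
# (set before the graph/session is built).
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

# Option 2 (TF 1.14+): wrap the optimizer so eligible ops run in FP16
# while numerically sensitive ones stay in FP32.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)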
My other approach would be to try converting the model to TF-Lite and see how to run inference there. It's still on my backlog.
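For anyone who tries that route first, a minimal sketch with the TF 1.x converter would look something like this (TFLiteConverter.from_frozen_graph is the TF 1.13+ API; the paths, node names and fixed input shape are placeholders, and whether an object-detection graph converts cleanly is exactly what still needs testing):

import tensorflow as tf

# Convert the frozen inference graph to a TFLite flatbuffer (TF 1.x API).
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_inference_graph.pb",
    input_arrays=["image_tensor"],
    output_arrays=["detection_boxes", "detection_scores",
                   "detection_classes", "num_detections"],
    input_shapes={"image_tensor": [1, 1080, 1920, 3]})
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)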
I compiled TensorFlow with Bazel v0.19.x.