I want to optimize my frozen, trained TensorFlow model. However, I found out that the optimize_for_inference library is no longer available.
import tensorflow as tf
from tensorflow.python.tools import freeze_graph
from tensorflow.python.tools import optimize_for_inference_lib

# Load the frozen graph.
input_graph_def = tf.GraphDef()
with tf.gfile.Open("./inference_graph/frozen_model.pb", "rb") as f:
    data = f.read()
    input_graph_def.ParseFromString(data)

# Strip training-only nodes between the named input and output nodes.
output_graph_def = optimize_for_inference_lib.optimize_for_inference(
    input_graph_def,
    ["image_tensor"],  # input node
    ["detection_boxes", "detection_scores",
     "detection_classes", "num_detections"],  # output nodes
    tf.float32.as_datatype_enum)

# Write the optimized graph back to disk.
with tf.gfile.FastGFile("./optimized_model.pb", "wb") as f:
    f.write(output_graph_def.SerializeToString())
I found the transform_graph tool (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md#strip_unused_nodes) for optimizing my frozen model, and I was able to generate a working optimized version of my object detection model. The point of generating an optimized version of the model is to improve its inference speed. I ran this in bash (from the /tensorflow root directory):
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/Users/cvsanbuenaventura/Documents/tensorflow_fastlog/models/research/object_detection/inference_graph/frozen_inference_graph.pb \
--out_graph=/Users/cvsanbuenaventura/Documents/tensorflow_fastlog/models/research/object_detection/inference_graph/optimized_inference_graph-transform_graph-manyoutputs-planA2-v2.pb \
--inputs='image_tensor' \
--outputs='detection_boxes,detection_scores,detection_classes,num_detections' \
--transforms='fold_batch_norms
fold_old_batch_norms
fold_constants(ignore_errors=true)'
So my questions are:
1. What do the transforms I used (fold_batch_norms, fold_old_batch_norms, fold_constants(ignore_errors=true)) actually do?
2. I have also seen strip_unused_nodes(type=float, shape="1,299,299,3") used elsewhere. What does this do? And what shape should I put here?
3. Does the optimize_for_inference library not exist anymore?

TensorFlow Transform is a hybrid of Apache Beam and TensorFlow; it sits in between the two. Dataflow preprocessing only works in the context of a pipeline.
tf.Transform is useful for data that requires a full pass, such as: normalizing an input value by mean and standard deviation, or converting strings to integers by generating a vocabulary over all input values.
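For illustration of that point, a tf.Transform preprocessing_fn covering those two full-pass cases could look roughly like this (a sketch assuming the tensorflow_transform package; the feature names are made up):

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Full-pass preprocessing: both operations need statistics over the whole dataset."""
    return {
        # Normalize a numeric feature by its mean and standard deviation.
        "x_normalized": tft.scale_to_z_score(inputs["x"]),
        # Map each string to an integer id from a vocabulary computed over all values.
        "s_id": tft.compute_and_apply_vocabulary(inputs["s"]),
    }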
I'm looking for much the same thing as you.
As for explanations, I found this presentation, which goes into a bit too much detail; slides 14 and 15 seem to have what you want to know about SimplifyGraph(): https://web.stanford.edu/class/cs245/slides/TFGraphOptimizationsStanford.pdf
It seems that "1,299,299,3" corresponds to an SSD 300x300 model, so I guess it has to do with forcing the input data to be resized to that shape. I've read that the idea of the optimization is to remove nodes that are required for full training but not for inference. In my case, I'm using a 1920x1080 Faster R-CNN model, so I guess I'd have to use "1,1080,1920,3" instead.
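If it helps, here is a rough sketch of what using that shape would look like through the Python graph_transforms wrapper instead of the bazel binary (assuming TF 1.x, where tensorflow.tools.graph_transforms.TransformGraph is available; the file paths are placeholders and the shape is only my guess for a 1920x1080 model):

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph  # TF 1.x only

# Load the frozen graph (placeholder path).
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

transforms = [
    'strip_unused_nodes(type=float, shape="1,1080,1920,3")',  # guessed input shape
    "fold_constants(ignore_errors=true)",
    "fold_batch_norms",
    "fold_old_batch_norms",
]
optimized_graph_def = TransformGraph(
    graph_def,
    ["image_tensor"],                          # inputs
    ["detection_boxes", "detection_scores",
     "detection_classes", "num_detections"],   # outputs
    transforms)

with tf.gfile.GFile("optimized_inference_graph.pb", "wb") as f:
    f.write(optimized_graph_def.SerializeToString())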
As for optimize_for_inference: most likely it no longer exists... I would have to check the TensorFlow team's changelogs.
EDIT:
I finally ran my tests. It seems that with Faster R-CNN (and possibly R-FCN) I don't get any inference benefit on GPU from an 'optimized for inference' model (my reference card is a GTX Titan X Maxwell, but I also have an AGX Xavier to test on). I tried a 'quantized' model with this command:
~/build/tensorflow/tf_1.12.3-cpu/bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph='model.cas.f01-v2_aug_frcnn-1920-1080-dia.pb' \
--out_graph='opt-for-inf/opt_2q_model.cas.f01-v2_aug_frcnn-1920-1080-dia.pb' \
--inputs="image_tensor" \
--outputs="detection_boxes,detection_scores,detection_classes,num_detections" \
--transforms='add_default_attributes
strip_unused_nodes(type=float, shape="1,1080,1920,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
merge_duplicate_nodes
quantize_weights
sort_by_execution_order'
And it did not make the inference times any better (nothing like going from, say, 1.2 seconds per inference on the Xavier down to 0.8). Adding 'quantize_nodes' gave me a mismatch in the model's layers, which made it unusable. Maybe it works differently for this topology, and I would need to explore more to see how to optimize this model for inference. It does seem to work for SSDs, though; I'll test my own and publish the results.
What I do know is that if for training you have access to at least a Volta-architecture GPU (Titan V or Tesla V100) or an RTX card, you can use an environment variable and train the model with mixed datatypes (FP16 where possible, some in FP32). That gives you a model that is better for inference, if you don't really need the precision. It depends on the use case: for medical images, you want the highest precision possible; for object detection of vehicles and the like, I guess you can trade precision for speed. Mixed-precision training with NVIDIA CUDA: https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#tensorflow-amp
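For reference, a rough sketch of how that can be turned on (assuming TF 1.14+ with NVIDIA's automatic mixed precision; the environment variable is the one described in the NVIDIA page linked above, and the optimizer wrapper is the in-code alternative):

import os
import tensorflow as tf

# Option 1: environment variable for automatic mixed precision
# (set before the graph/session is built).
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

# Option 2 (TF 1.14+): wrap the optimizer so eligible ops run in FP16
# while numerically sensitive ones stay in FP32.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)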
My other approach would be to try converting the model to TF-Lite and see how to run inference there. It's still on my backlog.
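For anyone who tries that route first, a minimal sketch with the TF 1.x converter would look something like this (TFLiteConverter.from_frozen_graph is the TF 1.13+ API; the paths, node names and fixed input shape are placeholders, and whether an object-detection graph converts cleanly is exactly what still needs testing):

import tensorflow as tf

# Convert the frozen inference graph to a TFLite flatbuffer (TF 1.x API).
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_inference_graph.pb",
    input_arrays=["image_tensor"],
    output_arrays=["detection_boxes", "detection_scores",
                   "detection_classes", "num_detections"],
    input_shapes={"image_tensor": [1, 1080, 1920, 3]})
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)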
I compiled TensorFlow with Bazel v0.19.x.