I am unable to match the inference times reported by Google for models released in their model zoo. Specifically, I am trying out their faster_rcnn_resnet101_coco model, for which the reported inference time is 106 ms on a Titan X GPU.
My serving system is TF 1.4 running in a container built from the Dockerfile released by Google, and my client is modeled after the inception client, also released by Google. I am running on Ubuntu 14.04 with TF 1.4 and one Titan X. My total inference time is about 3x worse than what Google reports (~330 ms): making the tensor proto takes ~150 ms and the Predict call takes ~180 ms. My saved_model.pb comes directly from the tar file downloaded from the model zoo.

Is there something I am missing? What steps can I take to reduce the inference time?
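For reference, here is a sketch of the kind of client and timing I am describing, assuming the beta gRPC stub that ships with TF Serving 1.4; the model name, port, and input key are placeholders, not the exact values from my setup:

```python
import time

import numpy as np
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

image = np.zeros((600, 600, 3), dtype=np.uint8)  # stand-in for a real image

# Phase 1: build the TensorProto from the numpy image (the ~150 ms part).
t0 = time.time()
tensor_proto = tf.contrib.util.make_tensor_proto(np.expand_dims(image, 0))
t1 = time.time()

# Phase 2: the Predict RPC itself (the ~180 ms part).
request = predict_pb2.PredictRequest()
request.model_spec.name = 'faster_rcnn'          # placeholder model name
request.model_spec.signature_name = 'serving_default'
request.inputs['inputs'].CopyFrom(tensor_proto)  # 'inputs' key from the detection export
result = stub.Predict(request, 30.0)             # 30 s timeout
t2 = time.time()

print('make_tensor_proto: %.1f ms' % ((t1 - t0) * 1000.0))
print('Predict:           %.1f ms' % ((t2 - t1) * 1000.0))
```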
I was able to solve the two problems by:

1. Optimizing the compiler flags. I added the following to the bazel build command for the model server (full command sketched below): --config=opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma

2. Not importing tf.contrib for every inference. In the inception_client sample provided by Google, the tensor proto construction re-imports tf.contrib on every forward pass (see the sketch after this list).
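For the first point, the flags go on the bazel build invocation that compiles tensorflow_model_server; the target path below assumes the standard TensorFlow Serving source tree used by the released Dockerfile:

```
bazel build --config=opt \
  --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma \
  //tensorflow_serving/model_servers:tensorflow_model_server
```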
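For the second point, here is a minimal sketch of building the request tensor without going through tf.contrib at all, by filling the TensorProto fields directly from the raw image bytes. The helper name and the batching convention are illustrative, not taken from the original client:

```python
import numpy as np
from tensorflow.core.framework import tensor_pb2
from tensorflow.core.framework import tensor_shape_pb2
from tensorflow.core.framework import types_pb2


def image_to_tensor_proto(image):
    """image: HxWx3 uint8 numpy array -> TensorProto of shape [1, H, W, 3]."""
    batched = np.expand_dims(image, axis=0)
    dims = [tensor_shape_pb2.TensorShapeProto.Dim(size=d) for d in batched.shape]
    return tensor_pb2.TensorProto(
        dtype=types_pb2.DT_UINT8,
        tensor_shape=tensor_shape_pb2.TensorShapeProto(dim=dims),
        # Raw byte copy: avoids both the tf.contrib lookup and the slow
        # per-element conversion inside make_tensor_proto.
        tensor_content=batched.tobytes())
```

The resulting proto can be dropped into request.inputs['inputs'].CopyFrom(...) exactly as before.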
Non-max suppression may be the bottleneck: https://github.com/tensorflow/models/issues/2710.
Is the image size 600x600?