
TensorFlow Serving: When to use it rather than simple inference inside a Flask service?

I am serving a model trained using the object detection API. Here is how I did it:

  • Create a TensorFlow Serving service on port 9000, as described in the basic tutorial

  • Write Python code that calls this service using predict_pb2 from tensorflow_serving.apis, similar to this (a sketch follows this list)

  • Call this code inside a Flask server to make the service available over HTTP
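
For reference, a minimal sketch of such a gRPC client is below; the model name "detector", the input tensor name "inputs", and the dummy image are assumptions for illustration:

    # Sketch of a gRPC client for TensorFlow Serving (names are assumed).
    import grpc
    import numpy as np
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel("localhost:9000")
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "detector"                # assumed model name
    request.model_spec.signature_name = "serving_default"
    image = np.zeros((1, 300, 300, 3), dtype=np.uint8)  # stand-in for a real image batch
    request.inputs["inputs"].CopyFrom(tf.make_tensor_proto(image))
    response = stub.Predict(request, 10.0)              # 10-second timeout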

Still, I could have done things much more simply:

  • Write Python inference code, as in the example in the object detection repo (sketched after this list)
  • Call this code inside a Flask server to make the service available over HTTP
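
A minimal sketch of that simpler alternative: load the SavedModel once and run inference in-process. The model path and input shape here are assumptions for illustration:

    import numpy as np
    import tensorflow as tf
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    model = tf.saved_model.load("exported_model/saved_model")  # assumed path
    infer = model.signatures["serving_default"]

    @app.route("/detect", methods=["POST"])
    def detect():
        # Raw bytes in, detections out; a fixed input shape is assumed.
        image = np.frombuffer(request.data, dtype=np.uint8)
        image = image.reshape((1, 300, 300, 3))
        outputs = infer(tf.constant(image))
        return jsonify({k: v.numpy().tolist() for k, v in outputs.items()})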

As you can see, I could have skipped TensorFlow Serving entirely.

So, is there any good reason to use TensorFlow Serving in my case? If not, in which cases should I use it?

asked Jan 30 '18 by Aloïs de La Comble


People also ask

What is the use of TensorFlow Serving?

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs.

What is serving default in TensorFlow?

TensorFlow Serving allows us to select which version of a model, or "servable", we want to use when we make inference requests. Each version will be exported to a different sub-directory under the given path.
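
As a hedged illustration of that convention, exporting two versions under numbered sub-directories might look like this (the base path "models/detector" and the trivial model are assumptions):

    import tensorflow as tf

    # A trivial tf.Module, just to demonstrate the version-directory layout.
    class Doubler(tf.Module):
        @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
        def __call__(self, x):
            return {"outputs": x * 2.0}

    # Each numeric sub-directory under the base path is one servable version;
    # TensorFlow Serving serves the highest version number by default.
    tf.saved_model.save(Doubler(), "models/detector/1")
    tf.saved_model.save(Doubler(), "models/detector/2")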

How do you deploy a model using TensorFlow Serving?

Fortunately, TensorFlow was developed for production, and it provides a solution for model deployment: TensorFlow Serving. Basically, there are three steps: export your model for serving, create a Docker container with your model, and deploy it with Kubernetes to a cloud platform, e.g. Google Cloud or Amazon AWS.
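
A rough sketch of the container step using the Docker SDK for Python (pip install docker); the host path and the model name "detector" are assumptions, while tensorflow/serving is the official image:

    import docker

    client = docker.from_env()
    client.containers.run(
        "tensorflow/serving",
        detach=True,
        ports={"8501/tcp": 8501},  # TF Serving's default REST port
        volumes={"/abs/path/models/detector": {"bind": "/models/detector", "mode": "ro"}},
        environment={"MODEL_NAME": "detector"},  # which model to load
    )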


2 Answers

I believe most of the reasons why you would prefer TensorFlow Serving over Flask are related to performance:

  • TensorFlow Serving uses gRPC and Protobuf, while a regular Flask web service uses REST and JSON. REST typically runs over HTTP/1.1, while gRPC uses HTTP/2 (there are important differences). In addition, Protobuf is a binary serialization format that is more efficient than JSON.
  • TensorFlow Serving can batch requests to the same model, which uses hardware (e.g. GPUs) more efficiently.
  • TensorFlow Serving can manage model versioning.

As with almost everything, it depends a lot on your use case and scenario, so it's important to weigh the pros and cons against your requirements. TensorFlow Serving has great features, but with some effort these features could also be implemented to work with Flask (for instance, you could create your own batching mechanism, as sketched below).
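
A hand-rolled micro-batching layer might look roughly like this sketch; all names (BATCH_SIZE, TIMEOUT_S, run_model) are illustrative assumptions, and TensorFlow Serving gives you a tuned version of this for free:

    import queue
    import threading

    import numpy as np

    BATCH_SIZE = 8
    TIMEOUT_S = 0.01          # max wait for a full batch before flushing
    _pending = queue.Queue()  # holds (input_array, done_event, result_holder)

    def run_model(batch):
        # Placeholder for the real model call, e.g. infer(tf.constant(batch)).
        return batch * 2

    def _batch_worker():
        while True:
            items = [_pending.get()]  # block until at least one request arrives
            try:
                while len(items) < BATCH_SIZE:
                    items.append(_pending.get(timeout=TIMEOUT_S))
            except queue.Empty:
                pass  # flush a partial batch after the timeout
            batch = np.stack([inp for inp, _, _ in items])
            for (_, event, holder), out in zip(items, run_model(batch)):
                holder.append(out)
                event.set()

    threading.Thread(target=_batch_worker, daemon=True).start()

    def predict(input_array):
        """Called from a Flask view; blocks until the batched result is ready."""
        event, holder = threading.Event(), []
        _pending.put((input_array, event, holder))
        event.wait()
        return holder[0]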

answered Oct 17 '22 by Thomas Paula


Flask is a general-purpose framework for handling requests and responses, whereas TensorFlow Serving is built specifically for serving flexible ML models in production.

Let's take some scenarios where you want to:

  • Serve multiple models to multiple products (many-to-many relations) at the same time.
  • See which model is making an impact on your product (A/B testing).
  • Update model weights in production, which is as easy as saving a new model to a folder.
  • Get performance comparable to code written in C/C++.

And you can always get all of those advantages for free by sending requests to TF Serving from Flask.
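
For example, a thin Flask proxy in front of TF Serving's REST endpoint could look like this sketch; port 8501 is Serving's default REST port, while the model name "detector" is an assumption:

    import requests
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    TF_SERVING_URL = "http://localhost:8501/v1/models/detector:predict"

    @app.route("/detect", methods=["POST"])
    def detect():
        # TF Serving expects JSON of the form {"instances": [...]}.
        resp = requests.post(TF_SERVING_URL, json=request.get_json(), timeout=10)
        return jsonify(resp.json()), resp.status_code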

answered Oct 17 '22 by prashanth basani