Running a session using the TensorFlow C++ API is significantly slower than using Python

Question

I am trying to run SqueezeDet using the TensorFlow C++ API (CPU only). I have frozen the TensorFlow graph and I load it from C++. Detection quality is fine, but performance is much worse than in Python. What could be the reason for that?

In simplified form, my code looks like this:

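  #include <iostream>
  #include <memory>
  #include <string>
  #include <vector>

  #include "tensorflow/core/framework/graph.pb.h"
  #include "tensorflow/core/framework/tensor.h"
  #include "tensorflow/core/platform/env.h"
  #include "tensorflow/core/public/session.h"

  using namespace std;

  int main(int argc, const char* argv[])
  {
    // Initializing graph
    tensorflow::GraphDef graph_def;
    // Path to the frozen graph file
    string graph_file_name = "Model/graph.pb";
    // Loading graph
    tensorflow::Status graph_loaded_status = tensorflow::ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
    if (!graph_loaded_status.ok())
    {
      cout << graph_loaded_status.ToString() << endl;
      return 1;
    }
    unique_ptr<tensorflow::Session> session_sqdet(tensorflow::NewSession(tensorflow::SessionOptions()));
    tensorflow::Status session_create_status = session_sqdet->Create(graph_def);
    if (!session_create_status.ok())
    {
      cout << "Session create status: " << session_create_status.ToString() << endl;
      return 1;
    }
    tensorflow::Tensor input_tensor;  // filled with a preprocessed batch each iteration
    tensorflow::Tensor prob_tensor;   // scalar fed to "keep_prob"
    vector<tensorflow::Tensor> final_output;
    bool more_batches = true;         // placeholder loop condition
    while (more_batches)
    {
      /* create & preprocess batch into input_tensor, set prob_tensor;
         set more_batches = false when there is no more input */

      tensorflow::Status run_status = session_sqdet->Run({{"image_input", input_tensor}, {"keep_prob", prob_tensor}}, {"probability/score", "bbox/trimming/bbox"}, {}, &final_output);
      if (!run_status.ok())
      {
        cout << run_status.ToString() << endl;
        return 1;
      }

      /* do some postprocessing on final_output */
    }
    return 0;
  }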

What I have tried:

1) Optimization flags: all are on, and there are no warnings.

2) Batching: performance improved, but the gap between Python and C++ is still significant (running the session takes 1 s in Python vs. 2.4 s in C++ with batch_size = 20).

Any help would be highly appreciated.

Guildenstern · Accepted Answer

I've spent a lot of time on this problem (mostly because of stupid mistakes I made), but I finally solved it. I'm posting my experience here since it might be useful to others.

These are the steps I'd advise anyone facing the same issue to follow (some of them are fairly obvious, though):

0) Do the profiling properly! Make sure the tools you use are reliable in whatever multicore/GPU setup you have.

1) Check that TensorFlow and all related packages are built with all optimizations on.

2) Optimize the graph after freezing.

3) If you use different batch sizes during training and inference, make sure you have removed every dependency on the batch size from the model! Otherwise you won't get an error message, or even worse detection quality; you'll only get a mysterious slowdown!
