Difference between .pb and .h5

What is the main difference between the .pb format of TensorFlow and the .h5 format of Keras for storing models? Is there any reason to choose one over the other?

Debangshu Paul asked May 29 '20 05:05


1 Answer

They are two different file formats with different characteristics, and TensorFlow can save models in both (.h5 is the format Keras used originally).

.pb - protobuf

Protocol Buffers (protobuf) is a way to serialize structured data (in this case a neural network). The project is open source and currently maintained by Google.

Example

person {
  name: "John Doe"
  email: "[email protected]"
}

This is a simple message containing two fields. You can load it in any of the supported languages (e.g. C++, Go), parse it, modify it and send it to someone else in binary format.
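To make the "binary format" part concrete, here is a minimal sketch of how a message like the one above could look on the wire, written by hand in plain Python rather than with the real protobuf library. The field numbers (1 for name, 2 for email) and the placeholder address john@example.com are assumptions made for illustration; a real project would declare the fields in a .proto file and use the generated classes.

```python
from typing import Dict, Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field as tag, byte length, UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_string_fields(data: bytes) -> Dict[int, str]:
    """Decode a message that contains only length-delimited string fields."""
    fields: Dict[int, str] = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        if tag & 0x7 != 2:
            raise ValueError("this sketch only handles wire type 2")
        length, pos = decode_varint(data, pos)
        fields[tag >> 3] = data[pos:pos + length].decode("utf-8")
        pos += length
    return fields


NAME, EMAIL = 1, 2  # assumed field numbers, not taken from any real schema

wire = (encode_string_field(NAME, "John Doe")
        + encode_string_field(EMAIL, "john@example.com"))  # placeholder address
xml = ("<person><name>John Doe</name>"
       "<email>john@example.com</email></person>").encode("utf-8")

person = decode_string_fields(wire)
print(person)
print("protobuf bytes:", len(wire), "xml bytes:", len(xml))
```

The binary form carries field numbers instead of field names, which is a large part of why it ends up smaller than an equivalent XML document.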

Advantages

  • really small and efficient to parse (compared to, say, .xml), hence often used for data transfer across the web
  • used by TensorFlow Serving when you want to take your model to production (e.g. inference over the web)
  • language agnostic - the binary format can be read from multiple languages (Java, Python, Objective-C, and C++ among others)
  • the recommended format since TF 2.0 (see the official guide on saving and serializing models)
  • also saves various metadata (optimizers, losses etc.) when the model is a Keras model

Disadvantages

  • SavedModel is conceptually harder to grasp than a single file
  • creates a directory rather than one file: saved_model.pb holds the graph, and the weights live in a variables/ subfolder
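Because a SavedModel is a directory and not a single file, a quick way to recognise one is to look at its layout. The helper below is a rough sketch that assumes the usual layout (a saved_model.pb file next to a variables/ folder); the function name is made up for this example and is not part of TensorFlow.

```python
from pathlib import Path


def looks_like_saved_model(path) -> bool:
    """Return True if path is a directory with the usual SavedModel layout."""
    root = Path(path)
    has_graph = (root / "saved_model.pb").is_file()   # serialized graph
    has_weights = (root / "variables").is_dir()       # checkpointed weights
    return has_graph and has_weights


# Usage (the directory name is hypothetical):
# looks_like_saved_model("path_to_saved_model")
```

This only inspects file names; it does not parse the protobuf or validate the weights.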

Sources

You can read about this format here

.h5 - HDF5 binary data format

Used originally by Keras to save models (Keras is now officially part of TensorFlow). It is less general and more "data-oriented", less programmatic than .pb.

Advantages

  • Designed for storing very large amounts of data (so neural network weights fit well)
  • A common, widely supported file format
  • Everything is saved in one file (weights, losses, optimizers used with Keras etc.)
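Since everything lives in one file, an .h5 model can be recognised by its first bytes: HDF5 files start with a fixed 8-byte signature. The check below is a small sketch with a made-up function name. It assumes the signature sits at offset 0, which is the common case; the HDF5 format also allows a user block in front of it, which this sketch ignores.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # fixed 8-byte HDF5 format signature


def is_hdf5_file(path) -> bool:
    """Return True if the file at path starts with the HDF5 signature."""
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Usage (the file name is hypothetical):
# is_hdf5_file("model.h5")
```

This confirms the container format only; it says nothing about whether the file actually holds a Keras model.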

Disadvantages

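  • Cannot be used with TensorFlow Serving directly, since it expects a SavedModel, but you can simply convert it to .pb. The call suggested originally was keras.experimental.export_saved_model(model, 'path_to_saved_model'); later TF 2.x releases deprecated it in favor of model.save('path_to_saved_model', save_format='tf')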
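A conversion from .h5 to a SavedModel directory might look like the sketch below. It assumes TensorFlow 2.x is installed and that the model uses only built-in layers (custom layers would need custom_objects when loading). The function name and paths are placeholders. Which save call is available depends on the installed release, so the sketch checks for Model.export first and falls back to model.save with save_format="tf".

```python
def convert_h5_to_saved_model(h5_path, export_dir):
    """Load a Keras .h5 file and write it out as a SavedModel directory."""
    import tensorflow as tf  # imported lazily because it is a heavy dependency

    model = tf.keras.models.load_model(h5_path)
    if hasattr(model, "export"):
        # Releases that provide Model.export: writes an inference-only SavedModel
        model.export(export_dir)
    else:
        # Older TF 2.x releases: SavedModel via the regular save call
        model.save(export_dir, save_format="tf")
    return export_dir


# Usage (both paths are hypothetical):
# convert_h5_to_saved_model("model.h5", "path_to_saved_model")
```

The resulting directory is what TensorFlow Serving would be pointed at; note that an export aimed at inference may not keep the optimizer state that the .h5 file contained.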

All in all

Use the simpler one (.h5) if you don't need to productionize your model (or production is still reasonably far away). Use .pb if you are going to production or just want to standardize on a single format across all the tools TensorFlow provides.

Szymon Maszke answered Nov 13 '22 01:11