Quantize a Keras neural network model

Recently, I've started creating neural networks with TensorFlow + Keras and I would like to try the quantization feature available in TensorFlow. So far, experimenting with examples from the TF tutorials has worked just fine, and I have this basic working example (from https://www.tensorflow.org/tutorials/keras/basic_classification):

import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# fashion mnist data labels (indexes related to their respective labelling in the data set)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# preprocess the train and test images
train_images = train_images / 255.0
test_images = test_images / 255.0

# settings variables
input_shape = (train_images.shape[1], train_images.shape[2])

# create the model layers
model = keras.Sequential([
    keras.layers.Flatten(input_shape=input_shape),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

# compile the model with added settings
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train the model
epochs = 3
model.fit(train_images, train_labels, epochs=epochs)

# evaluate the accuracy of model on test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

Now I would like to use quantization in the training and classification process. The quantization documentation (https://www.tensorflow.org/performance/quantization) (the page has been unavailable since around September 15, 2018) suggests using this piece of code:

loss = tf.losses.get_total_loss()
tf.contrib.quantize.create_training_graph(quant_delay=2000000)
optimizer = tf.train.GradientDescentOptimizer(0.00001)
optimizer.minimize(loss)

However, it gives no information about where this code should be used or how it should be connected to TF code (let alone a high-level model created with Keras). I have no idea how this quantization part relates to the previously created neural network model. Simply inserting it after the neural network code produces the following error:

Traceback (most recent call last):
  File "so.py", line 41, in <module>
    loss = tf.losses.get_total_loss()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/util.py", line 112, in get_total_loss
    return math_ops.add_n(losses, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 2119, in add_n
    raise ValueError("inputs must be a list of at least one Tensor with the "
ValueError: inputs must be a list of at least one Tensor with the same dtype and shape

Is it possible to quantize a Keras NN model in this way, or am I missing something basic? One possible solution that crossed my mind is to use the low-level TF API instead of Keras (which means quite a bit of work to construct the model), or maybe to extract some of the lower-level methods from the Keras models.

sikr_ asked Sep 10 '18 13:09




2 Answers

As mentioned in other answers, TensorFlow Lite can help you with network quantization.

TensorFlow Lite provides several levels of support for quantization.

TensorFlow Lite post-training quantization quantizes the weights and activations of an already trained model with little effort. Quantization-aware training lets you train networks that can be quantized with minimal accuracy drop; it is only available for a subset of convolutional neural network architectures.
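To make the idea of quantizing weights more concrete, here is a minimal pure-Python sketch of affine (scale and zero-point) quantization to 8 bits. It is only an illustration of the general technique, not the TensorFlow Lite implementation; the function names `quantize_affine` and `dequantize_affine` and the sample `weights` are hypothetical.

```python
def quantize_affine(values, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1].

    Illustrative sketch only; real converters differ in details.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Guard against an all-zero input, which would give a zero scale.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    quantized = [
        min(qmax, max(qmin, int(round(v / scale)) + zero_point))
        for v in values
    ]
    return quantized, scale, zero_point


def dequantize_affine(quantized, scale, zero_point):
    """Recover approximate float values from the integer codes."""
    return [scale * (q - zero_point) for q in quantized]


# Hypothetical weights, chosen only for the demonstration.
weights = [-1.0, -0.25, 0.0, 0.4, 1.55]
q, scale, zero_point = quantize_affine(weights)
restored = dequantize_affine(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

Each float is stored as a single byte plus a shared scale and zero point, which is where the memory saving comes from; the price is a rounding error of roughly half a quantization step per value.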

So first, you need to decide whether you need post-training quantization or quantization-aware training. For example, if you already saved the model as *.h5 files, you would probably want to follow @Mitiku's instruction and do the post-training quantization.

If you prefer to achieve higher performance by simulating the effect of quantization during training (using the method you quoted in the question), and your model is in the subset of CNN architectures supported by quantization-aware training, this example may help you with the interaction between Keras and TensorFlow. Basically, you only need to add this code between the model definition and its fitting:

sess = tf.keras.backend.get_session()
tf.contrib.quantize.create_training_graph(sess.graph)
sess.run(tf.global_variables_initializer())
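
For intuition about what "simulating the effect of quantization" means, the graph rewrite inserts fake-quantization ops that round values to a limited set of levels while still passing floats through the network. The following is a rough pure-Python sketch of that round trip, under the assumption of a fixed range; the name `fake_quantize` and the sample values are hypothetical, and the real TensorFlow ops additionally adjust the range so that zero is exactly representable.

```python
def fake_quantize(values, range_min, range_max, num_bits=8):
    """Clip values to [range_min, range_max] and snap them to a uniform grid.

    The output is still floating point, but it can only take
    2**num_bits distinct values, mimicking quantized inference.
    Simplified sketch: no range nudging, fixed (not learned) range.
    """
    levels = 2 ** num_bits - 1
    scale = (range_max - range_min) / levels
    simulated = []
    for v in values:
        clipped = min(range_max, max(range_min, v))
        step = round((clipped - range_min) / scale)
        simulated.append(range_min + step * scale)
    return simulated


# Hypothetical activations; 3.0 lies outside the range and gets clipped.
activations = [-0.7, 0.0, 0.1234, 0.5, 3.0]
simulated = fake_quantize(activations, range_min=-1.0, range_max=1.0)
print(simulated)
```

Because the network sees these rounded values during training, the weights can adapt to the rounding error before the model is actually converted to integers.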

Jianyu answered Oct 04 '22 23:10


As your network looks quite simple, you could probably use TensorFlow Lite.

Baptiste Pouthier answered Oct 05 '22 01:10