I have a TensorFlow model with a single Dense layer:
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, None, 25))
I construct a single input vector in float32:
np_vec = np.array(np.random.randn(1, 1, 25), dtype=np.float32)
vec = tf.cast(tf.convert_to_tensor(np_vec), dtype=tf.float32)
I want to feed that to my model for prediction, but inference is very slow: both predict and __call__ take far longer than the same operation in NumPy.
%timeit model.predict(vec)
10 loops, best of 3: 21.9 ms per loop
%timeit model(vec, training=False)
1000 loops, best of 3: 806 µs per loop
weights = np.array(model.layers[0].get_weights()[0])
%timeit np_vec @ weights
1000000 loops, best of 3: 1.27 µs per loop
Google Colab: https://colab.research.google.com/drive/1RCnTM24RUI4VkykVtdRtRdUVEkAHdu4A?usp=sharing
How can I make my TensorFlow model faster at inference time? Especially since I don't only have a Dense layer; I also use an LSTM, and I don't want to reimplement that in NumPy.
The whole story lies in the implementation of the LSTM layer in Keras. The Keras LSTM layer defaults to unroll=False, which makes the LSTM run as a symbolic loop over the timesteps, and that loop adds per-step overhead. Try passing unroll=True to the LSTM:
tf.keras.layers.LSTM(64, return_sequences=True, stateful=True, unroll=True)
This gave up to a 2x speedup on my machine (measured with %timeit model(vec, training=False)). However, unroll=True only works with a fixed (non-None) sequence length, and it can use considerably more RAM for longer sequences. For details, see the Keras LSTM documentation.
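Here is a self-contained sketch of the comparison; the LSTM width, the fixed sequence length of 1, and the plain timing loop are illustrative choices of mine, not taken from the question's Colab:

import time

import tensorflow as tf

def build_model(unroll):
    return tf.keras.Sequential([
        # unroll=True needs a known number of timesteps, so the model
        # below is built with a fixed sequence length instead of None.
        tf.keras.layers.LSTM(64, return_sequences=True, unroll=unroll),
        tf.keras.layers.Dense(2),
    ])

vec = tf.random.normal((1, 1, 25))

for unroll in (False, True):
    model = build_model(unroll)
    model.build(input_shape=(None, 1, 25))  # fixed length 1, not None
    model(vec, training=False)  # warm-up: the first call builds the graph
    start = time.perf_counter()
    for _ in range(1000):
        model(vec, training=False)
    per_call = (time.perf_counter() - start) / 1000
    print(f"unroll={unroll}: {per_call * 1e6:.0f} µs per call")

Exact numbers will vary by machine; unrolling pays off for short sequences like this one.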
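Separately, most of the predict-vs-__call__ gap in the question is per-call overhead rather than compute: model.predict is designed for large batches and does extra work (batching the input, running callbacks) on every call, so for a single small vector the direct call is already the right choice. Wrapping that call in a tf.function can trim some of the remaining Python overhead; a minimal sketch, reusing the model and vec from the question:

import tensorflow as tf

@tf.function
def infer(x):
    # Executes the forward pass as a compiled graph.
    return model(x, training=False)

infer(vec)          # the first call traces the function into a graph
%timeit infer(vec)  # later calls reuse the traced graph

How much this helps over the plain model(vec, training=False) call depends on the model, since the tracing cost is paid only once.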