I have trained a TextVectorization layer (see below), and I want to save it to disk so that I can reload it next time. I have tried pickle and joblib.dump(), but neither works.
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

text_dataset = tf.data.Dataset.from_tensor_slices(text_clean)
vectorizer = TextVectorization(max_tokens=100000, output_mode='tf-idf', ngrams=None)
vectorizer.adapt(text_dataset.batch(1024))
The generated error is the following:
InvalidArgumentError: Cannot convert a Tensor of dtype resource to a NumPy array
How can I save it?
Instead of pickling the object itself, pickle its configuration and weights. Later, unpickle them, use the configuration to re-create the object, and load the saved weights. Official docs here.
import pickle
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

text_dataset = tf.data.Dataset.from_tensor_slices([
    "this is some clean text",
    "some more text",
    "even some more text"])

# Fit a TextVectorization layer
vectorizer = TextVectorization(max_tokens=10, output_mode='tf-idf', ngrams=None)
vectorizer.adapt(text_dataset.batch(1024))
# Vector for word "this"
print(vectorizer("this"))

# Pickle the config and weights
pickle.dump({'config': vectorizer.get_config(),
             'weights': vectorizer.get_weights()},
            open("tv_layer.pkl", "wb"))

print("*" * 10)
# Later you can unpickle and use
# `config` to create object and
# `weights` to load the trained weights.
from_disk = pickle.load(open("tv_layer.pkl", "rb"))
new_v = TextVectorization.from_config(from_disk['config'])
# You have to call `adapt` with some dummy data (BUG in Keras)
new_v.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
new_v.set_weights(from_disk['weights'])
# Let's see the vector for the word "this"
print(new_v("this"))
Output:
tf.Tensor(
[[0. 0. 0. 0. 0.91629076 0.
0. 0. 0. 0. ]], shape=(1, 10), dtype=float32)
**********
tf.Tensor(
[[0. 0. 0. 0. 0.91629076 0.
0. 0. 0. 0. ]], shape=(1, 10), dtype=float32)
One can use a bit of a hack to do this: construct your TextVectorization layer, put it in a model, and save the model; loading the model back restores the vectorizer. See the example below.
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
data = [
    "The sky is blue.",
    "Grass is green.",
    "Hunter2 is my password.",
]
# Create vectorizer.
text_dataset = tf.data.Dataset.from_tensor_slices(data)
vectorizer = TextVectorization(
    max_tokens=100000, output_mode='tf-idf', ngrams=None,
)
vectorizer.adapt(text_dataset.batch(1024))
# Create model.
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(1,), dtype=tf.string))
model.add(vectorizer)
# Save.
filepath = "tmp-model"
model.save(filepath, save_format="tf")
# Load.
loaded_model = tf.keras.models.load_model(filepath)
loaded_vectorizer = loaded_model.layers[0]
Here is a test showing that both vectorizers (the original and the loaded one) produce the same output.
import numpy as np
np.testing.assert_allclose(loaded_vectorizer("blue"), vectorizer("blue"))
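Since the restored layer is an ordinary Keras layer, it can also sit at the front of a new pipeline. A minimal self-contained sketch (the small vectorizer and the Dense head are illustrative stand-ins, not from the answer above):

```python
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

# Build and adapt a small vectorizer (stand-in for `loaded_vectorizer` above).
texts = tf.data.Dataset.from_tensor_slices(["the sky is blue", "grass is green"])
vec = TextVectorization(max_tokens=10, output_mode='tf-idf')
vec.adapt(texts.batch(2))

# Stack it in front of an (illustrative) classifier head; the model then
# accepts raw strings end to end.
clf = tf.keras.models.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vec,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
clf.compile(optimizer="adam", loss="binary_crossentropy")
```

Because the vectorizer lives inside the model, saving `clf` with `save_format="tf"` preserves the vocabulary alongside the trained weights.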