How to save TextVectorization to disk in tensorflow?

Question

I have trained a TextVectorization layer (see below) and want to save it to disk so that I can reload it later. I have tried pickle and joblib.dump(), but neither works.

import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

# `text_clean` holds the cleaned input strings (not shown in the question).
text_dataset = tf.data.Dataset.from_tensor_slices(text_clean)

vectorizer = TextVectorization(max_tokens=100000, output_mode='tf-idf', ngrams=None)

vectorizer.adapt(text_dataset.batch(1024))

The generated error is the following:

InvalidArgumentError: Cannot convert a Tensor of dtype resource to a NumPy array

How can I save it?

mujjiga · Accepted Answer

Instead of pickling the object itself, pickle its configuration and weights. Later, unpickle them, use the configuration to recreate the layer, and load the saved weights into it. Official docs here.

Code

import pickle

import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

text_dataset = tf.data.Dataset.from_tensor_slices([
    "this is some clean text",
    "some more text",
    "even some more text"])
# Fit a TextVectorization layer
vectorizer = TextVectorization(max_tokens=10, output_mode='tf-idf', ngrams=None)
vectorizer.adapt(text_dataset.batch(1024))

# Vector for word "this"
print(vectorizer("this"))

# Pickle the config and weights (the `with` block closes the file)
with open("tv_layer.pkl", "wb") as f:
    pickle.dump({'config': vectorizer.get_config(),
                 'weights': vectorizer.get_weights()}, f)

print("*" * 10)
# Later you can unpickle and use
# `config` to create the object and
# `weights` to load the trained weights.

with open("tv_layer.pkl", "rb") as f:
    from_disk = pickle.load(f)
new_v = TextVectorization.from_config(from_disk['config'])
# You have to call `adapt` with some dummy data (BUG in Keras)
new_v.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
new_v.set_weights(from_disk['weights'])

# Let's see the vector for word "this"
print(new_v("this"))

Output:

tf.Tensor(
[[0.         0.         0.         0.         0.91629076 0.
  0.         0.         0.         0.        ]], shape=(1, 10), dtype=float32)
**********
tf.Tensor(
[[0.         0.         0.         0.         0.91629076 0.
  0.         0.         0.         0.        ]], shape=(1, 10), dtype=float32)
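
The same idea can be seen without TensorFlow. The sketch below is only an illustration: ToyVectorizer is a hypothetical stand-in (not a Keras class), and its threading.Lock plays the role of the resource handle that made pickling the real layer fail. It mirrors the get_config / from_config / get_weights / set_weights round trip used above, under the assumption that the config and weights are plain Python data.

```python
import pickle
import threading


class ToyVectorizer:
    """Hypothetical stand-in for a layer that owns an unpicklable handle."""

    def __init__(self, max_tokens=10):
        self.max_tokens = max_tokens
        self.vocab = []                  # learned state (the "weights")
        self._handle = threading.Lock()  # stands in for the resource tensor

    def adapt(self, texts):
        # Learn a sorted vocabulary, capped at max_tokens entries.
        words = sorted({w for t in texts for w in t.split()})
        self.vocab = words[: self.max_tokens]

    def __call__(self, text):
        # Count how often each vocabulary word occurs in the text.
        tokens = text.split()
        return [tokens.count(w) for w in self.vocab]

    def get_config(self):
        return {"max_tokens": self.max_tokens}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

    def get_weights(self):
        return list(self.vocab)

    def set_weights(self, weights):
        self.vocab = list(weights)


layer = ToyVectorizer(max_tokens=10)
layer.adapt(["this is some clean text", "some more text"])

# Pickling the whole object fails because of the lock it owns.
try:
    pickle.dumps(layer)
    pickled_whole_object = True
except TypeError:
    pickled_whole_object = False

# Pickling only the plain-data config and weights succeeds.
blob = pickle.dumps({"config": layer.get_config(),
                     "weights": layer.get_weights()})

# Rebuild a fresh object from the config, then restore the learned state.
state = pickle.loads(blob)
restored = ToyVectorizer.from_config(state["config"])
restored.set_weights(state["weights"])

print(pickled_whole_object)
print(restored("some text") == layer("some text"))
```

The toy needs no dummy adapt call before set_weights; that step in the answer above is specific to the Keras layer.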

jakub · Answer

One can use a bit of a hack to do this. Construct your TextVectorization object, then put it in a model. Save the model to save the vectorizer. Loading the model will reproduce the vectorizer. See the example below.

import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

data = [
    "The sky is blue.",
    "Grass is green.",
    "Hunter2 is my password.",
]

# Create vectorizer.
text_dataset = tf.data.Dataset.from_tensor_slices(data)
vectorizer = TextVectorization(
    max_tokens=100000, output_mode='tf-idf', ngrams=None,
)
vectorizer.adapt(text_dataset.batch(1024))

# Create model.
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(1,), dtype=tf.string))
model.add(vectorizer)

# Save.
filepath = "tmp-model"
model.save(filepath, save_format="tf")

# Load.
loaded_model = tf.keras.models.load_model(filepath)
# The Input is not listed in `layers`, so index 0 is the vectorizer.
loaded_vectorizer = loaded_model.layers[0]

Here is a test that both vectorizers (original and loaded) produce the same output.

import numpy as np

np.testing.assert_allclose(loaded_vectorizer("blue"), vectorizer("blue"))

Tags: tensorflow, pickle, keras, tensorflow2.0

yanachen
