 

Keras Text Preprocessing - Saving Tokenizer object to file for scoring

I've trained a sentiment classifier model using the Keras library by following the steps below (broadly):

  1. Convert Text corpus into sequences using Tokenizer object/class
  2. Build a model using the model.fit() method
  3. Evaluate this model

Now for scoring with this model, I was able to save the model to a file and load it back. However, I haven't found a way to save the Tokenizer object to a file. Without it, I'd have to process the whole corpus every time I need to score even a single sentence. Is there a way around this?
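For context, the training-time preprocessing described in the steps above might look like this minimal sketch (the corpus, num_words, and maxlen values are placeholders, not the asker's actual data):

```python
# Minimal sketch of the preprocessing step described above.
# corpus, num_words and maxlen are illustrative placeholders.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

corpus = ["this movie was great", "terrible plot and acting"]

tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(corpus)                      # build the vocabulary from the corpus

sequences = tokenizer.texts_to_sequences(corpus)    # text -> lists of integer indices
padded = pad_sequences(sequences, maxlen=10)        # fixed-length input for model.fit()
```

The tokenizer built here is exactly the object that needs to survive to scoring time, which is what the answers below address.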

Rajkumar Kaliyaperumal asked Aug 17 '17 12:08

People also ask

What is Tokenizer Fit_on_texts?

The fit_on_texts method is part of the Keras Tokenizer class and updates the internal vocabulary from a list of texts. It needs to be called before other methods such as texts_to_sequences or texts_to_matrix.
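A small illustration of fit_on_texts (the example texts are made up):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(["hello world", "hello keras"])

# fit_on_texts has populated the internal vocabulary; indices start at 1
# and are ordered by frequency, so "hello" (seen twice) gets index 1.
# Only after this call do texts_to_sequences / texts_to_matrix make sense.
sequences = tokenizer.texts_to_sequences(["hello world"])
```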

What is Tokenizer Word_index?

word_index is a dictionary mapping each unique token to its integer index. Its length tells you the extent of your vocabulary; in simpler words, it's the number of unique tokens.
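For example (toy texts, not from the question):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(["the cat sat", "the dog sat"])

# word_index maps token -> index; its length is the vocabulary size.
vocab_size = len(tokenizer.word_index)   # 4 unique tokens: the, cat, sat, dog
```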

What does Tokenizer do in Tensorflow?

This class allows you to vectorize a text corpus by turning each text into either a sequence of integers (each integer being the index of a token in a dictionary) or into a vector where the coefficient for each token can be binary, based on word count, or based on tf-idf.
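The two output styles mentioned above can be sketched like this (toy texts; the "binary" and "count" modes are two of the vectorization options the Tokenizer supports):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(["good good movie", "bad movie"])

# Style 1: each text becomes a sequence of integer token indices.
seqs = tokenizer.texts_to_sequences(["good movie"])

# Style 2: each text becomes a fixed-size vector, one slot per vocabulary index.
binary = tokenizer.texts_to_matrix(["good good movie"], mode="binary")
counts = tokenizer.texts_to_matrix(["good good movie"], mode="count")
```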


2 Answers

The most common way is to use either pickle or joblib. Here is an example of using pickle to save a Tokenizer:

import pickle

# saving
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

# loading
with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)
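This directly addresses the question's scoring problem: once reloaded, the tokenizer maps new sentences with the original vocabulary, so the corpus never needs to be reprocessed. A self-contained round-trip sketch (the two-line corpus is a stand-in for the real one):

```python
import pickle
from tensorflow.keras.preprocessing.text import Tokenizer

# Fit once on a stand-in corpus, save, then reload, mimicking scoring time.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(["great film", "awful film"])

with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('tokenizer.pickle', 'rb') as handle:
    loaded = pickle.load(handle)

# The reloaded tokenizer scores a single new sentence identically.
scored = loaded.texts_to_sequences(["great film"])
```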
Marcin Możejko answered Sep 18 '22 14:09


The Tokenizer class has a function to save its data in JSON format:

import io
import json

tokenizer_json = tokenizer.to_json()
with io.open('tokenizer.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(tokenizer_json, ensure_ascii=False))

The data can be loaded back using the tokenizer_from_json function from keras_preprocessing.text:

import json
from keras_preprocessing.text import tokenizer_from_json

with open('tokenizer.json') as f:
    data = json.load(f)
    tokenizer = tokenizer_from_json(data)
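A quick round-trip check of this JSON approach (toy text; note that tf.keras also exposes an equivalent tokenizer_from_json under tensorflow.keras.preprocessing.text, which is what this sketch assumes):

```python
from tensorflow.keras.preprocessing.text import Tokenizer, tokenizer_from_json

tokenizer = Tokenizer()
tokenizer.fit_on_texts(["save me to json"])

# to_json() returns a JSON string; tokenizer_from_json() rebuilds the object.
tokenizer_json = tokenizer.to_json()
restored = tokenizer_from_json(tokenizer_json)
```

An advantage of JSON over pickle is that the file is human-readable and not tied to Python's pickle protocol.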
Max answered Sep 18 '22 14:09