 

Using Keras' tokenizer with premade indexed dictionary

I am working on an NLP problem.

I have downloaded premade embedding weights to use for an embedding layer. Before the embedding layer, I need to tokenize my dataset, which is currently in the form of strings of sentences. I want to tokenize it using the same word indices that my premade embedding weights use.

Is there a way to initialize the Keras tokenizer (tensorflow.keras.preprocessing.text.Tokenizer) with a premade dictionary of the sort: { 'the': 1, 'me': 2, 'a': 3 ..... } so it won't decide on its own which index to give each word?
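For context, an index of this sort would typically be built from the pretrained-vectors file itself, roughly like this (a minimal sketch; the file name "embeddings.txt" and the one-word-plus-vector-per-line format are assumptions about my data, not part of Keras):

word_index = {}
embedding_vectors = {}
with open("embeddings.txt", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        parts = line.rstrip().split(" ")
        word, vector = parts[0], [float(x) for x in parts[1:]]
        word_index[word] = i          # e.g. {'the': 1, 'me': 2, 'a': 3, ...}
        embedding_vectors[word] = vector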

asked Mar 25 '18 by Fuseques


1 Answer

You can initialize a Tokenizer object and manually assign the word index to it. You can then use it to index your sentences.

from tensorflow.keras.preprocessing import text

token = text.Tokenizer()
token.word_index = {"the": 1, "elephant": 2}  # assign your premade index
token.texts_to_sequences(["the elephant"])

This will return [[1, 2]]
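If the goal is to pair this tokenizer with premade embedding weights, you can then build the embedding matrix in the same index order and pass it to an Embedding layer. This is only a minimal sketch: the tiny 3-dimensional vectors stand in for your real pretrained weights.

import numpy as np
from tensorflow.keras.layers import Embedding

word_index = {"the": 1, "elephant": 2}
embedding_vectors = {"the": [0.1, 0.2, 0.3], "elephant": [0.4, 0.5, 0.6]}
embedding_dim = 3

# Row 0 is left as zeros for padding; row i holds the vector for the word
# whose index is i in word_index.
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, idx in word_index.items():
    embedding_matrix[idx] = embedding_vectors[word]

embedding_layer = Embedding(
    input_dim=len(word_index) + 1,
    output_dim=embedding_dim,
    weights=[embedding_matrix],
    trainable=False,
)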

answered Sep 22 '22 by Hamiz Ahmed