I have a Word2Vec model trained in Gensim. How can I use it in TensorFlow for word embeddings? I don't want to train embeddings from scratch in TensorFlow. Can someone show me how to do it with some example code?
Google's Word2Vec Pretrained Word Embedding

Word2Vec is one of the most popular pretrained word embeddings, developed by Google. It is trained on the Google News dataset (about 100 billion words).
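For reference, here's a minimal sketch of loading those pretrained Google News vectors with Gensim. The file name GoogleNews-vectors-negative300.bin is the standard name of Google's download; adjust the path to wherever you saved it:

from gensim.models.keyedvectors import KeyedVectors

# Load Google's pretrained 300-dimensional Google News vectors
# (standard binary word2vec format).
google_model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

print(google_model['hello'].shape)  # (300,)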
Let's assume you have a vocabulary dictionary and an inverse_dict list, where each index in the list corresponds to a dictionary value:
vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}
inv_dict = ['hello', 'neural', 'world', 'networks']
Notice how the inverse_dict index corresponds to the dictionary values. Now declare your embedding matrix and get the values:
import numpy as np
from gensim.models.keyedvectors import KeyedVectors

vocab_size = len(inv_dict)
emb_size = 300  # or whatever the size of your embeddings

# Pre-allocate the embedding matrix, one row per vocabulary word.
embeddings = np.zeros((vocab_size, emb_size))

# Load your Gensim-trained vectors (binary word2vec format).
model = KeyedVectors.load_word2vec_format('embeddings_file', binary=True)

# Copy each word's vector into the row given by its vocab index.
for k, v in vocab.items():
    embeddings[v] = model[k]
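One caveat: model[k] raises a KeyError for any vocab word missing from the Gensim model. A minimal guard, where the fallback to a small random vector (and the 0.25 scale) is just an illustrative assumption:

for k, v in vocab.items():
    if k in model:
        embeddings[v] = model[k]
    else:
        # Word not in the pretrained model: fall back to a small random vector
        # (leaving the row as zeros is another common choice).
        embeddings[v] = np.random.uniform(-0.25, 0.25, emb_size)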
You've got your embedding matrix. Good. Now let's assume you want to train on the sample x = ['hello', 'world']. Raw strings won't work for our neural net, so we need to integerize first:
x_train = []
for word in x:
    x_train.append(vocab[word])  # integerize
x_train = np.array(x_train)      # make into numpy array
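Since the placeholder below expects a fixed input_size per sample, variable-length samples need padding first. A quick sketch; pad_sequence is a hypothetical helper, and pad_id=0 is just an assumption (in practice, reserve a dedicated pad entry in your vocab):

# Hypothetical helper: pad or truncate an index sequence to a fixed length.
def pad_sequence(ids, max_len, pad_id=0):
    ids = list(ids)[:max_len]
    return np.array(ids + [pad_id] * (max_len - len(ids)))

input_size = 10  # assumed fixed sequence length
x_train = pad_sequence(x_train, input_size)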
Now we are good to go with embedding our samples on the fly:
import tensorflow as tf

x_model = tf.placeholder(tf.int32, shape=[None, input_size])
with tf.device("/cpu:0"):
    # Look up the pretrained vector for each integer id in the batch.
    embedded_x = tf.nn.embedding_lookup(embeddings, x_model)
Now embedded_x goes into your convolution or whatever. I am also assuming you are not retraining the embeddings, but simply using them. Hope that helps.
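If you'd rather be explicit about freezing the embeddings, one common pattern is a non-trainable tf.Variable initialized from the numpy matrix; a sketch, assuming the same TF1-style graph as above:

# Hold the pretrained matrix in a variable the optimizer won't update.
embedding_var = tf.Variable(
    embeddings.astype(np.float32),  # cast once; lookups then yield float32
    trainable=False,                # freeze: no gradient updates to these rows
    name='pretrained_embeddings')

with tf.device("/cpu:0"):
    embedded_x = tf.nn.embedding_lookup(embedding_var, x_model)

To fine-tune the embeddings on your task instead, flip trainable=True and keep everything else the same.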