
How to use a pretrained Word2Vec model in TensorFlow

I have a Word2Vec model that was trained in Gensim. How can I use it in TensorFlow for word embeddings? I don't want to train the embeddings from scratch in TensorFlow. Can someone show me how to do it with some example code?

neel asked Mar 28 '17 13:03



1 Answer

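Let's assume you have a vocab dictionary that maps each word to an integer index, and an inv_dict list that maps each index back to its word, with the indices ordered from the most common word to the least common: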

vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}
inv_dict = ['hello', 'neural', 'world', 'networks']
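If you do not have these two mappings yet, one way to build them is to count word frequencies over your tokenized text. The following is only a minimal sketch: the toy corpus is made up for illustration, and it assumes every word in the corpus should get an index.

```python
from collections import Counter

# Hypothetical toy corpus; replace with your own tokenized sentences.
corpus = [
    ['hello', 'neural', 'world', 'networks'],
    ['hello', 'neural', 'world'],
    ['hello', 'neural'],
    ['hello'],
]

# Count how often each word occurs across all sentences.
counts = Counter(word for sentence in corpus for word in sentence)

# Index -> word, most common word first.
inv_dict = [word for word, _ in counts.most_common()]

# Word -> index, the inverse of inv_dict.
vocab = {word: idx for idx, word in enumerate(inv_dict)}
```

With this corpus the result matches the vocab and inv_dict shown above.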

Notice how each index in inv_dict corresponds to a value in vocab. Now declare your embedding matrix and fill it with the pretrained vectors:

import numpy as np

vocab_size = len(inv_dict)
emb_size = 300  # must match the dimensionality of the pretrained vectors
embeddings = np.zeros((vocab_size, emb_size), dtype=np.float32)

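from gensim.models.keyedvectors import KeyedVectors
# Expects vectors stored in word2vec format, e.g. exported from a Gensim model
# with your_w2v.wv.save_word2vec_format('embeddings_file', binary=True)
model = KeyedVectors.load_word2vec_format('embeddings_file', binary=True)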

for k, v in vocab.items():
  embeddings[v] = model[k]
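Note that the loop above raises a KeyError if a vocabulary word is missing from the pretrained model. A possible way to handle that is sketched below. The helper build_embedding_matrix is hypothetical, the plain dict stands in for the loaded KeyedVectors object (which also supports `word in model` and `model[word]`), and the choice of small random vectors for missing words is an assumption, not the only option.

```python
import numpy as np

def build_embedding_matrix(vocab, vectors, emb_size, seed=0):
    """Fill a (len(vocab), emb_size) matrix from pretrained vectors.

    Words absent from `vectors` get small random values (an assumption)
    and are returned in `missing` so they can be inspected.
    """
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab), emb_size), dtype=np.float32)
    missing = []
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = vectors[word]
        else:
            matrix[idx] = rng.uniform(-0.05, 0.05, emb_size)
            missing.append(word)
    return matrix, missing

# Toy stand-in for the pretrained model: only two words are known.
toy_vectors = {'hello': np.ones(4), 'world': np.full(4, 2.0)}
vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}

matrix, missing = build_embedding_matrix(vocab, toy_vectors, emb_size=4)
```

Here missing lists the words that had no pretrained vector, which is worth checking before training.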

You've got your embeddings matrix. Good. Now let's assume you want to train on the sample x = ['hello', 'world']. A neural net can't consume strings directly, so we need to integerize the words first:

x_train = []
for word in x:  
  x_train.append(vocab[word]) # integerize
x_train = np.array(x_train) # make into numpy array
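Real samples usually differ in length and may contain words outside the vocabulary, while the model input has a fixed width. The sketch below shows one way to deal with both. The integerize helper and the '<pad>' and '<unk>' entries are hypothetical additions, and they would each need their own row in the embedding matrix.

```python
def integerize(words, vocab, input_size, pad_id, unk_id):
    """Map words to ids, truncating or right-padding to exactly input_size."""
    ids = [vocab.get(word, unk_id) for word in words][:input_size]
    return ids + [pad_id] * (input_size - len(ids))

# Hypothetical vocabulary extended with two special tokens.
vocab = {'hello': 0, 'neural': 1, 'world': 2, 'networks': 3,
         '<pad>': 4, '<unk>': 5}

samples = (['hello', 'world'], ['neural', 'cats', 'hello', 'world'])
batch = [
    integerize(sample, vocab, input_size=3,
               pad_id=vocab['<pad>'], unk_id=vocab['<unk>'])
    for sample in samples
]
```

The short sample is padded on the right, and the long one is truncated, with the unknown word 'cats' mapped to the '<unk>' id.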

Now we are good to go with embedding our samples on the fly:

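import tensorflow as tf  # TF 1.x API; under TF 2.x, tf.placeholder lives in tf.compat.v1

input_size = 2  # number of tokens per sample, e.g. x = ['hello', 'world']
x_model = tf.placeholder(tf.int32, shape=[None, input_size])
with tf.device("/cpu:0"):
  embedded_x = tf.nn.embedding_lookup(embeddings, x_model)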
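For a single, unpartitioned embedding matrix, the lookup amounts to gathering one row per integer id. The NumPy sketch below only illustrates that idea with made-up toy values. It is not how TensorFlow implements the op.

```python
import numpy as np

emb_size = 3
# Toy embedding matrix: row i is filled with the value i,
# which makes the gathered rows easy to read.
embeddings = (np.arange(4, dtype=np.float32)[:, None]
              * np.ones((1, emb_size), dtype=np.float32))

# A batch of two integerized samples, each 2 tokens long.
x_batch = np.array([[0, 2], [3, 1]])

# Integer array indexing picks one embedding row per id.
embedded = embeddings[x_batch]
```

The result has shape (batch, input_size, emb_size), so each token id has been replaced by its vector.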

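Now embedded_x, which has shape [batch, input_size, emb_size], goes into your convolution or whatever layer comes next. Since x_model expects a batch, feed the sample wrapped in a list, e.g. feed_dict={x_model: [x_train]}. I am also assuming you are not retraining the embeddings, but simply using them. Hope that helps.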

vega answered Sep 29 '22 04:09
