I have a Word2Vec model trained in Gensim. How can I use it in TensorFlow for word embeddings? I don't want to train embeddings from scratch in TensorFlow. Can someone show me how to do it with some example code?
Google's Word2Vec Pretrained Word Embedding

Word2Vec is one of the most popular pretrained word embeddings, developed by Google. It is trained on the Google News dataset (about 100 billion words).
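For reference, here's a minimal sketch of loading those pretrained Google News vectors with Gensim. The file name GoogleNews-vectors-negative300.bin is the standard name of Google's download; adjust the path to wherever you saved it:

from gensim.models.keyedvectors import KeyedVectors

# Load Google's pretrained 300-dimensional Google News vectors
# (standard binary word2vec format).
google_model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

print(google_model['hello'].shape)  # (300,)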
Let's assume you have a vocabulary dictionary and an inverse_dict list, where each index in the list corresponds to a dictionary value:
vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}
inv_dict = ['hello', 'neural', 'world', 'networks']
Notice how the inverse_dict index corresponds to the dictionary values. Now declare your embedding matrix and get the values:
import numpy as np
from gensim.models.keyedvectors import KeyedVectors

vocab_size = len(inv_dict)
emb_size = 300  # or whatever the size of your embeddings

# Pre-allocate the embedding matrix, one row per vocabulary word.
embeddings = np.zeros((vocab_size, emb_size))

# Load your Gensim-trained vectors (binary word2vec format).
model = KeyedVectors.load_word2vec_format('embeddings_file', binary=True)

# Copy each word's vector into the row given by its vocab index.
for k, v in vocab.items():
    embeddings[v] = model[k]
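One caveat: model[k] raises a KeyError for any vocab word missing from the Gensim model. A minimal guard, where the fallback to a small random vector (and the 0.25 scale) is just an illustrative assumption:

for k, v in vocab.items():
    if k in model:
        embeddings[v] = model[k]
    else:
        # Word not in the pretrained model: fall back to a small random vector
        # (leaving the row as zeros is another common choice).
        embeddings[v] = np.random.uniform(-0.25, 0.25, emb_size)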
You've got your embedding matrix. Good. Now let's assume you want to train on the sample x = ['hello', 'world']. Raw strings won't work for our neural net, so we need to integerize first:
x_train = []
for word in x:
    x_train.append(vocab[word])  # integerize
x_train = np.array(x_train)      # make into numpy array
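Since the placeholder below expects a fixed input_size per sample, variable-length samples need padding first. A quick sketch; pad_sequence is a hypothetical helper, and pad_id=0 is just an assumption (in practice, reserve a dedicated pad entry in your vocab):

# Hypothetical helper: pad or truncate an index sequence to a fixed length.
def pad_sequence(ids, max_len, pad_id=0):
    ids = list(ids)[:max_len]
    return np.array(ids + [pad_id] * (max_len - len(ids)))

input_size = 10  # assumed fixed sequence length
x_train = pad_sequence(x_train, input_size)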
Now we are good to go with embedding our samples on the fly:
import tensorflow as tf

x_model = tf.placeholder(tf.int32, shape=[None, input_size])
with tf.device("/cpu:0"):
    # Look up the pretrained vector for each integer id in the batch.
    embedded_x = tf.nn.embedding_lookup(embeddings, x_model)
Now embedded_x goes into your convolution or whatever. I am also assuming you are not retraining the embeddings, but simply using them. Hope that helps.
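If you'd rather be explicit about freezing the embeddings, one common pattern is a non-trainable tf.Variable initialized from the numpy matrix; a sketch, assuming the same TF1-style graph as above:

# Hold the pretrained matrix in a variable the optimizer won't update.
embedding_var = tf.Variable(
    embeddings.astype(np.float32),  # cast once; lookups then yield float32
    trainable=False,                # freeze: no gradient updates to these rows
    name='pretrained_embeddings')

with tf.device("/cpu:0"):
    embedded_x = tf.nn.embedding_lookup(embedding_var, x_model)

To fine-tune the embeddings on your task instead, flip trainable=True and keep everything else the same.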