Visualize Gensim Word2vec Embeddings in Tensorboard Projector

Tags: python, tensorflow, tensorboard, word-embedding, gensim

I've only seen a few questions that ask this, and none of them have an answer yet, so I thought I might as well try. I've been using gensim's word2vec model to create some vectors. I exported them to text and tried importing the file into TensorFlow's live demo of the Embedding Projector. One problem: it didn't work. It told me that the tensors were improperly formatted. So, being a beginner, I thought I would ask some people with more experience about possible solutions.
Equivalent to my code:

import gensim

corpus = [["words", "in", "sentence", "one"], ["words", "in", "sentence", "two"]]
model = gensim.models.Word2Vec(iter=5, size=64)
model.build_vocab(corpus)
# keep only the vectors to save memory
vectors = model.wv
del model
vectors.save_word2vec_format("vect.txt", binary=False)

That creates the model, saves the vectors, and writes the results out nice and pretty in a tab-delimited file with values for all of the dimensions. I understand how to do what I'm doing; I just can't figure out what's wrong with the way I put it into TensorFlow, as the documentation on that is pretty scarce as far as I can tell.
One idea that has been suggested to me is implementing the appropriate TensorFlow code, but I don't know how to write that; I only know how to import files in the live demo.

Edit: I have a new problem now. The object holding my vectors is not iterable, because gensim apparently uses its own data structures that are incompatible with what I'm trying to do.
Ok. Done with that too! Thanks for your help!


asked May 23 '18 15:05

I. Blum

2 Answers

What you are describing is possible. Keep in mind that TensorBoard reads from saved TensorFlow binaries (checkpoints), which represent your variables on disk.

More information on saving and restoring tensorflow graph and variables here

The main task is therefore to get the embeddings as saved tf variables.

Assumptions:

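in the following code, `embeddings` is a Python dict `{word: np.array}` where each array has shape `[embedding_size]`

the Python version is 3.5+

the libraries used are `numpy as np` and `tensorflow as tf` (TensorFlow 1.x API: `tf.Session`, `tf.train.Saver`), plus the standard-library `os` module

the directory used to store the tf variables is `model_dir/`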
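One way to obtain the `embeddings` dict assumed above is to read back the text file that `save_word2vec_format` wrote. This is a minimal sketch, assuming the standard word2vec text layout: a header line with the vocabulary size and dimensionality, then one space-separated `word v1 v2 ...` row per word. The helper name `load_word2vec_text` and the sample file contents are made up for illustration. Values are kept as plain Python lists, which `np.stack` accepts.

```python
import os
import tempfile


def load_word2vec_text(path):
    """Read a word2vec text file into a {word: [float, ...]} dict."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        # First line is the header: "<n_words> <embedding_size>".
        n_words, embedding_size = (int(x) for x in f.readline().split())
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != embedding_size + 1:
                continue  # skip blank or malformed rows
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    if len(embeddings) != n_words:
        raise ValueError(
            "header promised %d words, found %d" % (n_words, len(embeddings))
        )
    return embeddings


# Tiny made-up file in the same layout as vect.txt.
sample = "2 3\nwords 0.1 0.2 0.3\nsentence -0.4 0.5 0.6\n"
path = os.path.join(tempfile.mkdtemp(), "vect.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

embeddings = load_word2vec_text(path)
print(sorted(embeddings))
```

Reading the file avoids iterating over gensim's own vector object, which is the obstacle mentioned in the question's edit.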

Step 1: Stack the embeddings to get a single `np.array`

# axis=0 belongs to np.stack, not to list()
embeddings_vectors = np.stack(list(embeddings.values()), axis=0)
# shape [n_words, embedding_size]

Step 2: Save the `tf.Variable` on disk

# Create the variable that holds the embedding matrix.
emb = tf.Variable(embeddings_vectors, name='word_embeddings')

# Add an op to initialize the variable.
init_op = tf.global_variables_initializer()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Launch a session, initialize the variable and save it to disk.
with tf.Session() as sess:
    sess.run(init_op)

    # Save the variables to disk.
    save_path = saver.save(sess, "model_dir/model.ckpt")
    print("Model saved in path: %s" % save_path)

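`model_dir` should now contain the files `checkpoint`, `model.ckpt.data-00000-of-00001`, `model.ckpt.index` and `model.ckpt.meta` (a step suffix such as `model.ckpt-1` only appears if you pass `global_step` to `saver.save`).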

Step 3: Generate a `metadata.tsv`

To get a nicely labeled cloud of embeddings, you can provide TensorBoard with metadata as tab-separated values (TSV) (cf. here).

import os

# One word per line, in the same order as the rows of embeddings_vectors.
words = '\n'.join(embeddings.keys())

with open(os.path.join('model_dir', 'metadata.tsv'), 'w') as f:
    f.write(words)

# .tsv file written in model_dir/metadata.tsv
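If you want more than one column of metadata (for example a frequency next to each word), the projector treats the first row as column names, whereas a single-column file must not have a header. This is a minimal sketch of the multi-column layout. The `counts` dict and the `Count` column are hypothetical and only illustrate the format, and a temporary directory stands in for `model_dir/`.

```python
import csv
import os
import tempfile

# Hypothetical inputs: same key order as the stacked embedding matrix.
embeddings = {"words": [0.1, 0.2], "sentence": [0.3, 0.4]}
counts = {"words": 2, "sentence": 2}

model_dir = tempfile.mkdtemp()  # stand-in for model_dir/
metadata_path = os.path.join(model_dir, "metadata.tsv")

with open(metadata_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(["Word", "Count"])  # header row, needed with 2+ columns
    for word in embeddings:  # same iteration order as embeddings.values()
        writer.writerow([word, counts[word]])

with open(metadata_path, encoding="utf-8") as f:
    rows = f.read().splitlines()
print(rows)
```

Whatever columns you add, the number of data rows has to match the number of rows in the embedding matrix, in the same order.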

Step 4: Visualize

Run `tensorboard --logdir model_dir`, then open the Projector tab.

To load the metadata, the magic happens here:

[Screenshot: load_meta]
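Instead of loading the metadata by hand each time, TensorBoard can, as far as I know, pick it up from a `projector_config.pbtxt` file placed in the log directory. This is a minimal sketch that writes such a file as plain text. It assumes the variable name `word_embeddings` from Step 2 and a `metadata.tsv` sitting next to the checkpoint, with a temporary directory standing in for `model_dir/`. Check the field names against the projector documentation for your TensorBoard version.

```python
import os
import tempfile

model_dir = tempfile.mkdtemp()  # stand-in for model_dir/

# Text-format config: one embeddings block per tensor to visualize.
config = (
    "embeddings {\n"
    '  tensor_name: "word_embeddings"\n'
    '  metadata_path: "metadata.tsv"\n'
    "}\n"
)

config_path = os.path.join(model_dir, "projector_config.pbtxt")
with open(config_path, "w", encoding="utf-8") as f:
    f.write(config)

with open(config_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```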

As a reminder, some word2vec embedding projections are also available on http://projector.tensorflow.org/


answered Oct 19 '22 09:10

syltruong

Gensim actually has an official way to do this.

Documentation about it
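For the live demo at projector.tensorflow.org, which is what the question uses, no TensorFlow code is needed. Its Load dialog takes one TSV file of vectors (tab-separated values, one row per word, no header and no word column) and an optional TSV file of labels. The word2vec text format has a header line, puts the word in the first column and separates values with spaces, which would explain the formatting error. I believe gensim ships a `gensim.scripts.word2vec2tensor` script for this conversion. The sketch below is not that script, just a dependency-free illustration of the same idea, with made-up file names and sample data.

```python
import os
import tempfile


def word2vec_text_to_tsv(vectors_txt, tensor_tsv, metadata_tsv):
    """Split a word2vec text file into projector-style vector and label files."""
    n_rows = 0
    with open(vectors_txt, encoding="utf-8") as src, \
            open(tensor_tsv, "w", encoding="utf-8") as tensors, \
            open(metadata_tsv, "w", encoding="utf-8") as labels:
        src.readline()  # drop the "<n_words> <embedding_size>" header
        for line in src:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank lines
            labels.write(parts[0] + "\n")  # word only
            tensors.write("\t".join(parts[1:]) + "\n")  # values only, tabs
            n_rows += 1
    return n_rows


# Tiny made-up input in the same layout as vect.txt.
work_dir = tempfile.mkdtemp()
vect_path = os.path.join(work_dir, "vect.txt")
with open(vect_path, "w", encoding="utf-8") as f:
    f.write("2 3\nwords 0.1 0.2 0.3\nsentence -0.4 0.5 0.6\n")

tensor_path = os.path.join(work_dir, "vect_tensor.tsv")
metadata_path = os.path.join(work_dir, "vect_metadata.tsv")
n_rows = word2vec_text_to_tsv(vect_path, tensor_path, metadata_path)
print(n_rows)
```

The two output files can then be picked in the projector's Load dialog: the tensor file first, then the metadata file.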

answered Oct 19 '22 11:10

Marco Oliveira
