How to use the Embedding Projector in Tensorflow 2.0

With the tf.contrib module gone from Tensorflow, and with tf.train.Saver() also gone, I cannot find a way to store a set of embeddings and their corresponding thumbnails, so that the Tensorboard Projector can read them.

The Tensorboard documentation for Tensorflow 2.0 explains how to create plots and summaries, and how to use the summary tool in general, but says nothing about the projector tool. Has anyone found out how to store datasets for visualization?

If possible, I would appreciate a (minimal) code example.

asked Jul 12 '19 by gaspercat


1 Answer

It seems there are still some open issues in tensorboard (bug report at: https://github.com/tensorflow/tensorboard/issues/2471). However, for now there are workarounds for preparing embeddings for the projector with tensorflow2:

The tensorflow1 code would look something like this:

import tensorflow as tf
# the projector plugin now lives in the tensorboard package (tf.contrib is gone in TF2)
from tensorboard.plugins import projector

embeddings = tf.compat.v1.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'

# Write the checkpoint and the projector config for tensorboard
with tf.compat.v1.Session() as sess:
    saver = tf.compat.v1.train.Saver([embeddings])
    sess.run(embeddings.initializer)
    saver.save(sess, CHECKPOINT_FILE)

    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embeddings.name
    embedding.metadata_path = TENSORBOARD_METADATA_FILE

projector.visualize_embeddings(tf.compat.v1.summary.FileWriter(TENSORBOARD_DIR), config)

When using eager mode in tensorflow2, this should (presumably) look something like this:

# imports as above: tensorflow as tf, projector from tensorboard.plugins
embeddings = tf.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'

# save the variable with the TF2 checkpoint mechanism instead of tf.train.Saver
ckpt = tf.train.Checkpoint(embeddings=embeddings)
ckpt.save(CHECKPOINT_FILE)

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = embeddings.name
embedding.metadata_path = TENSORBOARD_METADATA_FILE

writer = tf.summary.create_file_writer(TENSORBOARD_DIR)
projector.visualize_embeddings(writer, config)

However, there are 2 issues:

  • the writer created with tf.summary.create_file_writer does not have the function get_logdir() required by projector.visualize_embeddings; a simple workaround is to patch the visualize_embeddings function to take the logdir as a parameter (a possible version of that patch is sketched after this list).
  • the checkpoint format has changed: when reading the checkpoint with load_checkpoint (which seems to be the tensorboard way of loading the file), the variable names change, e.g. embeddings becomes something like embeddings/.ATTRIBUTES/VARIABLE_VALUE (there are also additional variables in the map extracted by get_variable_to_shape_map(), but they are empty anyway).
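
A possible version of that patch (not from the original answer, just a sketch of what "take the logdir as parameter" could mean) simply writes the ProjectorConfig text proto to projector_config.pbtxt in the given logdir, which is the file the projector plugin reads, instead of calling writer.get_logdir():

import os
from google.protobuf import text_format

def visualize_embeddings(writer, config, logdir):
    # sketch of a patched visualize_embeddings: the writer argument is kept only
    # so the call keeps its shape; the logdir is passed in explicitly instead
    config_path = os.path.join(logdir, 'projector_config.pbtxt')
    # the projector plugin reads its configuration from this text proto file
    with open(config_path, 'w') as f:
        f.write(text_format.MessageToString(config))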

The second issue was solved with the following quick-and-dirty workaround (and logdir is now a parameter of visualize_embeddings()):

# imports as above: tensorflow as tf, projector from tensorboard.plugins
embeddings = tf.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
ckpt = tf.train.Checkpoint(embeddings=embeddings)
ckpt.save(CHECKPOINT_FILE)

# look up the name the variable got in the TF2 checkpoint format
# (e.g. embeddings/.ATTRIBUTES/VARIABLE_VALUE)
reader = tf.train.load_checkpoint(TENSORBOARD_DIR)
variable_shape_map = reader.get_variable_to_shape_map()
key_to_use = ""
for key in variable_shape_map:
    if "embeddings" in key:
        key_to_use = key

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = key_to_use
embedding.metadata_path = TENSORBOARD_METADATA_FILE

writer = tf.summary.create_file_writer(TENSORBOARD_DIR)
projector.visualize_embeddings(writer, config, TENSORBOARD_DIR)
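
For completeness, TENSORBOARD_METADATA_FILE is not defined in the snippets above; the projector expects it to point to a plain TSV file with one label per embedding row (a header row is only needed when there is more than one column), and in recent tensorboard versions the metadata path is usually given relative to the log directory. A minimal sketch for writing such a file (the labels here are hypothetical placeholders) could be:

import os

# hypothetical labels, one per row of latent_data; replace with your own
labels = ["item_{}".format(i) for i in range(len(latent_data))]

# one label per line, no header needed for a single column
with open(os.path.join(TENSORBOARD_DIR, 'metadata.tsv'), 'w') as f:
    for label in labels:
        f.write("{}\n".format(label))

Once the checkpoint, the projector config and the metadata file are in place, the projector tab should appear when tensorboard is started with --logdir pointing at TENSORBOARD_DIR.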

I did not find any examples of how to use tensorflow2 to directly write the embeddings for tensorboard, so I am not sure whether this is the right way; but if it is, then those two issues need to be addressed, and at least for now there is a workaround.

answered Nov 03 '22 by pteufl