How to use TensorBoard with Keras in Python for visualizing embeddings

I'm reading the book Deep Learning with Python, which uses Keras. In chapter 7, it shows how to use TensorBoard to monitor the progress of the training phase, with this example:

import keras
from keras import layers
from keras.datasets import imdb
from keras.preprocessing import sequence

max_features = 2000  # number of words to consider as features
max_len = 500  # cut off reviews after this many words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)

model = keras.models.Sequential()
model.add(layers.Embedding(max_features, 128, input_length=max_len, name='embed'))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))
model.summary()

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

callbacks = [
    keras.callbacks.TensorBoard(
        log_dir='my_log_dir',
        histogram_freq=1,
        embeddings_freq=1,
    )
]
history = model.fit(x_train, y_train, epochs=20, batch_size=128, validation_split=0.2, callbacks=callbacks)

Apparently, the Keras library has changed since the book was written, because this code raises an exception:

ValueError: To visualize embeddings, embeddings_data must be provided.

This happens after the first epoch finishes, when the callbacks run for the first time (the first time the TensorBoard callback runs). I know that what is missing is TensorBoard's embeddings_data parameter, but I don't know what I should assign to it.

Does anyone have a working example for this?

Here are the versions I'm using:

Python: 3.6.5
Keras: 2.2.0
TensorFlow: 1.9.0

[UPDATE]

To test a possible solution, I tried this:

import numpy as np

callbacks = [
    keras.callbacks.TensorBoard(
        log_dir='my_log_dir',
        histogram_freq=1,
        embeddings_freq=1,
        # A single dummy "sample" containing the token ids 0..max_len-1:
        embeddings_data=np.arange(0, max_len).reshape((1, max_len)),
    )
]
history = model.fit(x_train, y_train, epochs=20, batch_size=128, validation_split=0.2, callbacks=callbacks)

This is the only way I could populate embeddings_data without raising an error, but it does not help either: the PROJECTOR tab of TensorBoard is still empty:

[Screenshot: the PROJECTOR tab in TensorBoard is empty]

Any help is appreciated.

asked Sep 02 '18 by Mehran




2 Answers

I'm also reading the book "Deep Learning with Python", which uses Keras. Here is my solution to this question. First, I tried this code:

callbacks = [keras.callbacks.TensorBoard(
    log_dir='my_log_dir',
    histogram_freq=1,
    embeddings_freq=1,
    embeddings_data=x_train,
)]
history = model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.2, callbacks=callbacks)

But this raises a ResourceExhaustedError.

Because there are 25,000 samples in x_train, embedding all of them is too much for my old notebook. So next I tried embedding only the first 100 samples of x_train, and that works.

The code and the result are shown here.

callbacks = [keras.callbacks.TensorBoard(
    log_dir='my_log_dir',
    histogram_freq=1,
    embeddings_freq=1,
    # Only the first 100 training samples, which fits in memory:
    embeddings_data=x_train[:100],
)]
history = model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.2, callbacks=callbacks)
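
You can then launch TensorBoard against the log directory with tensorboard --logdir=my_log_dir and open the PROJECTOR tab to see the embedded points: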

[Screenshot: Projector view of the 100 embedded samples]

Note that in the Projector, "Points: 100" means there are 100 samples, and "Dimension: 64000" means the embedding vector for one sample has length 64000. There are 500 words in one sample (max_len = 500) and a 128-dimensional vector for each word, so 500 * 128 = 64000.
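
As a quick sanity check of that arithmetic, here is a small sketch. It assumes the trained model and the padded x_train from the question; the 'embed' name comes from the Embedding layer defined there:

import numpy as np

# Assumes the trained model and the padded x_train from the question.
embed_weights = model.get_layer('embed').get_weights()[0]
print(embed_weights.shape)   # (2000, 128): one 128-dim vector per word index

# Fancy-index the weight matrix with the 500 token ids of one sample:
sample_vectors = embed_weights[x_train[0]]
print(sample_vectors.shape)  # (500, 128)
print(sample_vectors.size)   # 64000 == 500 * 128, the "Dimension" in the Projector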

answered Sep 28 '22 by ttigong


Yes, that is correct: you need to provide what to embed for the visualisation using the embeddings_data argument:

import numpy as np

callbacks = [
    keras.callbacks.TensorBoard(
        log_dir='my_log_dir',
        histogram_freq=1,
        embeddings_freq=1,
        # Placeholder values: pass the actual input samples you want embedded.
        embeddings_data=np.array([3,4,2,5,2,...]),
    )
]

embeddings_data: data to be embedded at layers specified in embeddings_layer_names. Numpy array (if the model has a single input) or list of Numpy arrays (if the model has multiple inputs).

Have a look at the documentation for updated information on what those arguments are.
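
For reference, here is a minimal sketch of the list form for a hypothetical model with two inputs (the shapes below are made up for illustration; a single-input model like the one in the question takes just one array):

import numpy as np

# Hypothetical two-input model: one array per input, with matching row counts.
embeddings_data = [
    np.random.randint(0, 2000, size=(100, 500)),  # samples for the first input
    np.random.randint(0, 2000, size=(100, 500)),  # samples for the second input
]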

answered Sep 28 '22 by nuric