
Why is the TensorBoard summary not updating?

I use TensorBoard with PyTorch 1.1 to log loss values.

I call writer.add_scalar("loss", loss.item(), global_step) in every iteration of the training loop.

However, the plot does not update while training is running.

Every time I want to see the latest loss, I have to restart the TensorBoard server.

Here is the code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torch.utils.tensorboard import SummaryWriter
from torchvision import datasets, transforms

# Writer will output to ./runs/ directory by default
writer = SummaryWriter()

transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]
)
trainset = datasets.MNIST("mnist_train", train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
model = torchvision.models.resnet50(False)
# Have ResNet model take in grayscale rather than RGB
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(2048, 10, True)

criterion = nn.CrossEntropyLoss()

epochs = 100

opt = torch.optim.Adam(model.parameters())

niter = 0

for epoch in range(epochs):
    for step, (x, y) in enumerate(trainloader):
        yp = model(x)
        loss = criterion(yp, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        writer.add_scalar("loss", loss.item(), niter)
        niter += 1
        print(loss.item())

# Grab one batch so that `images` is defined before logging it
images, _ = next(iter(trainloader))
grid = torchvision.utils.make_grid(images)
writer.add_image("images", grid, 0)
writer.add_graph(model, images)
writer.close()

Training is still running and the global step has already reached 3594. However, TensorBoard still shows only around step 1900.

lucky yang asked May 04 '19 08:05


2 Answers

Also, if a single run has multiple event log files, you need to start TensorBoard with --reload_multifile True
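For illustration, a minimal sketch of how the full launch command might be assembled. The helper name tensorboard_command and its parameters are hypothetical; the only flags used are --logdir, --reload_multifile, and --reload_interval, and the "runs" default simply mirrors the SummaryWriter default directory from the question.

```python
import shlex


def tensorboard_command(logdir="runs", reload_multifile=True, reload_interval=None):
    """Build the argument list for launching TensorBoard (hypothetical helper)."""
    cmd = ["tensorboard", "--logdir", logdir]
    if reload_multifile:
        # Ask TensorBoard to poll every event file in a run, not only the newest one
        cmd += ["--reload_multifile", "true"]
    if reload_interval is not None:
        # Number of seconds between reloads of the log directory
        cmd += ["--reload_interval", str(reload_interval)]
    return cmd


# Print the command in a form that can be pasted into a shell
print(shlex.join(tensorboard_command()))
```

The resulting list can also be passed to subprocess.run if TensorBoard should be started from a script.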

Ismael EL ATIFI answered Nov 15 '22 07:11


The writer buffers events internally before writing them to disk. To check whether that is the issue, create your SummaryWriter with

writer = SummaryWriter(flush_secs=1)

and see if the plot updates right away. If it does, tune flush_secs (default: 120 seconds) for your case. From your description, though, the delay might instead come from the TensorBoard visualization side. If so, it most likely has to do with the polling interval.

Does installing TensorFlow (which forces TensorBoard to use a different filesystem backend) change this behavior for you?
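As a complement to flush_secs, one option is to flush the writer explicitly from the training loop. The sketch below assumes a PyTorch release whose SummaryWriter exposes a flush() method, which may not be the case for 1.1. The names log_losses, flush_every, and demo are hypothetical and not part of any library.

```python
def log_losses(writer, losses, start_step=0, flush_every=50):
    """Log each loss under the "loss" tag and flush the writer periodically.

    `writer` only needs add_scalar(tag, value, step) and flush() methods,
    as on torch.utils.tensorboard.SummaryWriter (assumed, see lead-in).
    Returns the next global step so the caller can keep counting.
    """
    step = start_step
    for loss in losses:
        writer.add_scalar("loss", float(loss), step)
        step += 1
        if step % flush_every == 0:
            # Push buffered events to disk so TensorBoard can pick them up
            writer.flush()
    return step


def demo():
    # Requires torch and tensorboard; imported lazily so log_losses
    # can be used and tested without them.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(flush_secs=1)
    log_losses(writer, [0.9, 0.7, 0.5], flush_every=1)
    writer.close()
```

Calling demo() writes three points to ./runs/. In a real loop the same pattern would sit next to the existing writer.add_scalar call, with flush_every chosen so that flushing does not slow training noticeably.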

orionr answered Nov 15 '22 08:11