Monitor training/validation process in Caffe

Question

I'm training the Caffe Reference Model to classify images. My work requires me to monitor training by plotting the model's accuracy after every 1000 iterations on the entire training set and validation set, which contain 100K and 50K images respectively. Right now I'm taking the naive approach: I save a snapshot every 1000 iterations, then run the C++ classification code, which reads each raw JPEG image, forwards it through the net, and outputs the predicted label. However, this takes too much time on my machine (with a GeForce GTX 560 Ti).

Is there a faster way to get a graph of the accuracy of the snapshot models on both the training and validation sets?

I was thinking about using the LMDB format instead of raw images. However, I cannot find documentation or code for doing classification in C++ using the LMDB format.

Jia Li · Accepted Answer

1) You can use the NVIDIA DIGITS app to monitor your networks. It provides a GUI covering dataset preparation, model selection, and learning-curve visualization. It also uses a Caffe distribution that supports multi-GPU training.

2) Or, you can simply use the log parser that ships with Caffe. First, capture the training output in a log file:

/pathtocaffe/build/tools/caffe train --solver=solver.prototxt 2>&1 | tee lenet_train.log

This saves the training log to "lenet_train.log". Then, by running:

python /pathtocaffe/tools/extra/parse_log.py lenet_train.log .

you parse the training log into two CSV files, one with the train results and one with the test results. You can then plot them with the following Python script:

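import pandas as pd
import matplotlib.pyplot as plt

# CSV files written by parse_log.py: <log name>.train and <log name>.test
train_log = pd.read_csv("./lenet_train.log.train")
test_log = pd.read_csv("./lenet_train.log.test")

# Loss goes on the left axis, accuracy on a second y-axis sharing the same x-axis
_, ax1 = plt.subplots(figsize=(15, 10))
ax2 = ax1.twinx()
ax1.plot(train_log["NumIters"], train_log["loss"], alpha=0.4)  # train loss
ax1.plot(test_log["NumIters"], test_log["loss"], 'g')  # test loss
# "acc" has to match the name of the accuracy output in your net; adjust it if yours differs
ax2.plot(test_log["NumIters"], test_log["acc"], 'r')  # test accuracy
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
plt.savefig("./train_test_image.png")  # save the figure as a PNG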
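On the LMDB part of the question: one possible route is to evaluate each snapshot through pycaffe rather than the C++ classification example. The following is only a minimal sketch under several assumptions: pycaffe is built and importable, the net definition passed in has a Data layer reading from your LMDB and an Accuracy layer whose output blob is a scalar named "accuracy", and the function names (average_accuracy, evaluate_snapshot) are hypothetical helpers, not part of Caffe.

```python
def average_accuracy(net, num_batches, blob_name="accuracy"):
    """Run num_batches forward passes and return the mean of the accuracy blob.

    `net` is expected to behave like a pycaffe Net: forward() loads the next
    batch from its data layer, and net.blobs[blob_name].data holds a scalar.
    """
    if num_batches <= 0:
        raise ValueError("num_batches must be positive")
    total = 0.0
    for _ in range(num_batches):
        net.forward()  # the Data layer supplies the next batch
        total += float(net.blobs[blob_name].data)
    return total / num_batches


def evaluate_snapshot(model_def, weights, num_batches, use_gpu=True):
    """Load one snapshot with pycaffe and measure its mean batch accuracy."""
    import caffe  # imported lazily so average_accuracy works without pycaffe

    if use_gpu:
        caffe.set_mode_gpu()
    else:
        caffe.set_mode_cpu()
    net = caffe.Net(model_def, weights, caffe.TEST)
    return average_accuracy(net, num_batches)
```

The mean of the per-batch accuracies equals the accuracy over the whole set only when num_batches times the batch size equals the number of images, for example 1000 batches of 50 for the 50K validation images.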

Hossein · Answer

Caffe creates a log each time you train something, and it is located in the tmp folder (on both Linux and Windows).
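As a rough illustration of locating those logs, here is a minimal sketch. It assumes the log files follow the usual glog naming scheme (program name, host, user, then ".log.INFO." and a timestamp) and that the program name starts with "caffe"; the pattern and the function name find_caffe_logs are assumptions, so adjust them to whatever you actually see in your tmp folder.

```python
import glob
import os
import tempfile


def find_caffe_logs(log_dir=None, pattern="caffe*.log.INFO.*"):
    """Return the log files in log_dir that match pattern, newest first."""
    if log_dir is None:
        log_dir = tempfile.gettempdir()  # the system tmp folder
    paths = glob.glob(os.path.join(log_dir, pattern))
    # Sort by modification time so the most recent training run comes first
    return sorted(paths, key=os.path.getmtime, reverse=True)
```

Calling find_caffe_logs() with no arguments searches the system tmp folder; pass a directory explicitly if your logs were written elsewhere.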
I also wrote a plotting script in Python that you can use to visualize your loss and accuracy.
Just place your training logs, with a .log extension, next to the script and double-click it. You can also run it from a command prompt; either way, when executed it loads all the logs (*.log) it can find in the current directory. It also shows the top 4 accuracies and the point at which they were achieved.

You can find it here: https://gist.github.com/Coderx7/03f46cb24dcf4127d6fa66d08126fa3b
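To give an idea of what such a summary involves, here is a minimal sketch of extracting the best test accuracies from a raw log. It is not the linked script. It assumes the log contains lines of the form "Iteration N, Testing net" followed by "Test net output #k: accuracy = value", and that the accuracy output is named "accuracy"; pass a different output_name if your net uses another name. The function name top_test_accuracies is hypothetical.

```python
import re

ITER_RE = re.compile(r"Iteration (\d+), Testing net")


def top_test_accuracies(log_lines, k=4, output_name="accuracy"):
    """Return up to k (accuracy, iteration) pairs, best accuracy first."""
    acc_re = re.compile(
        r"Test net output #\d+: " + re.escape(output_name) + r" = ([0-9.eE+-]+)"
    )
    results = []
    current_iter = None
    for line in log_lines:
        match = ITER_RE.search(line)
        if match:
            # Remember which iteration the following test outputs belong to
            current_iter = int(match.group(1))
            continue
        match = acc_re.search(line)
        if match and current_iter is not None:
            results.append((float(match.group(1)), current_iter))
    return sorted(results, reverse=True)[:k]
```

A typical call would be top_test_accuracies(open("lenet_train.log")), since a file object yields its lines one at a time.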

Muhammad Farooq · Answer

python /pathtocaffe/tools/extra/parse_log.py lenet_train.log

Running the command above, without an output directory, produces the following error:

usage: parse_log.py [-h] [--verbose] [--delimiter DELIMITER]
                logfile_path output_dir
parse_log.py: error: too few arguments

Solution:

To run "parse_log.py" successfully, we must pass two arguments:

1) the log file
2) the path of the output directory

So the correct command is as follows:

python /pathtocaffe/tools/extra/parse_log.py lenet_train.log output_dir
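If you parse logs often, the same call can be wrapped in a small helper. This is a minimal sketch: the function names are hypothetical, the output file names are assumed to be the log's base name plus ".train" and ".test" (as in the plotting script from the accepted answer), and the interpreter is left as a parameter because some Caffe versions of parse_log.py may need a specific Python version.

```python
import os
import subprocess
import sys


def parsed_log_paths(log_path, output_dir):
    """Return the (train, test) CSV paths expected for a given log file."""
    base = os.path.basename(log_path)
    return (
        os.path.join(output_dir, base + ".train"),
        os.path.join(output_dir, base + ".test"),
    )


def run_parse_log(parse_log_script, log_path, output_dir, python_exe=sys.executable):
    """Run parse_log.py with both required arguments and return the CSV paths."""
    os.makedirs(output_dir, exist_ok=True)  # make sure the output directory exists
    subprocess.run(
        [python_exe, parse_log_script, log_path, output_dir],
        check=True,  # raise CalledProcessError if the script fails
    )
    return parsed_log_paths(log_path, output_dir)
```

For example, run_parse_log("/pathtocaffe/tools/extra/parse_log.py", "lenet_train.log", "output_dir") mirrors the corrected command above.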

Tags: c++, classification, deep-learning, caffe, conv-neural-network

DucCuong
