I'm training Caffe Reference Model for classifying images. My work requires me to monitor the training process by drawing graph of accuracy of the model after every 1000 iterations on entire training set and validation set which has 100K and 50K images respectively. Right now, Im taking the naive approach, make snapshots after every 1000 iterations, run the C++ classififcation code which reads raw JPEG image and forward to the net and output the predicted labels. However, this takes too much time on my machine (with a Geforce GTX 560 Ti)
Is there any faster way that I can do to have the graph of accuracy of the snapshot models on both training and validation sets?
I was thinking about using LMDB format instead of raw images. However, I cannot find documentation/code about doing classification in C++ using LMDB format.
1) You can use the NVIDIA-DIGITS app to monitor your networks. They provide a GUI including dataset preparation, model selection, and learning curve visualization. More, they use a caffe distribution allowing multi-GPU training.
2) Or, you can simply use the log-parser inside caffe.
/pathtocaffe/build/tools/caffe train --solver=solver.prototxt 2>&1 | tee lenet_train.log
This allows you to save train log into "lenet_train.log". Then by using:
python /pathtocaffe/tools/extra/parse_log.py lenet_train.log .
you parse your train log into two csv files, containing train and test loss. You can then plot them using the following python script
import pandas as pd
from matplotlib import *
from matplotlib.pyplot import *
train_log = pd.read_csv("./lenet_train.log.train")
test_log = pd.read_csv("./lenet_train.log.test")
_, ax1 = subplots(figsize=(15, 10))
ax2 = ax1.twinx()
ax1.plot(train_log["NumIters"], train_log["loss"], alpha=0.4)
ax1.plot(test_log["NumIters"], test_log["loss"], 'g')
ax2.plot(test_log["NumIters"], test_log["acc"], 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
savefig("./train_test_image.png") #save image as png
Caffe creates logs each time you try to train something, and its located in the tmp folder (both linux and windows).
I also wrote a plotting script in python which you can easily use to visualize your loss/accuracy.
Just place your training logs with .log
extension next to the script and double click on it.
You can use command prompts as well, but for ease of use, when executed it loads all logs (*.log) it can find in the current directory.
it also shows the top 4 accuracies and at-which accuracy they were achieved.
you can find it here : https://gist.github.com/Coderx7/03f46cb24dcf4127d6fa66d08126fa3b
python /pathtocaffe/tools/extra/parse_log.py lenet_train.log
command produces the following error:
usage: parse_log.py [-h] [--verbose] [--delimiter DELIMITER]
logfile_path output_dir
parse_log.py: error: too few arguments
Solution:
For successful execution of "parse_log.py" command, we should pass the two arguments:
So the correct command is as follows:
python /pathtocaffe/tools/extra/parse_log.py lenet_train.log output_dir
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With