 

Training TensorFlow Inception-v3 on ImageNet with a modest hardware setup

I've been training Inception V3 on a modest machine with a single GPU (GeForce GTX 980 Ti, 6GB). The maximum batch size appears to be around 40.

I've used the default learning rate settings specified in inception_train.py: initial_learning_rate = 0.1, num_epochs_per_decay = 30, and learning_rate_decay_factor = 0.16. After a couple of weeks of training, the best accuracy I was able to achieve is as follows (about 500K-1M iterations):

2016-06-06 12:07:52.245005: precision @ 1 = 0.5767 recall @ 5 = 0.8143 [50016 examples]
2016-06-09 22:35:10.118852: precision @ 1 = 0.5957 recall @ 5 = 0.8294 [50016 examples]
2016-06-14 15:30:59.532629: precision @ 1 = 0.6112 recall @ 5 = 0.8396 [50016 examples]
2016-06-20 13:57:14.025797: precision @ 1 = 0.6136 recall @ 5 = 0.8423 [50016 examples]
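
For reference, here is roughly how I understand those flags turn into a step schedule (a sketch, not the exact inception_train.py code); the batch size of 40 is just what fits on my GPU:

    import tensorflow as tf

    # Sketch of the exponential-decay schedule, not the actual inception_train.py code.
    initial_learning_rate = 0.1
    num_epochs_per_decay = 30
    learning_rate_decay_factor = 0.16

    num_train_examples = 1281167               # ImageNet ILSVRC2012 training set
    batch_size = 40                            # what fits on the GTX 980 Ti
    batches_per_epoch = num_train_examples / batch_size
    decay_steps = int(batches_per_epoch * num_epochs_per_decay)

    global_step = tf.Variable(0, trainable=False, name='global_step')
    lr = tf.train.exponential_decay(initial_learning_rate,
                                    global_step,
                                    decay_steps,
                                    learning_rate_decay_factor,
                                    staircase=True)

With a batch size of 40 that works out to a decay only every ~960K steps, so my 500K-1M iterations have seen at most one learning-rate drop.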

I've tried fiddling with the settings towards the end of the training session, but couldn't see any improvements in accuracy.

I've started a new training session from scratch with num_epochs_per_decay = 10 and learning_rate_decay_factor = 0.001 based on some other posts in this forum, but I'm sort of grasping in the dark here.

Any recommendations on good defaults for a small hardware setup like mine?

Asked Jul 08 '16 by Dominiek



2 Answers

TL;DR: There is no known way to train an Inception V3 model from scratch in a tolerable amount of time on a modest hardware setup like yours. I would strongly suggest retraining a pre-trained model on your desired task.

On a small hardware setup like yours, it will be difficult to achieve maximum performance. Generally speaking, CNNs reach their best performance with the largest batch sizes possible, which means training is often limited by the maximum batch size that fits in GPU memory.

The Inception V3 model available for download here was trained with an effective batch size of 1600 across 50 GPUs, where each GPU ran a batch size of 32.

Given your modest hardware, my number one suggestion would be to download the pre-trained model from the link above and retrain it for the specific task you have at hand. This would make your life much happier.
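
If you go that route, the core trick is a partial restore: load every pre-trained variable except the final classifier, then train a fresh classifier on your own labels. A minimal, hypothetical sketch follows (stand-in variables replace the real Inception V3 graph; the checkpoint path and the 'logits' scope name are assumptions, not the inception codebase API):

    import tensorflow as tf

    # Stand-in variables: a "body" weight plus a fresh 'logits' layer sized for
    # the new task. In practice the real Inception V3 graph would be built here.
    body_w = tf.get_variable('conv1/weights', [3, 3, 3, 32])
    logits_w = tf.get_variable('logits/weights', [2048, 5])   # e.g. 5 new classes

    # Restore everything except the final classifier from the downloaded checkpoint.
    restore_vars = [v for v in tf.global_variables()
                    if not v.op.name.startswith('logits')]
    restorer = tf.train.Saver(restore_vars)
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # restorer.restore(sess, '/path/to/pretrained/model.ckpt')  # hypothetical path
        # ... fine-tune on your own labels here ...
        saver.save(sess, '/tmp/inception_finetuned.ckpt')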

As a thought experiment (though hardly practical): if you feel especially compelled to exactly match the performance of the pre-trained model by training from scratch, you could run the following procedure on your single GPU (a rough sketch in code follows the list):

  1. Run with a batch size of 32
  2. Store the gradients from the run
  3. Repeat this 50 times.
  4. Average the gradients from the 50 batches.
  5. Update all variables with the gradients.
  6. Repeat
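
Here is a rough, self-contained sketch of that gradient-accumulation loop, with a tiny stand-in model in place of Inception V3 (none of this is the inception codebase itself; the RMSProp optimizer and the random data are placeholders for illustration):

    import numpy as np
    import tensorflow as tf

    images = tf.placeholder(tf.float32, [None, 299, 299, 3])
    labels = tf.placeholder(tf.int32, [None])

    # Stand-in for the Inception V3 forward pass.
    features = tf.reduce_mean(images, axis=[1, 2])            # [batch, 3]
    logits = tf.layers.dense(features, 1001)
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    optimizer = tf.train.RMSPropOptimizer(learning_rate=0.1)
    grads_and_vars = optimizer.compute_gradients(loss)

    # One gradient accumulator per trainable variable.
    accum = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
             for _, v in grads_and_vars]
    zero_ops = [a.assign(tf.zeros_like(a)) for a in accum]
    accum_ops = [a.assign_add(g) for a, (g, _) in zip(accum, grads_and_vars)]

    num_accum = 50            # 50 batches of 32 -> effective batch size of 1600
    apply_op = optimizer.apply_gradients(
        [(a / num_accum, v) for a, (_, v) in zip(accum, grads_and_vars)])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(3):                        # step 6: repeat (a few iterations to demo)
            sess.run(zero_ops)
            for _ in range(num_accum):            # steps 1-3: accumulate 50 small batches
                feed = {images: np.random.rand(32, 299, 299, 3).astype(np.float32),
                        labels: np.random.randint(0, 1001, 32).astype(np.int32)}
                sess.run(accum_ops, feed_dict=feed)
            sess.run(apply_op)                    # steps 4-5: average and update once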

I am only mentioning this to give you a conceptual sense of what would need to be accomplished to achieve the exact same performance. Given the speed numbers you mentioned, this procedure would take months to run. Hardly practical.

More realistically, if you are still strongly interested in training from scratch and doing the best you can, here are some general guidelines:

  • Always run with the largest batch size possible. It looks like you are already doing that. Great.
  • Make sure that you are not CPU bound. That is, make sure that the input preprocessing queues are always modestly full as displayed on TensorBoard. If not, increase the number of preprocessing threads or use a different CPU if available.
  • Re: learning rate. If you are always running synchronous training (which must be the case if you only have 1 GPU), then the higher the batch size, the higher the tolerable learning rate. I would try a series of quick runs (e.g. a few hours each) to identify the highest learning rate that does not lead to NaNs. After you find such a learning rate, knock it down by, say, 5-10% and run with that.
  • As for num_epochs_per_decay and learning_rate_decay_factor, there are several strategies. The strategy behind 10 epochs per decay with a 0.001 decay factor is to hammer the model for as long as possible until the eval accuracy asymptotes, and only then lower the learning rate. This is a nice, simple strategy. I would monitor the eval accuracy and verify that it has indeed asymptoted before allowing the learning rate to decay. Finally, the decay factor is a bit ad hoc, but lowering by, say, a power of 10 seems to be a good rule of thumb (see the calculation after this list).
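
To make the decay schedule concrete, here is a small, self-contained calculation (not from the inception codebase) of where the staircase drops land for a batch size of 32, 10 epochs per decay, and a factor-of-10 drop per decay:

    num_train_examples = 1281167        # ImageNet ILSVRC2012 training set
    batch_size = 32
    num_epochs_per_decay = 10
    learning_rate_decay_factor = 0.1    # "lower by a power of 10" rule of thumb
    initial_lr = 0.1

    steps_per_epoch = num_train_examples // batch_size
    decay_steps = steps_per_epoch * num_epochs_per_decay   # ~400K steps per decay

    for step in range(0, 5 * decay_steps + 1, decay_steps):
        lr = initial_lr * learning_rate_decay_factor ** (step // decay_steps)
        print("step %8d (epoch %5.1f): lr = %.6f" % (step, step / steps_per_epoch, lr))

Note that a decay factor of 0.001, as in the question, drops the rate a thousandfold at each step, which is far more aggressive than the power-of-10 rule of thumb above.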

Note again that these are general guidelines, and others might offer differing advice. The reason we cannot give you more specific guidance is that CNNs of this size are just not often trained from scratch on a modest hardware setup.

Answered Sep 18 '22 by user5869947


Excellent tips. There is precedent for training on a setup similar to yours. Check this out: http://3dvision.princeton.edu/pvt/GoogLeNet/ These people trained GoogLeNet, but using Caffe. Still, studying their experience would be useful.

Answered Sep 20 '22 by PintoUbuntu