Just curious: how long would it take to train the VGG16 model on ImageNet using a Google Colab TPU? If someone could explain the calculations they did to get to the answer, that would be great!

How long will it take to train the VGG-16 model on IMAGENET using GOOGLE COLAB TPU?

2 Answers

It's very hard to accurately estimate how long it will take to train a model end to end. But assuming you're just looking for a very rough estimate, we can start by noting that this ResNet50 implementation we have (code) runs to convergence (76%+ top-1 accuracy, trained for 90 epochs) in roughly 7.3 hours on a v2-8 TPU device. Given that VGG16 and ResNet50 are close enough in step time (https://github.com/jcjohnson/cnn-benchmarks#cnn-benchmarks), I'd expect VGG16's time to convergence to be roughly proportional to that. As a disclaimer, this is a very rough estimate, and actual performance will also depend on how optimized the implementation is.
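To make the arithmetic behind that reference point explicit, here is a minimal sketch. The 7.3 hours, 90 epochs, and ImageNet size come from this thread; the function name `estimate_hours` and the `step_time_ratio` parameter are hypothetical, and the ratio is a placeholder you would have to measure yourself (1.0 just encodes the "close enough in step time" assumption).

```python
# Reference point from the answer: ResNet50, 90 epochs, ~7.3 hours on a v2-8 TPU.
IMAGENET_TRAIN_IMAGES = 1_281_167
RESNET50_HOURS = 7.3
RESNET50_EPOCHS = 90

# Implied cost of one pass over ImageNet, and the implied throughput.
seconds_per_epoch = RESNET50_HOURS * 3600 / RESNET50_EPOCHS
images_per_second = IMAGENET_TRAIN_IMAGES / seconds_per_epoch


def estimate_hours(epochs, step_time_ratio=1.0):
    """Scale the ResNet50 reference run to another model.

    step_time_ratio is a hypothetical, user-measured factor:
    (model step time) / (ResNet50 step time). 1.0 assumes they are equal.
    """
    return epochs * seconds_per_epoch * step_time_ratio / 3600


print(f"{seconds_per_epoch:.0f} s per epoch, {images_per_second:.0f} images/s")
print(f"{estimate_hours(90):.1f} hours for 90 epochs at ratio 1.0")
```

The reference run works out to about 292 seconds per epoch. Both the number of epochs VGG16 needs and its real step-time ratio on a TPU are unknowns here, so treat the result as an order-of-magnitude guide only.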


answered Oct 16 '22 18:10

jysohn

Here is the official TPU example. Training VGG-16 on an optimized TFRecord dataset with 2990 training images, IMAGE_SIZE = [331, 331], batch_size=128, for 12 epochs takes 2m15s. I think training with the 1,281,167 ImageNet images will take approximately 15 hours.
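A figure in that range can be reproduced by scaling the measured time linearly with dataset size. This is only a sketch under stated assumptions: the same 12 epochs, the same 331x331 image size and batch size, and constant throughput. The helper name `extrapolate_training_time` is made up for illustration.

```python
def extrapolate_training_time(measured_seconds, measured_images, target_images):
    """Scale a measured wall-clock time linearly with the number of images.

    Assumes throughput stays constant and the epoch count is unchanged.
    """
    seconds_per_image = measured_seconds / measured_images
    return seconds_per_image * target_images


measured_seconds = 2 * 60 + 15  # 2m15s for 12 epochs over 2990 images
estimate_seconds = extrapolate_training_time(measured_seconds, 2990, 1_281_167)
print(f"{estimate_seconds / 3600:.1f} hours")  # prints "16.1 hours"
```

The 2m15s measurement also includes one-off costs such as model compilation, so on a run this short the linear scaling probably overstates the per-image cost. Training to convergence may also need a different number of epochs than 12.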

answered Oct 16 '22 19:10

balezz


Tags:

tensorflow

deep-learning

google-colaboratory

google-cloud-tpu

tpu

Umair Javaid
