In Lesson 3 - planet, I saw these 2 lines of code:
lr = 0.01
learn.fit_one_cycle(5, slice(lr))
If it were slice(min_lr, max_lr), then I understand that fit_one_cycle() would use learning rates spread out between min_lr and max_lr. (Hopefully my understanding of this is correct.)
But in this case slice(lr) only has one argument.
What are the differences between fit_one_cycle(5, lr) and fit_one_cycle(5, slice(lr))? And what are the benefits of using slice(lr) instead of lr directly?
The slice inside fit_one_cycle() is used to apply discriminative learning rates. With two arguments, for example slice(1e-5, 1e-4), it tells the model to train the earliest layer group with a learning rate of 1e-5, the final group with 1e-4, and the groups in between with values spread between those two. With a single argument, slice(lr), the final layer group is trained at lr and the earlier groups at a smaller rate, whereas passing a plain lr trains every group at the same rate.
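A minimal sketch of the three ways of passing learning rates, assuming a fastai v1 Learner named learn as in the course notebooks; the exact divisor used for the earlier groups with slice(lr) is my recollection of fastai v1's behaviour, so treat it as an assumption:

lr = 0.01

# Plain number: every layer group is trained with the same learning rate.
learn.fit_one_cycle(5, lr)

# slice(lr): the final layer group gets lr, earlier groups get a smaller rate
# (lr/10 in fastai v1, if I remember correctly), so the pretrained early
# layers are changed more gently.
learn.fit_one_cycle(5, slice(lr))

# slice(min_lr, max_lr): learning rates are spread across the layer groups,
# from min_lr for the first group up to max_lr for the last group.
learn.fit_one_cycle(5, slice(1e-5, 1e-4))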
TL;DR: fit_one_cycle() uses large, cyclical learning rates to train models significantly faster and to higher accuracy. When training deep learning models with fastai, it is recommended to use fit_one_cycle() over fit() because of its better speed and accuracy.
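For reference, this is roughly how the two calls compare on a fastai v1 Learner (again assuming a Learner named learn already exists):

lr = 0.01

# Constant learning rate for all 5 epochs.
learn.fit(5, lr)

# One-cycle policy: the learning rate ramps up towards lr and then anneals
# back down over the 5 epochs (with momentum varied inversely).
learn.fit_one_cycle(5, lr)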
Jeremy took a while to explain what slice does in Lesson 5.
What I understood is that fastai.vision divides the architecture into 3 layer groups and trains them with different learning rates depending on what you pass in. (The earliest layers usually don't need large changes to their parameters.)
Additionally, if you use fit_one_cycle, every group gets learning rate annealing around its own learning rate.
Check Lesson 5 https://course.fast.ai/videos/?lesson=5 (use the transcript finder to quickly go to the 'slice' part)
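If it helps to see the spreading concretely, here is a small stand-alone sketch (my own hypothetical helper, not fastai code) of how slice(1e-5, 1e-4) could be turned into per-group rates for 3 layer groups using geometric spacing, which is how I understand fastai spreads them:

import numpy as np

def spread_lrs(min_lr, max_lr, n_groups=3):
    # Hypothetical helper: geometrically spaced learning rates from min_lr
    # (first layer group) to max_lr (last layer group).
    mult = (max_lr / min_lr) ** (1.0 / (n_groups - 1))
    return np.array([min_lr * mult ** i for i in range(n_groups)])

print(spread_lrs(1e-5, 1e-4))  # -> [1.0e-05 3.16e-05 1.0e-04] (approximately)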