I am fine-tuning with Caffe on an image dataset on a Tesla K40. With a batch size of 47, solver_type=SGD, base_lr=0.001, lr_policy="step", momentum=0.9, and gamma=0.1, the training loss decreases and the test accuracy goes from 2% to 50% in 100 iterations, which is quite good.
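For reference, that SGD run corresponds to a solver.prototxt roughly like the sketch below. The file path, stepsize, and test settings are assumed placeholders rather than values from the question, and the batch size of 47 is set in the data layers of the net prototxt, not in the solver.

```
# solver.prototxt sketch for the SGD run (illustrative where not stated above)
net: "models/finetune/train_val.prototxt"   # hypothetical path
type: "SGD"             # older Caffe releases use the enum form: solver_type: SGD
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 5000          # assumed; not given in the question
momentum: 0.9
test_iter: 100          # assumed test settings
test_interval: 500
max_iter: 10000
display: 20
snapshot: 5000
snapshot_prefix: "models/finetune/snap"
solver_mode: GPU
```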
When using other optimisers such as RMSProp, Adam, and AdaDelta, the training loss remains almost the same, and there is no improvement in test accuracy even after 1000 iterations.
For RMSProp, I have changed the respective parameters as mentioned here.
For Adam, I have changed the respective parameters as mentioned here.
For AdaDelta, I have changed the respective parameters as mentioned here.
Can someone please tell me what I am doing wrong?
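For reference, switching solvers in Caffe comes down to changing a few lines in solver.prototxt. The sketch below shows the usual fields for each solver; the values are illustrative defaults, not the exact ones from the posts linked above.

```
# RMSProp variant (illustrative values)
type: "RMSProp"
rms_decay: 0.98
delta: 1e-8

# Adam variant (illustrative values)
type: "Adam"
momentum: 0.9        # beta1
momentum2: 0.999     # beta2
delta: 1e-8          # epsilon

# AdaDelta variant (illustrative values)
type: "AdaDelta"
momentum: 0.95
delta: 1e-6
```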
I saw similar results to pir: Adam would diverge when given the same base_lr that SGD used. When I reduced base_lr to 1/100 of its original value, Adam suddenly converged, and gave good results.
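Adaptive solvers rescale the gradient per parameter, so a base_lr tuned for SGD is often far too large for them. Concretely, the only change on top of the Adam settings was the learning rate; assuming the base_lr of 0.001 from the question, the adjustment looks like this:

```
# Adam converged only after cutting the learning rate to 1/100 of the SGD value
type: "Adam"
base_lr: 0.00001     # 0.001 / 100
momentum: 0.9        # beta1
momentum2: 0.999     # beta2
delta: 1e-8          # epsilon
lr_policy: "fixed"   # assumed; Adam already adapts per-parameter step sizes
```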