Thanks for a great optimizer!

According to the usage example:

```python
from adabound import AdaBound  # adabound.py from this repository

optm = AdaBound(lr=1e-03,
                final_lr=0.1,
                gamma=1e-03,
                weight_decay=0.,
                amsbound=False)
```

Does the learning rate gradually increase with the number of steps?

`final_lr` is documented as the "Final learning rate", but is it actually a learning rate relative to the base lr and the current learning rate?
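For reference, a minimal sketch of where the optimizer instance would be used; the model below is a hypothetical placeholder, and it assumes the standalone Keras API that this repo targets:

```python
# Hypothetical placeholder model, only to show where `optm` is passed;
# the AdaBound arguments themselves come from the snippet above.
import keras

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(32,)),
    keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer=optm,  # the AdaBound instance constructed above
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```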
The final lr is approximately reached after 1/gamma update steps have occurred. At that point the clipping bounds are fairly tight, so after clipping the actual per-parameter step size falls close to the final lr.

In the initial updates, though, the bounds are loose and the step sizes stay in the range of the initial lr, which allows for Adam-type updates.

This means that if you use this optimizer on a dataset/task that SGD can't do well on (but Adam can), then this optimizer will get worse results than Adam alone. At least that's what I've experienced on language modelling tasks.
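To make the schedule concrete, here is a rough sketch of how the clipping bounds evolve, following the reference AdaBound implementation (not necessarily this repo's exact code). Note the rescaling of `final_lr` by `lr / base_lr`, which is why the docstring's "final learning rate" is really relative to the base lr:

```python
# Sketch of AdaBound's step-size clipping bounds over training, assuming the
# reference formulas:
#   lower(t) = final_lr * (1 - 1 / (gamma * t + 1))
#   upper(t) = final_lr * (1 + 1 / (gamma * t))
# where final_lr is first rescaled by lr / base_lr.

lr, base_lr, final_lr, gamma = 1e-3, 1e-3, 0.1, 1e-3

def bounds(step):
    scaled_final_lr = final_lr * lr / base_lr
    lower = scaled_final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = scaled_final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper

for step in (1, 100, 1000, 10000, 100000):
    lo, hi = bounds(step)
    # Early on the band is very wide (essentially unconstrained Adam updates);
    # around step = 1/gamma it is on the order of final_lr, and later it
    # converges to final_lr, i.e. SGD-like behaviour.
    print(f"step {step:>7}: clip per-parameter step size to [{lo:.5f}, {hi:.5f}]")
```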
(Referenced code: keras-adabound/adabound.py, line 72 at commit 5ce819b)