about lr #7

Open · tanakataiki opened this issue Mar 19, 2019 · 1 comment
tanakataiki commented Mar 19, 2019

Thanks for a good optimizer.
According to the usage:

optm = AdaBound(lr=1e-03,
                final_lr=0.1,
                gamma=1e-03,
                weight_decay=0.,
                amsbound=False)

Does the learning rate gradually increase with the number of steps?

final_lr is described as "Final learning rate", but is it actually a learning rate relative to the base lr and the current learning rate?

final_lr = self.final_lr * lr / self.base_lr

titu1994 (Owner) commented
The final lr is approximately reached after about 1/gamma update steps have occurred. At that point the clipping bounds are fairly tight, so the actual lr after clipping falls close to final_lr.

In the initial updates, though, the LR bounds are on the order of the initial lr, so the optimizer allows Adam-type updates.
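For intuition, here is a minimal numeric sketch of how the clipping bounds tighten with the step count t, assuming the bound schedule used in common AdaBound implementations, lower = final_lr * (1 - 1/(gamma*t + 1)) and upper = final_lr * (1 + 1/(gamma*t)); the exact expressions in this repository may differ slightly:

# Sketch only (not taken from this repository): how AdaBound's lr bounds
# evolve with the step count t for lr=1e-3, final_lr=0.1, gamma=1e-3.
lr, final_lr, gamma = 1e-3, 0.1, 1e-3

for t in (1, 10, 100, 1000, 10000, 100000):
    # Assumed bound schedule from common AdaBound implementations.
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * t))
    print(f"step {t:>6}: Adam step size clipped into [{lower:.6f}, {upper:.6f}]")

With gamma = 1e-3, the bounds are very loose at t = 1 (roughly [1e-4, 100], i.e. effectively no clipping, so behaviour is Adam-like) and collapse toward final_lr = 0.1 by around t = 1/gamma = 1000 steps, which is where the SGD-like behaviour takes over.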

This means that if you use this optimizer on a dataset or task that SGD can't do well on (but Adam can), then this optimizer will get worse results than Adam alone. At least that's what I've experienced on language modelling tasks.
