about lr #7

Open · tanakataiki opened this issue Mar 19, 2019 · 1 comment
tanakataiki commented Mar 19, 2019

Thanks for a good optimizer.
According to the usage:

optm = AdaBound(lr=1e-03,
                final_lr=0.1,
                gamma=1e-03,
                weight_decay=0.,
                amsbound=False)

Does the learning rate gradually increase with the number of steps?

final_lr is described as "Final learning rate", but is it actually a learning rate relative to the base lr and the current learning rate?

final_lr = self.final_lr * lr / self.base_lr

titu1994 (Owner) commented
The final lr is approximately reached after about 1/gamma update steps have occurred. At that point the clipping bounds are fairly tight, so the actual lr after clipping falls close to final_lr.

In the initial updates, though, the LR bounds are on the order of the initial lr, so the optimizer allows Adam-type updates.
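For intuition, here is a minimal numeric sketch of how the clipping bounds tighten with the step count t, assuming the bound schedule used in common AdaBound implementations, lower = final_lr * (1 - 1/(gamma*t + 1)) and upper = final_lr * (1 + 1/(gamma*t)); the exact expressions in this repository may differ slightly:

# Sketch only (not taken from this repository): how AdaBound's lr bounds
# evolve with the step count t for lr=1e-3, final_lr=0.1, gamma=1e-3.
lr, final_lr, gamma = 1e-3, 0.1, 1e-3

for t in (1, 10, 100, 1000, 10000, 100000):
    # Assumed bound schedule from common AdaBound implementations.
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * t))
    print(f"step {t:>6}: Adam step size clipped into [{lower:.6f}, {upper:.6f}]")

With gamma = 1e-3, the bounds are very loose at t = 1 (roughly [1e-4, 100], i.e. effectively no clipping, so behaviour is Adam-like) and collapse toward final_lr = 0.1 by around t = 1/gamma = 1000 steps, which is where the SGD-like behaviour takes over.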

This means that if you use this optimizer on a dataset or task that SGD can't do well on (but Adam can), then this optimizer will get worse results than Adam alone. At least that's what I've experienced on language modelling tasks.
