
Question about optimizer #33

Open
darktanuki opened this issue Dec 18, 2024 · 1 comment
Comments

@darktanuki

Hello,

Reading your paper and your StableDiffusion/Hidden code, I noticed that you use the LAMB optimizer.
Why did you choose LAMB instead of AdamW / RAdam?
I tried to replicate your experiment with AdamW / RAdam and the schedule-free optimizer (https://github.com/facebookresearch/schedule_free), but it does not converge as fast as your method with LAMB and the cosine scheduler.

Best regards,
Léo
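(For context, the cosine scheduler referred to above is the standard cosine-annealing schedule, which decays the learning rate from a maximum to a minimum over training. A minimal sketch, with illustrative function and parameter names:)

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine annealing: lr_max at step 0, lr_min at total_steps."""
    progress = step / total_steps  # fraction of training completed, in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Example: at the halfway point the rate is exactly midway between the bounds.
# cosine_lr(0, 100)   -> 1e-3
# cosine_lr(50, 100)  -> 5e-4
# cosine_lr(100, 100) -> 0.0
```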

@pierrefdz
Contributor

Hello, I tried a lot of optimizers and different hyperparameters. LAMB was the one that gave the best results. I don't know exactly why, since the optimizer was designed specifically for large batch sizes.
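(For reference, what distinguishes LAMB from AdamW is its layer-wise trust ratio: each layer's Adam-style update is rescaled by the ratio of the weight norm to the update norm. A minimal single-step sketch in plain Python, with illustrative names and default hyperparameters; a real implementation would operate on tensors per layer:)

```python
import math

def lamb_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01, t=1):
    """One LAMB update for a single layer's weights w given gradients g.

    m, v are the running first/second moment estimates; t is the step count.
    Returns the updated (w, m, v).
    """
    # AdamW-style bias-corrected moment estimates
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    # Update direction: Adam ratio plus decoupled weight decay
    u = [mh / (math.sqrt(vh) + eps) + wd * wi
         for mh, vh, wi in zip(m_hat, v_hat, w)]
    # Layer-wise trust ratio: ||w|| / ||u|| rescales the step per layer
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    u_norm = math.sqrt(sum(ui * ui for ui in u))
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    w = [wi - lr * trust * ui for wi, ui in zip(w, u)]
    return w, m, v

# One step on a toy 2-weight "layer": each weight moves against its gradient.
w, m, v = lamb_step([1.0, 2.0], [0.1, -0.2], [0.0, 0.0], [0.0, 0.0])
```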
