
Question about optimizer #33

Open
darktanuki opened this issue Dec 18, 2024 · 1 comment
Comments

@darktanuki

Hello,

Reading your paper and your StableDiffusion/Hidden code, I noticed that you use the LAMB optimizer.
Why did you choose LAMB instead of AdamW / RAdam?
I tried to replicate your experiment with AdamW / RAdam and the schedule-free optimizer (https://github.com/facebookresearch/schedule_free), but it does not converge as fast as your method with LAMB and the cosine scheduler.

Best regards,
Léo
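(For context, the cosine scheduler referred to above is the standard cosine-annealing schedule, which decays the learning rate from a maximum to a minimum over training. A minimal sketch, with illustrative function and parameter names:)

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine annealing: lr_max at step 0, lr_min at total_steps."""
    progress = step / total_steps  # fraction of training completed, in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Example: at the halfway point the rate is exactly midway between the bounds.
# cosine_lr(0, 100)   -> 1e-3
# cosine_lr(50, 100)  -> 5e-4
# cosine_lr(100, 100) -> 0.0
```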

@pierrefdz
Contributor

Hello, I tried a lot of optimizers and different hyperparameters. LAMB was the one that gave the best results. I don't know exactly why, since the optimizer was designed specifically for large batch sizes.
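(For reference, what distinguishes LAMB from AdamW is its layer-wise trust ratio: each layer's Adam-style update is rescaled by the ratio of the weight norm to the update norm. A minimal single-step sketch in plain Python, with illustrative names and default hyperparameters; a real implementation would operate on tensors per layer:)

```python
import math

def lamb_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01, t=1):
    """One LAMB update for a single layer's weights w given gradients g.

    m, v are the running first/second moment estimates; t is the step count.
    Returns the updated (w, m, v).
    """
    # AdamW-style bias-corrected moment estimates
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    # Update direction: Adam ratio plus decoupled weight decay
    u = [mh / (math.sqrt(vh) + eps) + wd * wi
         for mh, vh, wi in zip(m_hat, v_hat, w)]
    # Layer-wise trust ratio: ||w|| / ||u|| rescales the step per layer
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    u_norm = math.sqrt(sum(ui * ui for ui in u))
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    w = [wi - lr * trust * ui for wi, ui in zip(w, u)]
    return w, m, v

# One step on a toy 2-weight "layer": each weight moves against its gradient.
w, m, v = lamb_step([1.0, 2.0], [0.1, -0.2], [0.0, 0.0], [0.0, 0.0])
```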
