Documentation: Polyak vs EMA #413

dwaitbhatt · 2024-05-14T01:32:47Z

The Spinning Up documentation for DDPG, TD3 and SAC describes the exponential moving average (EMA) of target network weights as Polyak averaging. This seems to be a misnomer: in Polyak's paper (equation 12), the average is an unweighted mean of all past iterates, whereas an EMA gives exponentially larger weights to recent iterates. The Adam paper also presents EMA as an alternative to Polyak averaging (section 7.2). Since many students learning RL concepts rely on the Spinning Up documentation, it would be good to add a clarification about this naming convention.
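To make the difference concrete, here is a minimal sketch in plain Python, using scalar "parameters" for readability. The function names are hypothetical and not part of Spinning Up. `polyak_average` computes the uniform running mean $\bar\theta_t = \frac{1}{t}\sum_{i=1}^{t}\theta_i$, while `ema` applies the target-network update $\theta_{\text{targ}} \leftarrow \rho\,\theta_{\text{targ}} + (1-\rho)\,\theta$, where $\rho$ is the coefficient Spinning Up exposes as `polyak`. Unrolling the EMA (assuming the target starts at $\theta_1$) gives weight $(1-\rho)\rho^{t-i}$ to iterate $\theta_i$ for $i \ge 2$ and $\rho^{t-1}$ to $\theta_1$, so older iterates decay geometrically instead of being weighted equally.

```python
def polyak_average(iterates):
    """Uniform average of all iterates: each theta_i gets weight 1/t."""
    avg = 0.0
    for t, theta in enumerate(iterates, start=1):
        # Running-mean form of (1/t) * sum_{i<=t} theta_i
        avg += (theta - avg) / t
    return avg


def ema(iterates, rho=0.995):
    """Exponential moving average, as in target-network updates:
    target <- rho * target + (1 - rho) * theta.
    Assumes the target is initialized to the first iterate."""
    target = float(iterates[0])
    for theta in iterates[1:]:
        target = rho * target + (1.0 - rho) * theta
    return target


def ema_weights(t, rho):
    """Effective weight on theta_1..theta_t after unrolling t iterates of the EMA
    (target initialized to theta_1)."""
    # theta_i for i >= 2 contributes (1 - rho) * rho**(t - i)
    weights = [(1.0 - rho) * rho ** (t - i) for i in range(1, t + 1)]
    # theta_1 is the initial value, so it keeps rho**(t - 1)
    weights[0] = rho ** (t - 1)
    return weights


iterates = [1.0, 2.0, 3.0, 4.0]
print(polyak_average(iterates))   # 2.5
print(ema(iterates, rho=0.5))     # 3.125
print(ema_weights(4, 0.5))        # [0.125, 0.125, 0.25, 0.5]
```

With $\rho = 0.5$ the most recent iterate alone carries half the weight, whereas the uniform average gives each of the four iterates 0.25. With $\rho$ close to 1, as is typical for target networks, the decay is much slower but still geometric, which is why "EMA" describes the update more precisely than "Polyak averaging".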

Here is a similar discussion on a Keras issue.
