The Spinning Up documentation for DDPG, TD3 and SAC describes the exponential moving average (EMA) of target network weights as Polyak averaging. This seems to be a misnomer: Polyak's paper (equation 12) uses an unweighted average of all past iterates, whereas an EMA gives exponentially larger weights to recent iterates. The Adam paper also mentions EMA as an alternative to Polyak averaging (section 7.2). Since the Spinning Up documentation is used by several students studying RL concepts, it would be good to add a clarification about this naming convention.
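To make the difference concrete, here is a minimal sketch comparing the two averaging schemes on scalar "weights". The function names are hypothetical and not taken from the Spinning Up code; the EMA form `target <- rho * target + (1 - rho) * online` is my reading of the update in the docs, and I am assuming the target starts as a copy of the first iterate.

```python
def ema_update(target, online, rho):
    """One EMA step: target <- rho * target + (1 - rho) * online."""
    return rho * target + (1.0 - rho) * online


def uniform_update(avg, online, k):
    """Running unweighted mean; k is the number of iterates seen so far,
    including the new one."""
    return avg + (online - avg) / k


def ema_weights(num_updates, rho):
    """Weight each iterate ends up with after num_updates EMA steps,
    oldest first, assuming the target was initialised to the first iterate."""
    weights = [rho ** num_updates]
    weights += [(1.0 - rho) * rho ** (num_updates - i)
                for i in range(1, num_updates + 1)]
    return weights


def uniform_weights(num_updates):
    """Weight each iterate gets in an unweighted average of all iterates."""
    n = num_updates + 1
    return [1.0 / n] * n


if __name__ == "__main__":
    iterates = [1.0, 2.0, 4.0, 8.0]  # made-up scalar iterates
    rho = 0.5  # deliberately small so the decay is visible

    ema = avg = iterates[0]
    for k, w in enumerate(iterates[1:], start=2):
        ema = ema_update(ema, w, rho)
        avg = uniform_update(avg, w, k)

    print("EMA weights:    ", ema_weights(len(iterates) - 1, rho))
    print("uniform weights:", uniform_weights(len(iterates) - 1))
    print("EMA result:", ema, " uniform result:", avg)
```

With the EMA, the weight on an old iterate shrinks geometrically with its age, while the unweighted average gives every iterate the same weight `1 / n`.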
Here is a similar discussion on a Keras issue.