Skip to content
This repository has been archived by the owner on Dec 28, 2023. It is now read-only.

Typos in some ppo equations #14

Closed
simonclavet opened this issue Oct 17, 2019 · 2 comments
Closed

Typos in some ppo equations #14

simonclavet opened this issue Oct 17, 2019 · 2 comments
Labels
bug Something isn't working

Comments

@simonclavet
Copy link

I am wondering about lines 111 to 116 in ppo.cpp. Does it make sense to have mean() for surr_1 and surr_2 here? I was under the impression that surrogate losses should still be per sample at this point, and the mean() is done only after the min at line 124?

auto surr_1 = ratio * mini_batch.advantages.mean();
auto surr_2 = (torch::clamp(ratio,
1.0 - clip_param,
1.0 + clip_param) *
mini_batch.advantages)
.mean();
auto action_loss = -torch::min(surr_1, surr_2).mean();

Also, the way you compute advantages works but might be suboptimal:
advantages = (advantages - advantages.mean() / (advantages.std() + 1e-5));
should probably be
advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-5);

@Omegastick Omegastick added the bug Something isn't working label Oct 18, 2019
@Omegastick
Copy link
Owner

The advantages one is definitely a typo, thanks for catching that.

I'm 99% sure the surrogate means one is also a typo. It seems to have been introduced in a451ad7, and I'm pretty sure it was leftover debug code that I didn't remove before commiting. I'm interested in why it works at all though. I didn't notice any regression in my game that's using the library.

While I was looking at this, I also found #15. And come to think of it the surrogate mean thing is a likely cause of that.

I'll do some tests and figure out #15 first, then if this turns out to be unrelated I'll fix it separately.

@Omegastick
Copy link
Owner

Fixed in 1832c60.

Looks like this was a major contributor to #15, but I feel like it's still not performing quite as well as it used to. I'll run some more detailed tests when I get time.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants