Typos in some ppo equations #14

simonclavet · 2019-10-17T20:02:27Z

I am wondering about lines 111 to 116 in ppo.cpp. Does it make sense to have mean() for surr_1 and surr_2 here? I was under the impression that surrogate losses should still be per sample at this point, and the mean() is done only after the min at line 124?

auto surr_1 = ratio * mini_batch.advantages.mean();
auto surr_2 = (torch::clamp(ratio,
1.0 - clip_param,
1.0 + clip_param) *
mini_batch.advantages)
.mean();
auto action_loss = -torch::min(surr_1, surr_2).mean();

Also, the way you compute advantages works but might be suboptimal:
advantages = (advantages - advantages.mean() / (advantages.std() + 1e-5));
should probably be
advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-5);

Omegastick · 2019-10-18T16:04:23Z

The advantages one is definitely a typo, thanks for catching that.

I'm 99% sure the surrogate means one is also a typo. It seems to have been introduced in a451ad7, and I'm pretty sure it was leftover debug code that I didn't remove before commiting. I'm interested in why it works at all though. I didn't notice any regression in my game that's using the library.

While I was looking at this, I also found #15. And come to think of it the surrogate mean thing is a likely cause of that.

I'll do some tests and figure out #15 first, then if this turns out to be unrelated I'll fix it separately.

Omegastick · 2019-10-19T08:38:41Z

Fixed in 1832c60.

Looks like this was a major contributor to #15, but I feel like it's still not performing quite as well as it used to. I'll run some more detailed tests when I get time.

Omegastick added the bug Something isn't working label Oct 18, 2019

Omegastick closed this as completed Oct 19, 2019

Omegastick mentioned this issue Oct 19, 2019

Example code doesn't converge #15

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Typos in some ppo equations #14

Typos in some ppo equations #14

simonclavet commented Oct 17, 2019

Omegastick commented Oct 18, 2019

Omegastick commented Oct 19, 2019

Typos in some ppo equations #14

Typos in some ppo equations #14

Comments

simonclavet commented Oct 17, 2019

Omegastick commented Oct 18, 2019

Omegastick commented Oct 19, 2019