Prioritized Replay Buffer for Off-policy Algorithms #158

Open · gliese876b opened this issue Jan 7, 2025 · 7 comments · May be fixed by #160

@gliese876b

Is there a way to use Prioritized Replay Buffer for off-policy algorithms in BenchMARL?

As far as I understand, the framework does not support it at the moment. It uses RandomSampler here:

sampler = SamplerWithoutReplacement() if self.on_policy else RandomSampler()

The documentation of the returned TensorDictReplayBuffer says:

priority_key (str, optional): the key at which priority is assumed to
be stored within TensorDicts added to this ReplayBuffer.
This is to be used when the sampler is of type
:class:~torchrl.data.PrioritizedSampler.
Defaults to "td_error".

Thus, I guess it does not use priorities while sampling. A prioritized replay buffer could be put in place by changing that line to:

sampler = SamplerWithoutReplacement() if self.on_policy else PrioritizedSampler(memory_size, prb_alpha, prb_beta)

When prb_alpha = 0, the sampling probabilities become uniform regardless of the stored priorities, so the user can still get uniform sampling.
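
(For reference, a minimal sketch of what that would mean in torchrl, not BenchMARL's actual code; `memory_size`, `prb_alpha` and `prb_beta` are the placeholder names from the proposal above, and the dummy data and batch size are made up:)

```python
import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, PrioritizedSampler, TensorDictReplayBuffer

memory_size, prb_alpha, prb_beta = 10_000, 0.6, 0.4  # placeholder hyperparameters

buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(memory_size),
    sampler=PrioritizedSampler(memory_size, alpha=prb_alpha, beta=prb_beta),
    priority_key="td_error",  # the key the docstring above refers to
    batch_size=32,
)

# Extend with dummy transitions; the "td_error" entry drives the sampling priorities.
data = TensorDict(
    {"obs": torch.randn(100, 4), "td_error": torch.rand(100)},
    batch_size=[100],
)
buffer.extend(data)
sample = buffer.sample()  # importance-sampling weights are returned under "_weight"
```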

Thanks in advance!

@matteobettini
Collaborator

Thanks for this! Yes, we should allow it!

It is still more expensive to use a prioritized sampler (even with priorities effectively disabled), but we can add an arg so the user can request it and we change the sampler type.

It would be as simple a change as you propose, because the infrastructure to pass the priority is already all in place:

self.replay_buffers[group].update_tensordict_priority(subdata)

Would you be down to try this out and let me know if you get improvements over the normal buffer?
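
(A small self-contained sketch, assuming the torchrl API and placeholder sizes below, of how that priority-update hook is used after sampling; the random TD error stands in for whatever the loss module would actually compute:)

```python
import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, PrioritizedSampler, TensorDictReplayBuffer

size = 1_000  # placeholder buffer size
buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(size),
    sampler=PrioritizedSampler(size, alpha=0.6, beta=0.4),
    priority_key="td_error",
    batch_size=64,
)
buffer.extend(TensorDict({"obs": torch.randn(size, 4)}, batch_size=[size]))

for _ in range(10):
    subdata = buffer.sample()
    # In a real training loop the loss module would write this key;
    # here a random TD error stands in for it.
    subdata["td_error"] = torch.rand(subdata.batch_size[0])
    buffer.update_tensordict_priority(subdata)  # same call as the line quoted above
```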

@gliese876b
Author

gliese876b commented Jan 9, 2025

I ran a simple test on the VMAS Balance environment with IQL using a PRB with 'alpha': 0.6, 'beta': 0.4, and compared it against IQL without PRB. Here is the result (3 seeds each; shaded areas represent the min-max range):

result_agents_return_balance-crop.pdf

IQL with PRB is worse in this case, but I didn't run an extensive test with a hyperparameter search.
Still, it can be made optional. I can make a PR on this if you like.

@matteobettini
Collaborator

Yes, this is similar to the results I got back in the day (BenchMARL had prioritised buffers before the public release).

I wonder if there is an error somewhere in the torchrl implementation.

Could you try with alpha=0 and see if the performance matches plain random sampling? With alpha=0 the priorities should have no effect, so at least this would give us some peace of mind.

And yes a PR would be lovely thanks!

@gliese876b
Author

Here it is compared with PRB having alpha=0.

result_return_balance-crop.pdf

As expected, with alpha=0, it reduces to random sampling.

I'll make a PR. Thanks!

@matteobettini
Collaborator

> As expected, with alpha=0, it reduces to random sampling.

Great to check! Thanks!

matteobettini linked a pull request on Jan 10, 2025 that will close this issue
@matteobettini
Collaborator

> Here it is compared with PRB having alpha=0.
>
> result_return_balance-crop.pdf
>
> As expected, with alpha=0, it reduces to random sampling.

Ah, another question: in terms of time taken, how do these 3 compare?

@gliese876b
Author

I cannot say that this is a proper/fair test, but here are the total times of the trials.

[image attachment: total times of the trials]
