Prioritized Replay Buffer for Off-policy Algorithms #158

Open · gliese876b opened this issue Jan 7, 2025 · 7 comments · May be fixed by #160

@gliese876b

Is there a way to use Prioritized Replay Buffer for off-policy algorithms in BenchMARL?

As far as I understand, the framework does not support it at the moment. It uses RandomSampler here:

sampler = SamplerWithoutReplacement() if self.on_policy else RandomSampler()

The documentation of the returned TensorDictReplayBuffer says:

priority_key (str, optional): the key at which priority is assumed to
be stored within TensorDicts added to this ReplayBuffer.
This is to be used when the sampler is of type
:class:~torchrl.data.PrioritizedSampler.
Defaults to "td_error".

Thus, I guess it does not use priorities while sampling. A prioritized replay buffer could be put in place by changing that line to:

sampler = SamplerWithoutReplacement() if self.on_policy else PrioritizedSampler(memory_size, prb_alpha, prb_beta)

When prb_alpha = 0, the sampling probabilities become uniform regardless of the stored priorities, so the user can still get uniform sampling.
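
(For reference, a minimal sketch of what that would mean in torchrl, not BenchMARL's actual code; `memory_size`, `prb_alpha` and `prb_beta` are the placeholder names from the proposal above, and the dummy data and batch size are made up:)

```python
import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, PrioritizedSampler, TensorDictReplayBuffer

memory_size, prb_alpha, prb_beta = 10_000, 0.6, 0.4  # placeholder hyperparameters

buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(memory_size),
    sampler=PrioritizedSampler(memory_size, alpha=prb_alpha, beta=prb_beta),
    priority_key="td_error",  # the key the docstring above refers to
    batch_size=32,
)

# Extend with dummy transitions; the "td_error" entry drives the sampling priorities.
data = TensorDict(
    {"obs": torch.randn(100, 4), "td_error": torch.rand(100)},
    batch_size=[100],
)
buffer.extend(data)
sample = buffer.sample()  # importance-sampling weights are returned under "_weight"
```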

Thanks in advance!

@matteobettini
Collaborator

Thanks for this! Yes, we should allow it!

It is still more expensive to use a prioritized sampler (even with priorities effectively disabled), but we can add an arg so the user can request it and we change the sampler type.

It would be as simple a change as you propose, because the infrastructure to pass the priority is already all in place:

self.replay_buffers[group].update_tensordict_priority(subdata)

Would you be down to try this out and let me know if you get improvements over the normal buffer?
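
(A small self-contained sketch, assuming the torchrl API and placeholder sizes below, of how that priority-update hook is used after sampling; the random TD error stands in for whatever the loss module would actually compute:)

```python
import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, PrioritizedSampler, TensorDictReplayBuffer

size = 1_000  # placeholder buffer size
buffer = TensorDictReplayBuffer(
    storage=LazyTensorStorage(size),
    sampler=PrioritizedSampler(size, alpha=0.6, beta=0.4),
    priority_key="td_error",
    batch_size=64,
)
buffer.extend(TensorDict({"obs": torch.randn(size, 4)}, batch_size=[size]))

for _ in range(10):
    subdata = buffer.sample()
    # In a real training loop the loss module would write this key;
    # here a random TD error stands in for it.
    subdata["td_error"] = torch.rand(subdata.batch_size[0])
    buffer.update_tensordict_priority(subdata)  # same call as the line quoted above
```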

@gliese876b
Author

gliese876b commented Jan 9, 2025

I ran a simple test on the VMAS Balance environment with IQL using a PRB with 'alpha': 0.6, 'beta': 0.4, and compared it against IQL without PRB. Here is the result (3 seeds each; shaded areas represent the min-max range):

result_agents_return_balance-crop.pdf

IQL with PRB is worse in this case, but I didn't run an extensive test with a hyperparameter search.
Still, it can be made optional. I can make a PR on this if you like.

@matteobettini
Collaborator

Yes, this is similar to the results I got back in the day (BenchMARL had prioritised buffers before the public release).

I wonder if there is an error somewhere in the torchrl implementation.

Could you try with alpha=0 and see if the performance matches plain random sampling? With alpha=0 the priorities should have no effect, so at least this would give us some peace of mind.

And yes a PR would be lovely thanks!

@gliese876b
Author

Here it is compared with PRB having alpha=0.

result_return_balance-crop.pdf

As expected, with alpha=0, it reduces to random sampling.

I'll make a PR. Thanks!

@matteobettini
Collaborator

> As expected, with alpha=0, it reduces to random sampling.

Great to check! Thanks!

matteobettini linked a pull request on Jan 10, 2025 that will close this issue
@matteobettini
Collaborator

> Here it is compared with PRB having alpha=0.
>
> result_return_balance-crop.pdf
>
> As expected, with alpha=0, it reduces to random sampling.

Ah, another question: in terms of time taken, how do these 3 compare?

@gliese876b
Author

I cannot say that this is a proper/fair test, but here are the total times of the trials.

[image attachment: total times of the trials]
