Questions re: RLax Value Learning ? #9
Comments
To be clear about 3a, I specifically mean:

[code snippet not captured in this export]

And 3b, I mean:

[code snippet not captured in this export]
Yes, that is correct.
Correct, rlax doesn't currently implement them.
Yes, I'd welcome a PR adding these!
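To illustrate the `huber_param = 0` point, here is a minimal sketch of a quantile (Huber) regression loss — hypothetical code for illustration, not rlax's actual implementation:

```python
import jax.numpy as jnp

def quantile_loss_sketch(delta, tau, huber_param):
    """Sketch of an elementwise quantile (Huber) regression loss.

    delta: TD error(s); tau: quantile level in (0, 1);
    huber_param: Huber threshold kappa (0 disables smoothing).
    """
    abs_delta = jnp.abs(delta)
    if huber_param == 0.:
        # Pure quantile (pinball) regression: linear in |delta|.
        rho = abs_delta
    else:
        # Huber smoothing: quadratic near zero, linear beyond kappa.
        rho = jnp.where(abs_delta <= huber_param,
                        0.5 * delta ** 2,
                        huber_param * (abs_delta - 0.5 * huber_param))
    # Asymmetric weight: tau for positive errors, (1 - tau) for negative.
    weight = jnp.abs(tau - (delta < 0.).astype(jnp.float32))
    return weight * rho
```

With `huber_param = 0` the quadratic/linear split disappears and the loss reduces exactly to the asymmetric absolute-error (pinball) loss, which is the behavior confirmed above.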
@mtthss regarding the PR, what do you think of my two comments above for implementing expectile regression-naive?
I asked one of the original authors of that paper to provide comments on the proposed implementation.
I kept the Huber loss parameter for consistency with the quantile regression API, but the parameter is never used, so maybe it should be excluded.
Hi Rylan and Matteo, Will Dabney and I have taken a look through expectile_naive_regression_loss and expectile_naive_q_learning, and the code looks correct to us. One comment is that expectile_naive_regression_loss could be reused as part of an ER-DQN (non-naive) implementation (in which case dist_target would be a vector of samples, rather than expectiles), so expectile_naive_regression_loss could be renamed to expectile_regression_loss, with the understanding that the elements of dist_target may be either expectiles or samples. If going down these lines, the docstring could be updated to reflect what semantics are allowed for both dist_target and dist_src. A couple of other minor comments: the jnp.abs at the end of expectile_naive_regression_loss can be removed, and the end of the docstring for expectile_naive_q_learning should update "Quantile" to "Expectile". Also, as Rylan mentions, the huber_param input to both functions could safely be removed. Thanks!
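The PR code under review is not quoted in this thread; as an illustration of the shape the reviewers describe, an expectile regression loss along these lines might look as follows (a hypothetical sketch — names, shapes, and the reduction are assumptions, not the actual PR code):

```python
import jax.numpy as jnp

def expectile_regression_loss(dist_src, tau_src, dist_target):
    """Hypothetical expectile regression loss, per the review above.

    dist_src: predicted expectiles, shape [num_src].
    tau_src: expectile levels in (0, 1), shape [num_src].
    dist_target: target expectiles (naive variant) or samples
      (non-naive / ER-DQN variant), shape [num_target].
    """
    # Pairwise errors between every target element and every source expectile.
    delta = dist_target[None, :] - dist_src[:, None]
    # Asymmetric weight: tau for underestimates, (1 - tau) for overestimates.
    weight = jnp.abs(tau_src[:, None] - (delta < 0.).astype(jnp.float32))
    # The squared error makes this an expectile (not quantile) loss; the
    # square is already non-negative, so no trailing jnp.abs is needed.
    return jnp.mean(weight * delta ** 2)
```

Because only `dist_target`'s interpretation changes between the naive and non-naive variants, the same function body serves both, which is the reviewers' point about renaming it to `expectile_regression_loss`.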
Hi! I have several questions/requests regarding value learning https://github.com/deepmind/rlax/blob/master/rlax/_src/value_learning.py

1. If I want to use `_quantile_regression_loss` without the Huber aspect, does setting `huber_param` equal to `0` accomplish this? That's my understanding, but I'd like to check :)
2. I'm interested in exploring expectile regression-naive DQN and expectile regression DQN, but code for these two related algorithms doesn't seem to exist. Is that correct? If code does exist, could you point me in the right direction?
3. If functions for expectile regression indeed do not exist, what would be the most straightforward way to implement them? If I just want expectile regression-naive, I'm thinking I would need to do the following:
   a. Copy `_quantile_regression_loss()` to create `_expectile_regression_loss()`, replacing the quantile loss with expectile loss
   b. Copy `quantile_q_learning()` to create `expectile_q_learning()`, replacing the `_quantile_regression_loss()` call with a `_expectile_regression_loss()` call

Is this correct? If so, would you be open to PRs?
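Steps (a) and (b) above could be sketched roughly as follows. This is a simplified, hypothetical sketch: the function names follow the plan above, but the signature is reduced (rlax's real `quantile_q_learning` takes more arguments, e.g. a separate selector distribution for the greedy action), and the shapes are assumptions:

```python
import jax.numpy as jnp

def _expectile_regression_loss(dist_src, tau_src, dist_target):
    # Step (a): same pairwise structure as the quantile loss, but with an
    # asymmetric *squared* penalty in place of the pinball loss.
    delta = dist_target[None, :] - dist_src[:, None]
    weight = jnp.abs(tau_src[:, None] - (delta < 0.).astype(jnp.float32))
    return jnp.mean(weight * delta ** 2)

def expectile_q_learning(dist_q_tm1, tau, a_tm1, r_t, discount_t, dist_q_t):
    # Step (b): mirror quantile_q_learning, swapping in the expectile loss.
    # dist_q_tm1 / dist_q_t have assumed shape [num_atoms, num_actions].
    q_t = jnp.mean(dist_q_t, axis=0)            # per-action value estimates
    a_t = jnp.argmax(q_t)                       # greedy target action
    dist_target = r_t + discount_t * dist_q_t[:, a_t]   # bootstrap each atom
    return _expectile_regression_loss(dist_q_tm1[:, a_tm1], tau, dist_target)
```

The only substantive change from the quantile version is the loss: the weighted `|delta|` (or Huber) term becomes a weighted `delta ** 2`, so the `huber_param` argument has no role and can be dropped.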