[Bug Report] Obs and info semantics in PointMaze with continuing_task #258

Open
younik opened this issue Nov 29, 2024 · 0 comments


younik commented Nov 29, 2024

Describe the bug
In PointMaze with continuing_task=True, the observation and info returned by step() on the step where the goal is reached are not updated: they still contain the old goal, even though the environment has already switched to a new one. This does not match the intended general semantics. In a typical RL loop the agent uses the returned observation to choose its next action, so it will steer toward the old goal instead of the new one.
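To make the ordering problem concrete, here is a minimal toy sketch. `ToyContinuingEnv` is a hypothetical 1-D stand-in, not the PointMaze class; it only mirrors the order of operations in the step() quoted below (build the observation first, resample the goal afterwards), reusing the same 0.45 success radius.

```python
# Hypothetical 1-D stand-in for PointMaze with continuing_task=True.
# It mirrors the ordering of the real step(): the observation is built
# *before* the goal is resampled, so it can carry a stale goal.
class ToyContinuingEnv:
    def __init__(self, goals):
        self._goals = list(goals)        # goals handed out in order
        self.goal = self._goals.pop(0)
        self.pos = 0.0

    def _get_obs(self):
        return {"achieved_goal": self.pos, "desired_goal": self.goal}

    def update_goal(self, achieved_goal):
        # Resample the goal once the current one is reached.
        if abs(achieved_goal - self.goal) <= 0.45 and self._goals:
            self.goal = self._goals.pop(0)

    def step(self, action):
        self.pos += action
        obs = self._get_obs()                          # built with the old goal
        success = abs(obs["achieved_goal"] - self.goal) <= 0.45
        self.update_goal(obs["achieved_goal"])         # goal changes afterwards
        return obs, float(success), False, False, {"success": success}


env = ToyContinuingEnv(goals=[1.0, -3.0])
obs, reward, terminated, truncated, info = env.step(1.0)
print(obs["desired_goal"], env.goal)  # 1.0 -3.0
```

A policy conditioned on this `obs` keeps steering toward 1.0, while the environment now measures progress toward -3.0.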

See related issue: Farama-Foundation/Minari#265.
In the current step(), the goal is updated only after the observation and info have been built:

def step(self, action):
    obs, _, _, _, info = self.point_env.step(action)
    obs_dict = self._get_obs(obs)
    reward = self.compute_reward(obs_dict["achieved_goal"], self.goal, info)
    terminated = self.compute_terminated(obs_dict["achieved_goal"], self.goal, info)
    truncated = self.compute_truncated(obs_dict["achieved_goal"], self.goal, info)
    info["success"] = bool(
        np.linalg.norm(obs_dict["achieved_goal"] - self.goal) <= 0.45
    )
    # Update the goal position if necessary.
    # obs_dict and info were built above, so they still describe the old goal.
    self.update_goal(obs_dict["achieved_goal"])
    return obs_dict, reward, terminated, truncated, info
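One possible reordering, sketched on the same hypothetical toy env rather than on the real PointMaze class: judge reward and success against the goal the agent was pursuing, then resample the goal, and only then build the returned observation. This is an illustration under those assumptions, not the maintainers' fix; whether any info keys should instead describe the new goal is left open here.

```python
# Hypothetical toy env with a reordered step(): the observation is built
# *after* the goal update, so it always carries the current goal.
class ToyContinuingEnvFixed:
    def __init__(self, goals):
        self._goals = list(goals)        # goals handed out in order
        self.goal = self._goals.pop(0)
        self.pos = 0.0

    def _get_obs(self):
        return {"achieved_goal": self.pos, "desired_goal": self.goal}

    def update_goal(self, achieved_goal):
        # Resample the goal once the current one is reached.
        if abs(achieved_goal - self.goal) <= 0.45 and self._goals:
            self.goal = self._goals.pop(0)

    def step(self, action):
        self.pos += action
        achieved = self.pos
        # Reward and success are still judged against the goal being pursued.
        success = abs(achieved - self.goal) <= 0.45
        reward = float(success)
        # Resample first, then build the observation from the updated goal.
        self.update_goal(achieved)
        obs = self._get_obs()
        return obs, reward, False, False, {"success": success}


env = ToyContinuingEnvFixed(goals=[1.0, -3.0])
obs, reward, terminated, truncated, info = env.step(1.0)
print(obs["desired_goal"], env.goal, info["success"])  # -3.0 -3.0 True
```

The agent is still rewarded for reaching the old goal, but the observation it acts on next already points at the new one.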

Code example
Reproducing this requires an expert policy that actually reaches the goal; one is used in https://github.com/Farama-Foundation/minari-dataset-generation-scripts/blob/main/scripts/pointmaze/create_pointmaze_dataset.py

younik changed the title from "[Bug Report] Info semantics in PointMaze with continuing_task" to "[Bug Report] Obs and info semantics in PointMaze with continuing_task" on Nov 29, 2024