Added SACD implementation. #190

Merged: 18 commits into main from alg/SAC-D on Aug 13, 2024
Conversation

@PKWadsy (Contributor) commented Aug 7, 2024:

Tested on Cart-Pole. Attains maximum reward.

Based on this paper.

alpha_lr: float,
device: torch.device,
):
self.type = "discrete_policy"
Member:
How does this get handled by gym_environment? Is there a respective update? This should just stay as "policy", no?

Contributor Author (PKWadsy):

The normalization in the gym environment was breaking things, so I wanted to make a loop without normalization. Sorry about the horrible naming.

Member:

How is it breaking things? It shouldn't break anything.

@SamBoasman (Contributor) commented Aug 9, 2024:

In the discrete setting, select_action_from_policy() and sample_action() both return a tensor containing the integer index of the action to enact. However, the action returned from sample_action() is normalised, while the action returned from select_action_from_policy() is denormalised. To make this work, we would need to configure some redundant reversal of the normalisation/denormalisation for one of these functions so that run_action_on_emulator() receives only one action format (normalised or original) instead of both.
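For illustration, a minimal sketch of one way the loop could branch on the policy type so that run_action_on_emulator() only ever sees a single action format. The function signature, denormalise_action(), and the use of agent.type here are assumptions for the sketch, not the repository's actual API:

```python
# Hypothetical sketch only: denormalise_action() and the exact signatures are
# assumptions; select_action_from_policy() and the "discrete_policy" type come
# from the diff above.
def act(agent, env, state):
    action = agent.select_action_from_policy(state)

    if getattr(agent, "type", "policy") == "discrete_policy":
        # Discrete actions are integer indices, so no (de)normalisation applies;
        # hand the index straight to run_action_on_emulator().
        return action

    # Continuous actions come out of the policy normalised to [-1, 1] and are
    # mapped back to the environment's action bounds before being enacted.
    return env.denormalise_action(action)
```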

):
super().__init__()
if hidden_size is None:
hidden_size = [256, 256]
Member:

Double-check the default hidden layer size.

if hidden_size is None:
hidden_size = [256, 256]
if log_std_bounds is None:
log_std_bounds = [-20, 2]
Member:

Double-check the log_std_bounds default.

Member:

You can remove log_std_bounds; it isn't even used here.
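For context, a minimal sketch (not the repository's code; the class and attribute names are assumptions) of why the bounds only matter in the continuous case: a continuous SAC actor predicts a mean and a log standard deviation that gets clamped to log_std_bounds, whereas a discrete SACD actor just outputs a categorical distribution over the actions, so there is no log-std to clamp:

```python
import torch
from torch import nn

# Illustrative sketch only; class and layer names are assumptions.
class DiscreteActor(nn.Module):
    def __init__(self, observation_size: int, num_actions: int, hidden_size=None):
        super().__init__()
        if hidden_size is None:
            hidden_size = [256, 256]
        self.net = nn.Sequential(
            nn.Linear(observation_size, hidden_size[0]),
            nn.ReLU(),
            nn.Linear(hidden_size[0], hidden_size[1]),
            nn.ReLU(),
            nn.Linear(hidden_size[1], num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Categorical probabilities over the discrete actions; there is no
        # mean/log_std head, which is why log_std_bounds has no role here.
        return torch.softmax(self.net(state), dim=-1)
```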

dist = torch.distributions.Categorical(action_probs)
action = dist.sample()
# Offset any values which are zero by a small amount so no nan nonsense
zero_offset = action_probs == 0.0
Member:

To avoid confusion, just throw parentheses around (action_probs == 0.0).
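For illustration, a sketch of what the parenthesised version might look like, and why the offset matters: any zero probability would give log(0) = -inf when taking log-probabilities for the expectation over actions. Variable names mirror the snippet above; the helper name, the 1e-8 constant, and the log step are assumptions, since the diff context cuts off before them:

```python
import torch

# Sketch only: clamp zero probabilities before taking the log so the
# expectation over actions stays finite.
def sample_with_log_probs(action_probs: torch.Tensor):
    dist = torch.distributions.Categorical(action_probs)
    action = dist.sample()

    # Parenthesised comparison, as suggested, to make the boolean mask explicit.
    zero_offset = (action_probs == 0.0).float() * 1e-8
    log_action_probs = torch.log(action_probs + zero_offset)

    return action, action_probs, log_action_probs
```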

):
super().__init__()
if hidden_size is None:
hidden_size = [256, 256]
Member:

Double-check the hidden layer size defaults.

Contributor Author (PKWadsy):

Where can we get this info? The paper?

Member:

The paper, or the code they provided.

Comment on lines 214 to 220
actor_lr: Optional[float] = 3e-4
critic_lr: Optional[float] = 3e-4
alpha_lr: Optional[float] = 3e-4

gamma: Optional[float] = 0.99
tau: Optional[float] = 0.005
reward_scale: Optional[float] = 1.0
Member:

Double-check the defaults against the paper/code.
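For reference, a sketch of how these defaults could be collected in one place while they are checked against the paper and its reference code. The dataclass name is an assumption; the values are simply those shown in the diff above:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative only: defaults copied from the diff above, to be verified
# against the SAC-Discrete paper and its reference implementation.
@dataclass
class SACDConfig:
    actor_lr: Optional[float] = 3e-4
    critic_lr: Optional[float] = 3e-4
    alpha_lr: Optional[float] = 3e-4

    gamma: Optional[float] = 0.99
    tau: Optional[float] = 0.005
    reward_scale: Optional[float] = 1.0
```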


import torch
from torch import nn

from cares_reinforcement_learning.util.common import SquashedNormal
Member:

You can also remove the SquashedNormal import here.

@beardyFace merged commit 15de136 into main on Aug 13, 2024 (4 checks passed).
@beardyFace deleted the alg/SAC-D branch on August 13, 2024 at 03:24.
3 participants