diff --git a/docs/source/index.rst b/docs/source/index.rst index 9dc78a0..63a0ac7 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -38,7 +38,7 @@ Key features ------------ * Very little background in electricity systems modelling is required. This makes :code:`gym-anm` an ideal starting point for RL students and researchers looking to enter the field. -* The environments (tasks) generated by :code:`gym-anm` follow the `OpenAI Gym `_ +* The environments (tasks) generated by :code:`gym-anm` follow the `Gymnasium `_ framework, with which a large part of the RL community is already familiar. * The flexibility of :code:`gym-anm`, with its different customizable components, makes it a suitable framework to model a wide range of ANM tasks, from simple ones that can be used for educational purposes, to complex ones diff --git a/docs/source/topics/about.rst b/docs/source/topics/about.rst index cb03175..9967fc2 100644 --- a/docs/source/topics/about.rst +++ b/docs/source/topics/about.rst @@ -4,7 +4,5 @@ About ===== The design of :code:`gym-anm` started as a summer undergraduate project conducted by -`Robin Henry `_ at the University of Liège (ULiège), Belgium, under the supervision of +`Robin Henry `_ at the University of Liège (ULiège), Belgium, under the supervision of `Prof. Damien Ernst `_ in 2019. - -It remained a side project until its first version was released in March 2021. diff --git a/docs/source/topics/archive/action_space.rst b/docs/source/topics/archive/action_space.rst deleted file mode 100644 index 072dcd1..0000000 --- a/docs/source/topics/archive/action_space.rst +++ /dev/null @@ -1,37 +0,0 @@ -.. - -.. _action_space_label: - -Action space -============ -Formally, the action vectors :math:`a_t \in \mathcal A` expected by :code:`gym-anm` environments are expressed as: - -.. 
math:: - \begin{align} - a_t = \big[ - \{a_{P_{g,t}}\}_{g \in \mathcal D_G - \{g^{slack}\}},\; \{a_{Q_{g,t}}\}_{g \in \mathcal D_G - \{g^{slack}\}}, - \{a_{P_{d,t}}\}_{d \in \mathcal D_{DES}},\; \{a_{Q_{d,t}}\}_{d \in \mathcal D_{DES}} \big] \;, \label{eq:action_vector} - \end{align} - -for a total of :math:`N_a = 2|\mathcal D_G| + 2|\mathcal D_{DES}| - 2` control variables to be chosen by the agent at -each timestep, each belonging to one of four categories: - -* :math:`a_{P_{g,t}}`: an upper limit on the active power injection from non-slack generator :math:`g`. - If :math:`g` is a renewable energy resource, then :math:`a_{P_{g,t}}` is the curtailment value. For classical - generators, it simply refers to a set-point chosen by the agent. The slack generator is excluded, since it is used - to balance load and generation and therefore its power injection cannot be controlled by the agent. That is, - :math:`g^{slack}` will inject into the network the amount of power needed to fill the gap between the total - generation and demand. -* :math:`a_{Q_{g,t}}`: the reactive power injection from each non-slack generator :math:`g`. - Again, the injection from the slack generator is used to balance reactive power flows and therefore cannot be - controlled by the agent. -* :math:`a_{P_{d,t}}`: the active power injection from each energy storage unit :math:`d \in \mathcal D_{DES}`. -* :math:`a_{Q_{d,t}}`: the reactive power injection from each energy storage unit :math:`d \in \mathcal D_{DES}`. - -As with all Gym environments, the action space :math:`\mathcal A` from which the agent can choose actions can be -accessed through the :code:`env.action_space` attribute. - -Note that not all actions within :math:`\mathcal A` will be feasible in the current state :math:`s_t` (e.g., an empty -storage unit cannot inject power into the network).
The action :math:`a_t \in \mathcal A` chosen by the agent will -first be mapped to the closest action (using Euclidean distance) in the feasible set :math:`\mathcal A(s_t)` before being -applied in the environment. diff --git a/docs/source/topics/archive/background.rst b/docs/source/topics/archive/background.rst deleted file mode 100644 index 82d2ad5..0000000 --- a/docs/source/topics/archive/background.rst +++ /dev/null @@ -1,61 +0,0 @@ -.. - -Background and notation -======================= - -Reinforcement learning ----------------------- -The documentation of :code:`gym-anm` assumes familiarity with basic reinforcement learning (RL) concepts. Some good -resources to get started are: - -* `Reinforcement Learning: An Introduction `_ -* `OpenAI Spinning Up `_ - -Being familiar with the `OpenAI Gym `_ framework is also useful, since :code:`gym-anm` -environments follow the same framework. - - -Distribution networks ---------------------- - -Notation -^^^^^^^^ -The main notations used throughout this documentation are listed below. - -* :math:`\mathbf i` - the imaginary number with :math:`\mathbf i^2 = -1`. -* :math:`G(\mathcal N, \mathcal E)` - the directed graph representing the distribution network, -* :math:`\mathcal N = \{0,1,\ldots,N-1\}` - the set of buses (or nodes) in the network, -* :math:`\mathcal E \subseteq \mathcal N \times \mathcal N` - the set of directed edges (transmission lines) linking buses together, -* :math:`e_{ij} \in \mathcal E` - the directed edge with sending bus :math:`i` and receiving bus :math:`j`, -* :math:`\mathcal D = \{0,1,\ldots,D-1\}` - the set of all electrical devices connected to the grid. Each device - :math:`d \in \mathcal D` is connected to exactly one bus and may inject or withdraw power into/from the grid. 
-* :math:`\mathcal D_i \subseteq \mathcal D` - the set of electrical devices connected to bus :math:`i`, -* :math:`V_i, I_i, P_i^{(bus)}, Q_i^{(bus)}` - the complex voltage level, complex total current injection, total real power injection, - and total reactive power injection at bus :math:`i`, respectively, -* :math:`P_d^{(dev)}, Q_d^{(dev)}` - the real and reactive power injections of device :math:`d \in \mathcal D` into the grid, - respectively, -* :math:`I_{ij}, P_{ij}, Q_{ij}, S_{ij}` - the complex current, active power flow, reactive power flow, and complex - power flow in branch :math:`e_{ij} \in \mathcal E`, from bus :math:`i` to bus :math:`j`, respectively, with - :math:`S_{ij} = P_{ij} + \mathbf i Q_{ij}`. -* :math:`\mathcal D_L \subset \mathcal D` - the set of passive load devices that only withdraw power from the grid, -* :math:`\mathcal D_G \subset \mathcal D` - the set of generators, which only inject power into the grid, with the - exception of the slack device (see below), -* :math:`\mathcal D_{DES} \subset \mathcal D` - the set of distributed energy storage (DES) units, which can both - inject and withdraw power into/from the grid, -* :math:`\mathcal D_{DER} \subset \mathcal D_G` - the set of renewable energy resources (a subset of all generators), -* :math:`g^{slack} \in \mathcal D_G - \mathcal D_{DER}` - the slack device, a generator used to balance power flow in - the network and provide a voltage reference. The slack device is the only device connected to the slack bus :math:`i=0`, -* :math:`SoC_d` - the state of charge (i.e., energy level) of storage unit :math:`d \in \mathcal D_{DES}`, -* :math:`P_g^{(max)}` - the maximum real power that generator :math:`g \in \mathcal D_G - \{g^{slack}\}` can produce if - not curtailed. - -Basic concepts and assumptions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The slack bus is assumed unique and at :math:`i=0`, with a fixed voltage reference :math:`V_0 = 1 \angle 0`.
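The complex quantities above map directly onto Python's built-in complex numbers. As a quick illustration (with made-up per-unit values, not taken from any gym-anm network), the slack voltage reference :math:`V_0 = 1 \angle 0` and a branch power flow :math:`S_{ij} = P_{ij} + \mathbf i Q_{ij}` can be sketched as:

```python
import cmath

# Slack bus voltage reference: magnitude 1 p.u., angle 0 rad (V_0 = 1∠0).
V0 = cmath.rect(1.0, 0.0)

# A complex branch power flow S_ij = P_ij + i*Q_ij (illustrative p.u. values).
P_ij, Q_ij = 0.8, 0.3
S_ij = complex(P_ij, Q_ij)

# Apparent power magnitude |S_ij| (this is what branch limits are checked
# against) and the corresponding power factor P/|S|.
S_mag = abs(S_ij)
power_factor = P_ij / S_mag
```

This is only a notational aid; the environment performs these computations internally over the whole network.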
- -Unless otherwise stated, all electrical quantities are expressed in `per unit (p.u.) `_. - -The power grid is assumed to be a `three-phase balanced system `_ and we -adopt its single-phase equivalent representation in all derivations. - -For a more in-depth description of the power grid model used in :code:`gym-anm`, see the `original paper ADD LINK`_. diff --git a/docs/source/topics/archive/rewards.rst b/docs/source/topics/archive/rewards.rst deleted file mode 100644 index 9a0271e..0000000 --- a/docs/source/topics/archive/rewards.rst +++ /dev/null @@ -1,65 +0,0 @@ -.. - -.. _rewards_label: - -Rewards -======== -As described in the :ref:`task_overview_label`, the reward signal is computed as: - -.. math:: - \begin{align} - r_t = - \begin{cases} - -(\Delta E_{t:t+1} + \lambda \phi(s_{t+1})), & \text{if } s_{t+1} \notin \mathcal S^{terminal}, \\ - - \frac{r^{max}}{1 - \gamma}, & \text{if } s_t \notin \mathcal S^{terminal} \text{ and } s_{t+1} \in \mathcal S^{terminal}, \\ - 0, & \text{else.} - \end{cases} - \end{align} - - -Energy loss ----------- -The energy loss :math:`\Delta E_{t:t+1}` is computed in three parts: - -.. math:: - \begin{align} - \Delta E_{t:t+1} = \Delta E_{t:t+1}^{(1)} + \Delta E_{t:t+1}^{(2)} + \Delta E_{t:t+1}^{(3)} \;, - \end{align} - -where: - -* :math:`\Delta E_{t:t+1}^{(1)}` is the total transmission energy loss during :math:`(t, t+1]`, a result of leakage in - transmission lines and transformers. -* :math:`\Delta E_{t:t+1}^{(2)}` is the total net amount of energy flowing from the grid into DES units during - :math:`(t, t+1]`. Over a sufficiently large number of timesteps, the sum of these terms will approximate the amount - of energy lost due to leakage in DES units. -* :math:`\Delta E_{t:t+1}^{(3)}` is the total amount of energy loss as a result of renewable generation curtailment of - generators during :math:`(t, t+1]`.
Depending on the regulation, this can be thought of as a fee paid by the DNO to - the owners of the generators that get curtailed, as financial compensation. - - -Network constraint violation ---------------------------- -In the penalty term :math:`\phi(s_{t+1})`, we consider two types of network-wide operating constraints: branch current -limits and voltage constraints (see :ref:`task_overview_label`). - -Formally, :math:`\phi(s_{t+1})` is expressed as: - -.. math:: - \begin{align} - \phi(s_{t+1}) = \Delta t \Big(&\sum_{i \in \mathcal N} \big(\max{(0, |V_{i,t+1}| - \overline V_i)} + \max{(0, \underline V_i - |V_{i,t+1}|)} \big) \nonumber \\ - &+ \sum_{e_{ij} \in \mathcal E} \max{(0, |S_{ij,t+1}| - \overline S_{ij}, |S_{ji,t+1}| - \overline S_{ij})} \Big) \;, - \end{align} - -where: - -* :math:`|V_{i,t+1}|` is the voltage magnitude at bus :math:`i` at time :math:`t+1` (in p.u.), -* :math:`[\underline V_i, \overline V_i]` is the range of allowed voltage magnitude at bus :math:`i` (in p.u.), -* :math:`|S_{ij,t+1}|` is the apparent power flow in branch :math:`e_{ij}` linking buses :math:`i` and :math:`j` at time - :math:`t+1`, -* :math:`\overline S_{ij}` is the rated (i.e., maximum) apparent power flow of branch :math:`e_{ij}`. - -In practice, violating any network constraint can lead to damaging parts of the DN infrastructure (e.g., lines or -transformers) or power outages, which can both have significant economic consequences for the DNO. For that reason, -ensuring that the DN operates within its constraints is often prioritized over minimizing energy loss. This can -be achieved by choosing a large :math:`\lambda` or by setting an over-restrictive set of constraints in the environment. diff --git a/docs/source/topics/archive/state_space.rst b/docs/source/topics/archive/state_space.rst deleted file mode 100644 index 36a2e59..0000000 --- a/docs/source/topics/archive/state_space.rst +++ /dev/null @@ -1,27 +0,0 @@ -.. - -.. 
_state_space_label: - -State space -=========== - -Formally, the state vectors used by :code:`gym-anm` are expressed as follows: - -.. math:: - \begin{align} - s_t = \big[ - \{P_{d,t}^{(dev)}\}_{d \in \mathcal D},\; \{Q_{d,t}^{(dev)}\}_{d \in \mathcal D},\; \{SoC_{d,t}\}_{d \in \mathcal D_{DES}}, - \{P_{g,t}^{(max)}\}_{g \in \mathcal D_G - \{g^{slack}\}},\; \{aux^{(k)}_t\}_{k=0}^{K-1} \big] \;, \label{eq:state} - \end{align} - -where: - -* :math:`P_{d,t}^{(dev)}` and :math:`Q_{d,t}^{(dev)}` are the real and reactive power injections into the grid from - electrical device :math:`d \in \mathcal D` at time :math:`t`, respectively, -* :math:`SoC_{d,t}` is the charge level, or state of charge (SoC), of storage unit :math:`d \in \mathcal D_{DES}`, -* :math:`P_{g,t}^{(max)}` is the maximum power that generator :math:`g \in \mathcal D_G - \{g^{slack}\}` can - produce at time :math:`t`, -* :math:`aux_t^{(k)}` is the value of the :math:`k^{th}` (zero-indexed) auxiliary variable generated during the transition from - timestep :math:`t` to timestep :math:`t+1`. - -Terminal states :math:`s \in \mathcal S^{terminal}` are represented by the zero vector :math:`[0,\ldots,0]`. \ No newline at end of file diff --git a/docs/source/topics/archive/task_overview.rst b/docs/source/topics/archive/task_overview.rst deleted file mode 100644 index def0c1b..0000000 --- a/docs/source/topics/archive/task_overview.rst +++ /dev/null @@ -1,132 +0,0 @@ -.. - -.. _task_overview_label: - -Task overview -============= - -Each :code:`gym-anm` task can be described as a `Markov Decision Process `_ (MDP), -of which an overview is provided below. - -Goal ---- -When tackling tasks modelled by :code:`gym-anm`, the goal is to minimize the cost of operating the distribution network -while avoiding the violation of grid constraints. In doing so, the agent takes the place of the Distribution Network -Operator (DNO).
- -Because real-world operating costs come from a wide range of sources (e.g., electricity market price, equipment -maintenance, etc.), the true operating cost must be approximated in practice. In :code:`gym-anm`, the operating -cost is assumed to be fully described by a combination of: - -1. *Energy losses*: from cable transmission (dissipated as heat) and renewable energy curtailment (clamping the output - of a generator that could otherwise have produced more). -2. *Network constraint violation*: violating network constraints (e.g., transmission line current constraints) may lead - to damaging parts of the power grid. In general, these costs are more important than energy losses. Note that, in - the worst case, failing to satisfy network constraints can lead to some form of network collapse (e.g., blackout). - -In addition, we restrict network constraints to two types. These constraints alone already represent -most practical limits DNOs face in the real-world management of distribution networks: - -1. *Voltage constraints*: the voltage of each node of the network must remain within a specified range (e.g., [0.95, 1.05] pu). - This is required to ensure the power grid remains stable and that all devices connected to it can operate properly. -2. *Branch current constraints*: transmission line currents must remain below a certain pre-specified value (known as the - *rated* value) to prevent lines and transformers from overheating. - - -Reward signal ------------- -In order to drive the behavior of RL agents towards the goal mentioned above, :code:`gym-anm` uses a reward function -that directly incorporates the quantities to be minimized: energy losses and network constraint violations. - -The reward signal :math:`r_t = r(s_t, a_t, s_{t+1})` is computed as: - -.. 
math:: - \begin{align} - r_t = - \begin{cases} - -(\Delta E_{t:t+1} + \lambda \phi(s_{t+1})), & \text{if } s_{t+1} \notin \mathcal S^{terminal}, \\ - - \frac{r^{max}}{1 - \gamma}, & \text{if } s_t \notin \mathcal S^{terminal} \text{ and } s_{t+1} \in \mathcal S^{terminal}, \\ - 0, & \text{else,} - \end{cases} - \end{align} - -where: - -* :math:`\Delta E_{t:t+1}` is the total network-wide energy loss during :math:`(t,t+1]`, -* :math:`\phi(s_{t+1})` is a penalty term associated with the violation of operating constraints, -* :math:`\lambda` is a weighting hyperparameter, -* :math:`r^{max}` is an upper bound on the rewards emitted by the environment, often chosen around 100, and -* :math:`\gamma \in [0, 1]` is the discount factor. - -During the transition from a nonterminal state to a terminal one (i.e., when the network collapses), the environment -emits a large negative reward and subsequent rewards are always zero, until a new trajectory is started by sampling a -new initial state :math:`s_0`. - -For more information about the rewards, see :ref:`rewards_label`. - - -Action vectors --------------- -At each timestep, the agent must choose a set of actions to perform in the environment. These correspond to the -management strategy employed by the DNO. - -The actions are collected into an action vector :math:`a_t \in \mathcal A`. In all :code:`gym-anm` tasks, each action -vector contains four types of decision variables: - -* An upper limit on the active power that each generator can produce. In the case of renewable energy resources, this - corresponds to the curtailment value. For classical generators, it corresponds to a set-point specified by the DNO. -* A set-point for the reactive power generation of each generator (renewable or not). -* A set-point for the active power injection from each energy storage unit. -* A set-point for the reactive power injection from each energy storage unit. - -The resulting action space is usually constrained. 
Whenever the agent selects an action that falls outside the allowed -action space, the selected action is first mapped to the nearest physically possible action before being applied in the environment. - -For more information about action vectors, see :ref:`action_space_label`. - - -State vectors ------------- -Because :code:`gym-anm` tasks are modelled as MDPs, environments can always be described by their current Markovian -state, which we denote :math:`s_t \in \mathcal S`. - -In all :code:`gym-anm` environments, state vectors contain the following information: - -* The current (instantaneous) amount of power injected into (or withdrawn from) the power grid by each electrical device connected to it. -* The current SoC of all energy storage units (e.g., batteries). -* The maximum (theoretical) generation that each renewable energy resource could have produced if not curtailed, given - the current environmental conditions. -* Any additional variables required to make the task Markovian (i.e., ensure that :math:`s_{t+1}` can be expressed - probabilistically given :math:`s_t` and :math:`a_t`). We refer to these as *auxiliary variables*. - -The environment may also end up in a terminal state :math:`s \in \mathcal S^{terminal} \subset \mathcal S`, which marks -the end of the episode. Reaching a terminal state indicates that the power grid has collapsed, often due to a `voltage -collapse problem `_. The environment will remain in a -terminal state until it is reset. - -For more information about state vectors, see :ref:`state_space_label`. - - -State transitions ----------------- -Each state transition from :math:`s_t` to :math:`s_{t+1}` is fully handled by the environment. It occurs in three steps: - -1. A new outcome for the stochastic processes modelled by the environment is sampled. These include (a) the demand from - each load device, (b) the maximum generation from each generator, and (c) the auxiliary variables. -2. 
Once the action :math:`a_t \in \mathcal A` has been selected by the agent, the action vector is mapped onto the set - of physically possible actions :math:`\mathcal A(s_t)`. -3. The mapped actions are then applied in the environment and the new electrical quantities are computed, resulting in - a new state :math:`s_{t+1}`, observation :math:`o_{t+1}`, and reward :math:`r_t`. - -.. For more information about state transitions, see :ref:`transition_label`. - -Observation vectors ------------------- -In general, DNOs rarely have access to the full state of the distribution network when doing ANM. - -One of the key characteristics of :code:`gym-anm` is that new environments built using this framework allow users to -easily define their own observation vectors. This means that the same task can be rendered more or less difficult by -simply modifying the observation space, thus restricting the amount (or quality) of the information the agent has access to. - -To simplify the design of customized observation spaces, :code:`gym-anm` allows users to specify a set of -variables to include in the observation vectors. For more information on designing new environments, see :ref:`framework_label`. diff --git a/docs/source/topics/design_new_env.rst b/docs/source/topics/design_new_env.rst index c31af2b..94b4e03 100644 --- a/docs/source/topics/design_new_env.rst +++ b/docs/source/topics/design_new_env.rst @@ -45,7 +45,7 @@ where: way to infer the bounds of observation vectors :math:`o_t` and :code:`observation_bounds()` can be used to specify them. * :code:`render()` and :code:`close()` are optional methods that can be implemented to support rendering - of the environment. For more information, see the official `Gym `_ documentation. + of the environment. For more information, see the official `Gymnasium `_ documentation.
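To make the role of these optional methods concrete, here is a schematic, plain-Python sketch of the interface a new environment fills in. This is an illustration only: the class below does not subclass the real gym-anm base class, and everything other than the method names :code:`observation_bounds()`, :code:`render()`, and :code:`close()` (which appear in the surrounding docs) is assumed for the example.

```python
class CustomEnvSketch:
    """Illustrative stand-in for a user-defined gym-anm environment.

    A real environment would subclass the gym-anm base class; only the
    three method names below are taken from the documentation, the rest
    (constructor arguments, bound values) is hypothetical.
    """

    def __init__(self, n_obs=3):
        self.n_obs = n_obs  # number of observation variables (assumed)

    def observation_bounds(self):
        # Explicit lower/upper bounds for each observation variable, for
        # cases where they cannot be inferred automatically.
        low = [-1.0] * self.n_obs   # hypothetical bounds
        high = [1.0] * self.n_obs
        return low, high

    def render(self):
        # Optional: visualize the current state of the network.
        pass

    def close(self):
        # Optional: release any resources opened by render().
        pass
```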
Example diff --git a/docs/source/topics/quickstart.rst b/docs/source/topics/quickstart.rst index 216dec3..85b9d10 100644 --- a/docs/source/topics/quickstart.rst +++ b/docs/source/topics/quickstart.rst @@ -16,11 +16,11 @@ are randomly sampled from the action space at each time step: :: def run(): env = gym.make('gym_anm:ANM6Easy-v0') - o = env.reset() + o, _ = env.reset() for i in range(100): a = env.action_space.sample() - o, r, done, info = env.step(a) + o, r, terminated, _, _ = env.step(a) env.render() time.sleep(0.5) # otherwise the rendering is too fast for the human eye. @@ -29,7 +29,7 @@ are randomly sampled from the action space at each time step: :: if __name__ == '__main__': run() -For more information about the Gym interface, read the `official documentation `_. +For more information about the Gymnasium interface, read the `official documentation `_. Designing your own ANM task diff --git a/docs/source/topics/using_env.rst b/docs/source/topics/using_env.rst index 6fa0c8c..945b300 100644 --- a/docs/source/topics/using_env.rst +++ b/docs/source/topics/using_env.rst @@ -5,8 +5,7 @@ Using an Environment Initializing ------------- -If the :code:`gym-anm` environment you would like to use has already been registered in the :code:`gym`'s registry -(see the `Gym documentation `_), you can initialize it with +If the :code:`gym-anm` environment you would like to use has already been registered in :code:`gymnasium`'s registry, you can initialize it with :code:`gym.make('gym_anm:')`, where :code:`` is the ID of the environment. For example: :: import gymnasium as gym @@ -21,22 +20,22 @@ Alternatively, the environment can be initialized directly from its class: :: Agent-environment interactions ------------------------------ -Built on top of `Gym `_, :code:`gym-anm` provides 2 core functions: :code:`reset()` and +Built on top of `Gymnasium `_, :code:`gym-anm` provides two core functions: :code:`reset()` and :code:`step(a)`.
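Note that Gymnasium's :code:`step()` returns a five-element tuple :code:`(obs, reward, terminated, truncated, info)`; the examples in these pages discard the :code:`truncated` flag with :code:`_`. The sketch below uses a minimal dummy environment (not gym-anm, purely for illustration) to show a loop that stops on either end-of-episode signal:

```python
import random

class DummyEnv:
    """Stand-in exposing the Gymnasium reset()/step() signatures."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0], {}                     # observation, info

    def step(self, a):
        self.t += 1
        obs = [random.random()]
        reward = -abs(a)                     # toy cost: penalize large actions
        terminated = False                   # e.g. grid collapse in gym-anm
        truncated = self.t >= self.horizon   # time-limit cutoff
        return obs, reward, terminated, truncated, {}

env = DummyEnv()
obs, _ = env.reset()
total = 0.0
while True:
    obs, r, terminated, truncated, info = env.step(0.1)
    total += r
    if terminated or truncated:
        break
```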
:code:`reset()` can be used to reset the environment and collect the first observation of the trajectory: :: - obs = env.reset() + obs, _ = env.reset() After the agent has selected an action :code:`a` to apply to the environment, :code:`step(a)` can be used to do so: :: - obs, r, done, info = env.step(a) + obs, r, terminated, _, info = env.step(a) where: * :code:`obs` is the vector of observations :math:`o_{t+1}`, * :code:`r` is the reward :math:`r_t`, -* :code:`done` is a boolean value set to :code:`true` if :math:`s_{t+1}` is a terminal state, +* :code:`terminated` is a boolean value set to :code:`True` if :math:`s_{t+1}` is a terminal state, * :code:`info` gathers information about the transition (it is seldom used in :code:`gym-anm`). Render the environment @@ -58,16 +57,16 @@ Complete example A complete example of agent-environment interactions with an arbitrary agent :code:`agent`: :: env = gym.make('gym_anm:ANM6Easy-v0') - o = env.reset() + o, _ = env.reset() for i in range(1000): a = agent.act(o) - o, r, done, info = env.step(a) + o, r, terminated, _, info = env.step(a) env.render() time.sleep(0.5) # otherwise the rendering is too fast for the human eye if terminated: - o = env.reset() + o, _ = env.reset() The above example would be rendered in your favorite web browser as: