diff --git a/docs/source/index.rst b/docs/source/index.rst index 9dc78a0..63a0ac7 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -38,7 +38,7 @@ Key features ------------ * Very little background in electricity systems modelling is required. This makes :code:`gym-anm` an ideal starting point for RL students and researchers looking to enter the field. -* The environments (tasks) generated by :code:`gym-anm` follow the `OpenAI Gym `_ +* The environments (tasks) generated by :code:`gym-anm` follow the `Gymnasium `_ framework, with which a large part of the RL community is already familiar. * The flexibility of :code:`gym-anm`, with its different customizable components, makes it a suitable framework to model a wide range of ANM tasks, from simple ones that can be used for educational purposes, to complex ones diff --git a/docs/source/topics/about.rst b/docs/source/topics/about.rst index cb03175..9967fc2 100644 --- a/docs/source/topics/about.rst +++ b/docs/source/topics/about.rst @@ -4,7 +4,5 @@ About ===== The design of :code:`gym-anm` started as a summer undergraduate project conducted by -`Robin Henry `_ at the University of Liège (ULiège), Belgium, under the supervision of +`Robin Henry `_ at the University of Liège (ULiège), Belgium, under the supervision of `Prof. Damien Ernst `_ in 2019. - -It remained a side project until its first version was released in March 2021. diff --git a/docs/source/topics/archive/action_space.rst b/docs/source/topics/archive/action_space.rst deleted file mode 100644 index 072dcd1..0000000 --- a/docs/source/topics/archive/action_space.rst +++ /dev/null @@ -1,37 +0,0 @@ -.. - -.. _action_space_label: - -Action space -============ -Formally, the action vectors :math:`a_t \in \mathcal A` expected by :code:`gym-anm` environments are expressed as: - -.. 
math:: - \begin{align} - a_t = \big[ - \{a_{P_{g,t}}\}_{g \in \mathcal D_G - \{g^{slack}\}},\; \{a_{Q_{g,t}}\}_{g \in \mathcal D_G - \{g^{slack}\}}, - \{a_{P_{d,t}}\}_{d \in \mathcal D_{DES}},\; \{a_{Q_{d,t}}\}_{d \in \mathcal D_{DES}} \big] \;, \label{eq:action_vector} - \end{align} - -for a total of :math:`N_a = 2|\mathcal D_G| + 2|\mathcal D_{DES}| - 2` control variables to be chosen by the agent at -each timestep, each belonging to one of four categories: - -* :math:`a_{P_{g,t}}`: an upper limit on the active power injection from non-slack generator :math:`g`. - If :math:`g` is a renewable energy resource, then :math:`a_{P_{g,t}}` is the curtailment value. For classical - generators, it simply refers to a set-point chosen by the agent. The slack generator is excluded, since it is used - to balance load and generation and therefore its power injection cannot be controlled by the agent. That is, - :math:`g^{slack}` will inject into the network the amount of power needed to fill the gap between the total - generation and demand. -* :math:`a_{Q_{g,t}}`: the reactive power injection from each non-slack generator :math:`g`. - Again, the injection from the slack generator is used to balance reactive power flows and therefore cannot be - controlled by the agent. -* :math:`a_{P_{d,t}}`: the active power injection from each energy storage unit :math:`d \in \mathcal D_{DES}`. -* :math:`a_{Q_{d,t}}`: the reactive power injection from each energy storage unit :math:`d \in \mathcal D_{DES}`. - -As with all Gym environments, the action space :math:`\mathcal A` from which the agent can choose actions can be -accessed through the :code:`env.action_space` attribute. - -Note that not all actions within :math:`\mathcal A` will be feasible in the current state :math:`s_t` (e.g., an empty -storage unit cannot inject power into the network).
The action :math:`a_t \in \mathcal A` chosen by the agent will -first be mapped to the closest action (using Euclidean distance) in the feasible set :math:`\mathcal A(s_t)` before being -applied in the environment. diff --git a/docs/source/topics/archive/background.rst b/docs/source/topics/archive/background.rst deleted file mode 100644 index 82d2ad5..0000000 --- a/docs/source/topics/archive/background.rst +++ /dev/null @@ -1,61 +0,0 @@ -.. - -Background and notation -======================= - -Reinforcement learning ----------------------- -The documentation of :code:`gym-anm` assumes familiarity with basic reinforcement learning (RL) concepts. Some good -resources to get started are: - -* `Reinforcement Learning: An Introduction `_ -* `OpenAI Spinning Up `_ - -Being familiar with the `OpenAI Gym `_ framework is also useful, since :code:`gym-anm` -environments follow the same framework. - - -Distribution networks ---------------------- - -Notation -^^^^^^^^ -The main notations used throughout this documentation are listed below. - -* :math:`\mathbf i` - the imaginary number with :math:`\mathbf i^2 = -1`. -* :math:`G(\mathcal N, \mathcal E)` - the directed graph representing the distribution network, -* :math:`\mathcal N = \{0,1,\ldots,N-1\}` - the set of buses (or nodes) in the network, -* :math:`\mathcal E \subseteq \mathcal N \times \mathcal N` - the set of directed edges (transmission lines) linking buses together, -* :math:`e_{ij} \in \mathcal E` - the directed edge with sending bus :math:`i` and receiving bus :math:`j`, -* :math:`\mathcal D = \{0,1,\ldots,D-1\}` - the set of all electrical devices connected to the grid. Each device - :math:`d \in \mathcal D` is connected to exactly one bus and may inject or withdraw power into/from the grid. 
-* :math:`\mathcal D_i \subseteq \mathcal D` - the set of electrical devices connected to bus :math:`i`, -* :math:`V_i, I_i, P_i^{(bus)}, Q_i^{(bus)}` - the complex voltage level, complex total current injection, total real power injection, - and total reactive power injection at bus :math:`i`, respectively, -* :math:`P_d^{(dev)}, Q_d^{(dev)}` - the real and reactive power injections of device :math:`d \in \mathcal D` into the grid, - respectively, -* :math:`I_{ij}, P_{ij}, Q_{ij}, S_{ij}` - the complex current, active power flow, reactive power flow, and complex - power flow in branch :math:`e_{ij} \in \mathcal E`, from bus :math:`i` to bus :math:`j`, respectively, with - :math:`S_{ij} = P_{ij} + \mathbf i Q_{ij}`. -* :math:`\mathcal D_L \subset \mathcal D` - the set of passive load devices that only withdraw power from the grid, -* :math:`\mathcal D_G \subset \mathcal D` - the set of generators, which only inject power into the grid, with the - exception of the slack device (see below), -* :math:`\mathcal D_{DES} \subset \mathcal D` - the set of distributed energy storage (DES) units, which can both - inject and withdraw power into/from the grid, -* :math:`\mathcal D_{DER} \subset \mathcal D_G` - the set of renewable energy resources (a subset of all generators), -* :math:`g^{slack} \in \mathcal D_G - \mathcal D_{DER}` - the slack device, a generator used to balance power flow in - the network and provide a voltage reference. The slack device is the only device connected to the slack bus :math:`i=0`, -* :math:`SoC_d` - the state of charge (i.e., energy level) of storage unit :math:`d \in \mathcal D_{DES}`, -* :math:`P_g^{(max)}` - the maximum real power that generator :math:`g \in \mathcal D_G - \{g^{slack}\}` can produce if - not curtailed. - -Basic concepts and assumptions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The slack bus is assumed unique and at :math:`i=0`, with a fixed voltage reference :math:`V_0 = 1 \angle 0`.
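The complex quantities above map directly onto Python's built-in complex numbers. As a quick illustration (with made-up per-unit values, not taken from any gym-anm network), the slack voltage reference :math:`V_0 = 1 \angle 0` and a branch power flow :math:`S_{ij} = P_{ij} + \mathbf i Q_{ij}` can be sketched as:

```python
import cmath

# Slack bus voltage reference: magnitude 1 p.u., angle 0 rad (V_0 = 1∠0).
V0 = cmath.rect(1.0, 0.0)

# A complex branch power flow S_ij = P_ij + i*Q_ij (illustrative p.u. values).
P_ij, Q_ij = 0.8, 0.3
S_ij = complex(P_ij, Q_ij)

# Apparent power magnitude |S_ij| (this is what branch limits are checked
# against) and the corresponding power factor P/|S|.
S_mag = abs(S_ij)
power_factor = P_ij / S_mag
```

This is only a notational aid; the environment performs these computations internally over the whole network.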
- -Unless otherwise stated, all electrical quantities are expressed in `per unit (p.u.) `_. - -The power grid is assumed to be a `three-phase balanced system `_ and we -adopt its single-phase equivalent representation in all derivations. - -For a more in-depth description of the power grid model used in :code:`gym-anm`, see the `original paper ADD LINK`_. diff --git a/docs/source/topics/archive/rewards.rst b/docs/source/topics/archive/rewards.rst deleted file mode 100644 index 9a0271e..0000000 --- a/docs/source/topics/archive/rewards.rst +++ /dev/null @@ -1,65 +0,0 @@ -.. - -.. _rewards_label: - -Rewards -======== -As described in the :ref:`task_overview_label`, the reward signal is computed as: - -.. math:: - \begin{align} - r_t = - \begin{cases} - -(\Delta E_{t:t+1} + \lambda \phi(s_{t+1})), & \text{if } s_{t+1} \notin \mathcal S^{terminal}, \\ - - \frac{r^{max}}{1 - \gamma}, & \text{if } s_t \notin \mathcal S^{terminal} \text{ and } s_{t+1} \in \mathcal S^{terminal}, \\ - 0, & \text{else.} - \end{cases} - \end{align} - - -Energy loss ----------- -The energy loss :math:`\Delta E_{t:t+1}` is computed in three parts: - -.. math:: - \begin{align} - \Delta E_{t:t+1} = \Delta E_{t:t+1}^{(1)} + \Delta E_{t:t+1}^{(2)} + \Delta E_{t:t+1}^{(3)} \;, - \end{align} - -where: - -* :math:`\Delta E_{t:t+1}^{(1)}` is the total transmission energy loss during :math:`(t, t+1]`, a result of leakage in - transmission lines and transformers. -* :math:`\Delta E_{t:t+1}^{(2)}` is the total net amount of energy flowing from the grid into DES units during - :math:`(t, t+1]`. Over a sufficiently large number of timesteps, the sum of these terms will approximate the amount - of energy lost due to leakage in DES units. -* :math:`\Delta E_{t:t+1}^{(3)}` is the total amount of energy loss as a result of renewable generation curtailment of - generators during :math:`(t, t+1]`.
Depending on the regulation, this can be thought of as a fee paid by the DNO to - the owners of the generators that get curtailed, as financial compensation. - - -Network constraint violation ---------------------------- -In the penalty term :math:`\phi(s_{t+1})`, we consider two types of network-wide operating constraints: branch current -limits and voltage constraints (see :ref:`task_overview_label`). - -Formally, :math:`\phi(s_{t+1})` is expressed as: - -.. math:: - \begin{align} - \phi(s_{t+1}) = \Delta t \Big(&\sum_{i \in \mathcal N} \big(\max{(0, |V_{i,t+1}| - \overline V_i)} + \max{(0, \underline V_i - |V_{i,t+1}|)} \big) \nonumber \\ - &+ \sum_{e_{ij} \in \mathcal E} \max{(0, |S_{ij,t+1}| - \overline S_{ij}, |S_{ji,t+1}| - \overline S_{ij})} \Big) \;, - \end{align} - -where: - -* :math:`|V_{i,t+1}|` is the voltage magnitude at bus :math:`i` at time :math:`t+1` (in p.u.), -* :math:`[\underline V_i, \overline V_i]` is the range of allowed voltage magnitude at bus :math:`i` (in p.u.), -* :math:`|S_{ij,t+1}|` is the apparent power flow in branch :math:`e_{ij}` linking buses :math:`i` and :math:`j` at time - :math:`t+1`, -* :math:`\overline S_{ij}` is the rated (i.e., maximum) apparent power flow of branch :math:`e_{ij}`. - -In practice, violating any network constraint can lead to damaging parts of the DN infrastructure (e.g., lines or -transformers) or power outages, which can both have significant economic consequences for the DNO. For that reason, -ensuring that the DN operates within its constraints is often prioritized over minimizing energy loss. This can -be achieved by choosing a large :math:`\lambda` or by setting an over-restrictive set of constraints in the environment. diff --git a/docs/source/topics/archive/state_space.rst b/docs/source/topics/archive/state_space.rst deleted file mode 100644 index 36a2e59..0000000 --- a/docs/source/topics/archive/state_space.rst +++ /dev/null @@ -1,27 +0,0 @@ -.. - -.. 
_state_space_label: - -State space -=========== - -Formally, the state vectors used by :code:`gym-anm` are expressed as follows: - -.. math:: - \begin{align} - s_t = \big[ - \{P_{d,t}^{(dev)}\}_{d \in \mathcal D},\; \{Q_{d,t}^{(dev)}\}_{d \in \mathcal D},\; \{SoC_{d,t}\}_{d \in \mathcal D_{DES}}, - \{P_{g,t}^{(max)}\}_{g \in \mathcal D_G - \{g^{slack}\}},\; \{aux^{(k)}_t\}_{k=0}^{K-1} \big] \;, \label{eq:state} - \end{align} - -where: - -* :math:`P_{d,t}^{(dev)}` and :math:`Q_{d,t}^{(dev)}` are the real and reactive power injections into the grid from - electrical device :math:`d \in \mathcal D` at time :math:`t`, respectively, -* :math:`SoC_{d,t}` is the charge level, or state of charge (SoC), of storage unit :math:`d \in \mathcal D_{DES}`, -* :math:`P_{g,t}^{(max)}` is the maximum power that generator :math:`g \in \mathcal D_G - \{g^{slack}\}` can - produce at time :math:`t`, -* :math:`aux_t^{(k)}` is the value of the :math:`k^{th}` (zero-indexed) auxiliary variable generated during the transition from - timestep :math:`t` to timestep :math:`t+1`. - -Terminal states :math:`s \in \mathcal S^{terminal}` are represented by the zero vector :math:`[0,\ldots,0]`. \ No newline at end of file diff --git a/docs/source/topics/archive/task_overview.rst b/docs/source/topics/archive/task_overview.rst deleted file mode 100644 index def0c1b..0000000 --- a/docs/source/topics/archive/task_overview.rst +++ /dev/null @@ -1,132 +0,0 @@ -.. - -.. _task_overview_label: - -Task overview -============= - -Each :code:`gym-anm` task can be described as a `Markov Decision Process `_ (MDP), -of which an overview is provided below. - -Goal ---- -When tackling tasks modelled by :code:`gym-anm`, the goal is to minimize the cost of operating the distribution network -while avoiding the violation of grid constraints. In doing so, the agent takes the place of the Distribution Network -Operator (DNO).
- -Because real-world operating costs come from a wide range of sources (e.g., electricity market price, equipment -maintenance, etc.), the true operating cost must be approximated in practice. In :code:`gym-anm`, the operating -cost is assumed to be fully described by a combination of: - -1. *Energy losses*: from cable transmission (dissipated as heat) and renewable energy curtailment (clamping the output - of a generator that could otherwise have produced more). -2. *Network constraint violation*: violating network constraints (e.g., transmission line current constraints) may lead - to damaging parts of the power grid. In general, these costs are more important than energy losses. Note that, in - the worst case, failing to satisfy network constraints can lead to some form of network collapse (e.g., blackout). - -In addition, we restrict network constraints to two types. These constraints alone already represent -most practical limits DNOs face in the real-world management of distribution networks: - -1. *Voltage constraints*: the voltage of each node of the network must remain within a specified range (e.g., [0.95, 1.05] pu). - This is required to ensure the power grid remains stable and that all devices connected to it can operate properly. -2. *Branch current constraints*: transmission line currents must remain below a certain pre-specified value (known as the - *rated* value) to prevent lines and transformers from overheating. - - -Reward signal ------------- -In order to drive the behavior of RL agents towards the goal mentioned above, :code:`gym-anm` uses a reward function -that directly incorporates the quantities to be minimized: energy losses and network constraint violations. - -The reward signal :math:`r_t = r(s_t, a_t, s_{t+1})` is computed as: - -.. 
math:: - \begin{align} - r_t = - \begin{cases} - -(\Delta E_{t:t+1} + \lambda \phi(s_{t+1})), & \text{if } s_{t+1} \notin \mathcal S^{terminal}, \\ - - \frac{r^{max}}{1 - \gamma}, & \text{if } s_t \notin \mathcal S^{terminal} \text{ and } s_{t+1} \in \mathcal S^{terminal}, \\ - 0, & \text{else,} - \end{cases} - \end{align} - -where: - -* :math:`\Delta E_{t:t+1}` is the total network-wide energy loss during :math:`(t,t+1]`, -* :math:`\phi(s_{t+1})` is a penalty term associated with the violation of operating constraints, -* :math:`\lambda` is a weighting hyperparameter, -* :math:`r^{max}` is an upper bound on the rewards emitted by the environment, often chosen around 100, and -* :math:`\gamma \in [0, 1]` is the discount factor. - -During the transition from a nonterminal state to a terminal one (i.e., when the network collapses), the environment -emits a large negative reward and subsequent rewards are always zero, until a new trajectory is started by sampling a -new initial state :math:`s_0`. - -For more information about the rewards, see :ref:`rewards_label`. - - -Action vectors --------------- -At each timestep, the agent must choose a set of actions to perform in the environment. These correspond to the -management strategy employed by the DNO. - -The actions are collected into an action vector :math:`a_t \in \mathcal A`. In all :code:`gym-anm` tasks, each action -vector contains four types of decision variables: - -* An upper limit on the active power that each generator can produce. In the case of renewable energy resources, this - corresponds to the curtailment value. For classical generators, it corresponds to a set-point specified by the DNO. -* A set-point for the reactive power generation of each generator (renewable or not). -* A set-point for the active power injection from each energy storage unit. -* A set-point for the reactive power injection from each energy storage unit. - -The resulting action space is usually constrained. 
Whenever the agent selects an action that falls outside the allowed -action space, the selected action is first mapped to the nearest physically possible action before being applied in the environment. - -For more information about action vectors, see :ref:`action_space_label`. - - -State vectors ------------- -Because :code:`gym-anm` tasks are modelled as MDPs, environments can always be described by their current Markovian -state, which we denote :math:`s_t \in \mathcal S`. - -In all :code:`gym-anm` environments, state vectors contain the following information: - -* The current (instantaneous) amount of power injected into (or withdrawn from) the power grid by each electrical device connected to it. -* The current SoC of all energy storage units (e.g., batteries). -* The maximum (theoretical) generation that each renewable energy resource could have produced if not curtailed, given - the current environmental conditions. -* Any additional variables required to make the task Markovian (i.e., ensure that :math:`s_{t+1}` can be expressed - probabilistically given :math:`s_t` and :math:`a_t`). We refer to these as *auxiliary variables*. - -The environment may also end up in a terminal state :math:`s \in \mathcal S^{terminal} \subset \mathcal S`, which marks -the end of the episode. Reaching a terminal state indicates that the power grid has collapsed, often due to a `voltage -collapse problem `_. The environment will remain in a -terminal state until it is reset. - -For more information about state vectors, see :ref:`state_space_label`. - - -State transitions ----------------- -Each state transition from :math:`s_t` to :math:`s_{t+1}` is fully handled by the environment. It occurs in three steps: - -1. A new outcome for the stochastic processes modelled by the environment is sampled. These include (a) the demand from - each load device, (b) the maximum generation from each generator, and (c) the auxiliary variables. -2. 
Once the action :math:`a_t \in \mathcal A` has been selected by the agent, the action vector is mapped onto the set - of physically possible actions :math:`\mathcal A(s_t)`. -3. The mapped actions are then applied in the environment and the new electrical quantities are computed, resulting in - a new state :math:`s_{t+1}`, observation :math:`o_{t+1}`, and reward :math:`r_t`. - -.. For more information about state transitions, see :ref:`transition_label`. - -Observation vectors ------------------- -In general, DNOs rarely have access to the full state of the distribution network when doing ANM. - -One of the key characteristics of :code:`gym-anm` is that new environments built using this framework allow users to -easily define their own observation vectors. This means that the same task can be rendered more or less difficult by -simply modifying the observation space, thus restricting the amount (or quality) of the information the agent has access to. - -To simplify the design of customized observation spaces, :code:`gym-anm` allows users to specify a set of -variables to include in the observation vectors. For more information on designing new environments, see :ref:`framework_label`. diff --git a/docs/source/topics/design_new_env.rst b/docs/source/topics/design_new_env.rst index c31af2b..94b4e03 100644 --- a/docs/source/topics/design_new_env.rst +++ b/docs/source/topics/design_new_env.rst @@ -45,7 +45,7 @@ where: way to infer the bounds of observation vectors :math:`o_t` and :code:`observation_bounds()` can be used to specify them. * :code:`render()` and :code:`close()` are optional methods that can be implemented to support rendering - of the environment. For more information, see the official `Gym `_ documentation. + of the environment. For more information, see the official `Gymnasium `_ documentation.
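To make the role of these optional methods concrete, here is a schematic, plain-Python sketch of the interface a new environment fills in. This is an illustration only: the class below does not subclass the real gym-anm base class, and everything other than the method names :code:`observation_bounds()`, :code:`render()`, and :code:`close()` (which appear in the surrounding docs) is assumed for the example.

```python
class CustomEnvSketch:
    """Illustrative stand-in for a user-defined gym-anm environment.

    A real environment would subclass the gym-anm base class; only the
    three method names below are taken from the documentation, the rest
    (constructor arguments, bound values) is hypothetical.
    """

    def __init__(self, n_obs=3):
        self.n_obs = n_obs  # number of observation variables (assumed)

    def observation_bounds(self):
        # Explicit lower/upper bounds for each observation variable, for
        # cases where they cannot be inferred automatically.
        low = [-1.0] * self.n_obs   # hypothetical bounds
        high = [1.0] * self.n_obs
        return low, high

    def render(self):
        # Optional: visualize the current state of the network.
        pass

    def close(self):
        # Optional: release any resources opened by render().
        pass
```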
Example diff --git a/docs/source/topics/quickstart.rst b/docs/source/topics/quickstart.rst index 216dec3..85b9d10 100644 --- a/docs/source/topics/quickstart.rst +++ b/docs/source/topics/quickstart.rst @@ -16,11 +16,11 @@ are randomly sampled from the action space at each time step: :: def run(): env = gym.make('gym_anm:ANM6Easy-v0') - o = env.reset() + o, _ = env.reset() for i in range(100): a = env.action_space.sample() - o, r, done, info = env.step(a) + o, r, terminated, _, _ = env.step(a) env.render() time.sleep(0.5) # otherwise the rendering is too fast for the human eye. @@ -29,7 +29,7 @@ are randomly sampled from the action space at each time step: :: if __name__ == '__main__': run() -For more information about the Gym interface, read the `official documentation `_. +For more information about the Gymnasium interface, read the `official documentation `_. Designing your own ANM task diff --git a/docs/source/topics/using_env.rst b/docs/source/topics/using_env.rst index 6fa0c8c..945b300 100644 --- a/docs/source/topics/using_env.rst +++ b/docs/source/topics/using_env.rst @@ -5,8 +5,7 @@ Using an Environment Initializing ------------- -If the :code:`gym-anm` environment you would like to use has already been registered in the :code:`gym`'s registry -(see the `Gym documentation `_), you can initialize it with +If the :code:`gym-anm` environment you would like to use has already been registered in :code:`gymnasium`'s registry, you can initialize it with :code:`gym.make('gym_anm:')`, where :code:`` is the ID of the environment. For example: :: import gymnasium as gym @@ -21,22 +20,22 @@ Alternatively, the environment can be initialized directly from its class: :: Agent-environment interactions ------------------------------ -Built on top of `Gym `_, :code:`gym-anm` provides 2 core functions: :code:`reset()` and +Built on top of `Gymnasium `_, :code:`gym-anm` provides two core functions: :code:`reset()` and :code:`step(a)`.
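Note that Gymnasium's :code:`step()` returns a five-element tuple :code:`(obs, reward, terminated, truncated, info)`; the examples in these pages discard the :code:`truncated` flag with :code:`_`. The sketch below uses a minimal dummy environment (not gym-anm, purely for illustration) to show a loop that stops on either end-of-episode signal:

```python
import random

class DummyEnv:
    """Stand-in exposing the Gymnasium reset()/step() signatures."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0], {}                     # observation, info

    def step(self, a):
        self.t += 1
        obs = [random.random()]
        reward = -abs(a)                     # toy cost: penalize large actions
        terminated = False                   # e.g. grid collapse in gym-anm
        truncated = self.t >= self.horizon   # time-limit cutoff
        return obs, reward, terminated, truncated, {}

env = DummyEnv()
obs, _ = env.reset()
total = 0.0
while True:
    obs, r, terminated, truncated, info = env.step(0.1)
    total += r
    if terminated or truncated:
        break
```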
:code:`reset()` can be used to reset the environment and collect the first observation of the trajectory: :: - obs = env.reset() + obs, _ = env.reset() After the agent has selected an action :code:`a` to apply to the environment, :code:`step(a)` can be used to do so: :: - obs, r, done, info = env.step(a) + obs, r, terminated, _, info = env.step(a) where: * :code:`obs` is the vector of observations :math:`o_{t+1}`, * :code:`r` is the reward :math:`r_t`, -* :code:`done` is a boolean value set to :code:`true` if :math:`s_{t+1}` is a terminal state, +* :code:`terminated` is a boolean value set to :code:`True` if :math:`s_{t+1}` is a terminal state, * :code:`info` gathers information about the transition (it is seldom used in :code:`gym-anm`). Render the environment @@ -58,16 +57,16 @@ Complete example A complete example of agent-environment interactions with an arbitrary agent :code:`agent`: :: env = gym.make('gym_anm:ANM6Easy-v0') - o = env.reset() + o, _ = env.reset() for i in range(1000): a = agent.act(o) - o, r, done, info = env.step(a) + o, r, terminated, _, info = env.step(a) env.render() time.sleep(0.5) # otherwise the rendering is too fast for the human eye if terminated: - o = env.reset() + o, _ = env.reset() The above example would be rendered in your favorite web browser as: