This project explores the implementation of the SARSA and Q-learning algorithms on the Taxi problem from OpenAI Gym's Toy Text environments. The aim is to train optimal policies for the taxi agent and to compare the two algorithms in terms of computational efficiency and reward maximization.
In this environment there are four designated locations, labelled R(ed), G(reen), Y(ellow), and B(lue). At the beginning of each episode, the taxi starts at a random square and the passenger at a random designated location. The taxi must navigate to the passenger's location, pick up the passenger, drive to the passenger's destination, and drop the passenger off. The episode ends when the passenger is successfully dropped off.
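A minimal interaction loop for this environment might look like the following sketch. It assumes the classic OpenAI Gym API, in which `env.reset()` returns a single state and `env.step()` returns a 4-tuple; newer Gym/Gymnasium releases use a slightly different interface.

```python
import gym

# Create the Taxi environment (registered as "Taxi-v3" in recent Gym releases).
env = gym.make("Taxi-v3")

state = env.reset()   # random taxi position, passenger location, and destination
env.render()          # prints the 5x5 grid with the R, G, Y, B locations

done = False
while not done:
    action = env.action_space.sample()            # random action, just to illustrate stepping
    state, reward, done, info = env.step(action)  # classic Gym returns a 4-tuple
env.close()
```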
The action space consists of 6 discrete, deterministic actions:
- 0: Move south
- 1: Move north
- 2: Move east
- 3: Move west
- 4: Pickup passenger
- 5: Drop off passenger
The observation space consists of 500 discrete states. Each state encodes the tuple (taxi_row, taxi_col, passenger_location, destination): there are 25 possible taxi positions, 5 passenger locations (the four designated locations plus "in the taxi"), and 4 destination locations. Only 400 of these states can actually be reached during an episode, because the remaining states describe situations where the passenger is already at their destination. Four additional states can be observed right after a successful episode, when both the passenger and the taxi are at the destination.
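For reference, the encoded integer state can be turned back into its components. This is a small sketch assuming Gym's `TaxiEnv.decode()` helper (accessed through `env.unwrapped`) behaves as in current Gym releases:

```python
import gym

env = gym.make("Taxi-v3")
print(env.observation_space.n)  # 500 discrete states
print(env.action_space.n)       # 6 discrete actions

state = env.reset()
# decode() recovers (taxi_row, taxi_col, passenger_location, destination)
taxi_row, taxi_col, passenger_loc, destination = env.unwrapped.decode(state)
print(taxi_row, taxi_col, passenger_loc, destination)
```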
The project implements the SARSA and Q-learning algorithms to train optimal policies for the Taxi problem. SARSA is an on-policy algorithm that updates a Q-value using the action the current policy actually selects in the next state, while Q-learning is an off-policy algorithm that updates it using the maximum Q-value over actions in the next state. By comparing the two algorithms, we analyze their computational efficiency in terms of convergence time and their ability to maximize the cumulative reward.
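The two update rules can be sketched as follows. This is an illustrative tabular sketch rather than the repository's exact code: `Q` is a NumPy array indexed by `[state, action]`, `alpha` is the learning rate, `gamma` the discount factor, and the behaviour policy is epsilon-greedy.

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng):
    """Behaviour policy: random action with probability epsilon, greedy otherwise."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done):
    # On-policy: bootstrap with the action a_next actually chosen by the behaviour policy.
    target = r + (0.0 if done else gamma * Q[s_next, a_next])
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha, gamma, done):
    # Off-policy: bootstrap with the greedy (maximum) Q-value in the next state.
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
```

The only difference lies in the bootstrap term: SARSA uses the Q-value of the action the policy actually takes next, while Q-learning uses the maximum Q-value over all actions in the next state.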
- `QLearning.py`: Implements the Q-learning algorithm for training the optimal policy.
- `SarsaLearning.py`: Implements the SARSA algorithm for training the optimal policy.
- `plot.py`: Contains code to generate plots for the comparative analysis of the Q-learning and SARSA algorithms, providing insights into their performance on the Taxi problem.
A comparative analysis of the SARSA and Q-learning algorithms was conducted to evaluate their performance on the Taxi problem. The analysis focused on two main aspects:
- Computational Efficiency: The convergence time of the two algorithms was measured, i.e. the number of episodes required to converge to an optimal policy. The algorithm that converged in fewer episodes was considered more computationally efficient.
- Maximizing Reward: The learned policies from both SARSA and Q-learning were executed from various starting states, and the average reward was calculated for each algorithm. The average reward measures how effectively a learned policy maximizes the cumulative reward over multiple episodes; the algorithm with the higher average reward was considered better at maximizing reward (a minimal evaluation sketch follows this list).
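A minimal sketch of the reward-maximization evaluation described above, assuming a learned Q-table and the classic Gym API; the helper name `evaluate_policy` is illustrative, not the repository's:

```python
import gym
import numpy as np

def evaluate_policy(Q, n_episodes=100, max_steps=200):
    """Run the greedy policy derived from Q from random starting states
    and return the average cumulative reward."""
    env = gym.make("Taxi-v3")
    returns = []
    for _ in range(n_episodes):
        state = env.reset()                    # random starting state
        total, done, steps = 0.0, False, 0
        while not done and steps < max_steps:
            action = int(np.argmax(Q[state]))  # greedy action from the learned table
            state, reward, done, _ = env.step(action)
            total += reward
            steps += 1
        returns.append(total)
    env.close()
    return float(np.mean(returns))
```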
The results of the comparative analysis can be found in the attached report. The analysis includes graphs, tables, and discussions highlighting the performance of SARSA and Q-learning in terms of computational efficiency and reward maximization.
- Python >= 3.8.0 (the code has been tested with this version)
- Refer to `requirements.txt` for the required packages.
To run the project, use the following steps:
- Clone the project repository.
- Install the necessary packages by running `pip install -r requirements.txt`.
- Run `QLearning.py` and `SarsaLearning.py` to see the rendering of episodes for each implemented algorithm.
This project is licensed under the MIT License.
The Taxi problem environment is part of the Toy Text environments provided by OpenAI Gym. For more information about the environment and its history, refer to the OpenAI Gym documentation.