A simple framework for experimenting with Reinforcement Learning in Python.
There are loads of other great libraries out there for RL. The aim of this one is twofold:
- Simplicity.
- Reproducibility of results.
A brief tutorial for a slightly earlier version is available here. As of version 0.77, the library should work with both Python 2 and Python 3. Please let me know if you find that is not the case!
simple_rl requires numpy and matplotlib. Some MDPs have visuals, too, which require pygame. The library also includes support for hooking into any of the OpenAI Gym environments. I recently added a basic test script, contained in the tests directory.
The easiest way to install is with pip. Just run:
pip install simple_rl
Alternatively, you can download simple_rl here.
Some examples showcasing basic functionality are included in the examples directory.
To run a simple experiment, import the run_agents_on_mdp(agent_list, mdp) function from simple_rl.run_experiments and call it with some agents for a given MDP. For example:
# Imports
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.tasks import GridWorldMDP
from simple_rl.agents import QLearningAgent
# Run Experiment
mdp = GridWorldMDP()
agent = QLearningAgent(mdp.get_actions())
run_agents_on_mdp([agent], mdp)
Running the above code will run Q-learning on a simple GridWorld. When it finishes, it stores the results in cur_dir/results/* and opens a plot of the results.
For a slightly more complicated example, take a look at simple_example.py. Here we run three agents on the grid world from the Russell-Norvig AI textbook:
from simple_rl.agents import QLearningAgent, RandomAgent, RMaxAgent
from simple_rl.tasks import GridWorldMDP
from simple_rl.run_experiments import run_agents_on_mdp
# Setup MDP.
mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)], lava_locs=[(4, 2)], gamma=0.95, walls=[(2, 2)])
# Setup Agents.
ql_agent = QLearningAgent(actions=mdp.get_actions())
rmax_agent = RMaxAgent(actions=mdp.get_actions())
rand_agent = RandomAgent(actions=mdp.get_actions())
# Run experiment and make plot.
run_agents_on_mdp([ql_agent, rmax_agent, rand_agent], mdp, instances=5, episodes=50, steps=10)
The above code will generate a plot comparing the three agents.
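As noted above, simple_rl can also hook into OpenAI Gym environments. Below is a minimal sketch of that workflow; it assumes a GymMDP wrapper in simple_rl.tasks with env_name and render constructor arguments (check the tasks directory for the exact interface), and it requires gym to be installed separately.
from simple_rl.agents import RandomAgent
from simple_rl.tasks import GymMDP
from simple_rl.run_experiments import run_agents_on_mdp
# Wrap a Gym environment as an MDP (constructor arguments are assumptions).
gym_mdp = GymMDP(env_name='CartPole-v0', render=False)
# Run a random agent on it, just to exercise the plumbing.
rand_agent = RandomAgent(actions=gym_mdp.get_actions())
run_agents_on_mdp([rand_agent], gym_mdp, instances=3, episodes=20, steps=200)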
The library consists of the following directories:
- agents: Code for some basic agents (a random actor, Q-learning, R-Max, Q-learning with a linear approximator, and so on).
- experiments: Code for an Experiment class to track parameters and reproduce results.
- mdp: Code for a basic MDP and MDPState class, and an MDPDistribution class (for lifelong learning). Also contains an OO-MDP implementation [Diuk et al. 2008].
- planning: Implementations of planning algorithms, including ValueIteration and MCTS [Coulom 2006], the latter still in development.
- tasks: Implementations of a few standard MDPs (grid world, N-chain, Taxi [Dietterich 2000], and the OpenAI Gym).
- utils: Code for charting and other utilities.
Make an MDP subclass (see the sketch after this list), which needs:
- A static variable, ACTIONS, which is a list of strings denoting each action.
- Implementations of a reward function and a transition function, passed to the MDP constructor (along with ACTIONS).
- I also suggest overwriting the "__str__" method of the class and adding an "__init__.py" file to the directory.
- Create a State subclass for your MDP (if necessary). I suggest overwriting "__hash__", "__eq__", and "__str__" so the class plays well with the agents.
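Here is a minimal sketch of such a subclass, a toy two-state MDP. The exact MDP constructor signature (actions, transition_func, reward_func, init_state, gamma) and the State(data=...) wrapper are assumptions based on the classes in simple_rl.mdp, so check MDPClass.py and StateClass.py for the real arguments.
from simple_rl.mdp import MDP, State

class TwoStateMDP(MDP):
    ''' Toy MDP: taking "right" in state 0 yields reward and moves to state 1. '''
    ACTIONS = ["left", "right"]

    def __init__(self, gamma=0.99):
        # Pass ACTIONS plus the reward/transition functions to the MDP constructor.
        MDP.__init__(self, TwoStateMDP.ACTIONS,
                     self._transition_func,
                     self._reward_func,
                     init_state=State(data=0),
                     gamma=gamma)

    def _reward_func(self, state, action):
        # Assumed signature: reward for taking @action in @state.
        return 1.0 if (state.data == 0 and action == "right") else 0.0

    def _transition_func(self, state, action):
        # Assumed signature: the next state after taking @action in @state.
        return State(data=1) if action == "right" else State(data=0)

    def __str__(self):
        return "two_state_mdp"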
Make an Agent subclass (see the sketch after this list), which requires:
- A method, act(self, state, reward), that returns an action.
- A method, reset(), that puts the agent back to its tabula rasa state.
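And a minimal sketch of an Agent subclass, one that ignores the reward signal and always takes its first action. The Agent base class constructor arguments (name, actions) are an assumption based on simple_rl.agents.AgentClass; adjust to the actual signature.
from simple_rl.agents import Agent

class FirstActionAgent(Agent):
    ''' Trivial agent: ignores reward and always returns the first action. '''

    def __init__(self, actions):
        Agent.__init__(self, name="first-action", actions=actions)

    def act(self, state, reward):
        # Receives the current state and the previous reward; returns an action string.
        return self.actions[0]

    def reset(self):
        # Return the agent to its tabula rasa state between run instances.
        pass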
I'm hoping to add the following features during Summer-Fall 2018:
- Planning: MCTS [Coulom 2006] and RTDP [Barto et al. 1995]
- Deep RL: Polish the DQN [Mnih et al. 2015] and add others (e.g., DDPG).
- Efficiency: Convert most defaultdict/dict uses to numpy.
- Docs: Tutorials, contribution policy, and thorough documentation.
- Visuals: Unify MDP visualization.
- POMDP: Add an abstract POMDPClass and some basic solvers.
- Misc: Additional testing, reproducibility checks (store more in params file, rerun experiment from params file).
If you'd like to help out, I'll be making a mailing list for the library soon -- shoot me an email. Let me know if you have any questions or suggestions.
Cheers,
-Dave