An implementation of the AlphaGo Zero algorithm for Ultimate tic-tac-toe. This is a successful attempt to adapt the algorithm to a different game.
- Reproduce DeepMind's results and show that the algorithm generalizes to other types of games.
- Test how different hyperparameter values affect model learning.
- Python 3 Libraries:
- NumPy
- Keras
- TensorFlow
You can install them by running:

```
pip3 install -r requirements.txt
```
The project should run with the newest versions of the above libraries, but if it does not, try this known working configuration:
- NumPy 1.17.3
- Keras 2.3.1
- TensorFlow 2.0.0
If you want to use a GPU, install tensorflow-gpu and its requirements instead.
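If you want to pin the working configuration exactly, a requirements.txt along these lines should reproduce it (the file shipped with the project may differ):

```
numpy==1.17.3
keras==2.3.1
tensorflow==2.0.0
```

For GPU support, swap the last line for `tensorflow-gpu==2.0.0`.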
This project assumes that you have at least a general idea of how the AGZ algorithm works. Here is a great summary.
The training pipeline consists of three stages:
- Self-play
- Network training
- Network testing
Run

```
python3 Selfplay_module.py
```
After generating a large number of games (around 40k), move on to the next step.
Run

```
python3 Training_module.py
```
The trained network is saved as "new_network.h5".
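The saved file is a standard Keras HDF5 model, so it can be loaded and inspected directly. A minimal sketch, assuming the Keras 2.3.1 configuration above and that the model uses only built-in layers and losses:

```python
from keras.models import load_model

# Load the trained network and print its architecture.
model = load_model("new_network.h5")
model.summary()
```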
Run

```
python3 Test_module.py <old_network_name> <new_network_name>
```
You should check whether the newly trained network performs better than the previous one; the pass threshold is a 50% win rate. If the network fails, retrain it or generate more games. After it passes, rename it to "current_network.h5" and you are ready to repeat the three stages.
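The whole loop can also be scripted. A minimal sketch of one iteration, assuming the script names and arguments described above; the pass/fail check stays manual here because the output format of Test_module.py is not specified:

```python
import subprocess

# Stage 1: self-play -- generate games with the current network.
subprocess.run(["python3", "Selfplay_module.py"], check=True)

# Stage 2: network training -- produces new_network.h5.
subprocess.run(["python3", "Training_module.py"], check=True)

# Stage 3: network testing -- pit the old network against the new one.
subprocess.run(["python3", "Test_module.py",
                "current_network.h5", "new_network.h5"], check=True)

# If the new network wins more than 50% of the test games, promote it
# (rename new_network.h5 to current_network.h5) and repeat the three stages.
```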
Because my computing resources are far smaller than DeepMind's supercomputers, I trained only one new generation of the network. It passed the test with a 62% win rate.
In progress...
- Add multithreading to MCTS
- Allow saving generated games in JSON
- Automate the training process
- Optimization
- Bugfixing
- Code cleaning