December 10, 2010
Final Project Writeup
Using the board game Connect Four as a learning platform, I am testing the theory that it is better to learn how to play against an expert rather than against another novice player. To test this, two heuristics have been developed: an expert, which was developed using MiniMax, and a novice, which was developed using Q-Learning. Copies of the novice heuristic train against the expert as well as themselves, deriving two "learned" heuristics (expert vs. novice, novice vs. novice) which are then run against each other to see which has learned to play the game best.
[Screenshots: Main Menu, Game Board, Novice vs. Expert Training]
- MiniMax was used to develop the expert heuristic. Using the convention that an expert can visualize the game up to three of its own moves in the future, the depth of the MiniMax tree is set to 5 plies (the expert's three moves plus the opponent's two replies in between).
- Q-Learning was used to develop the novice heuristic. Due to the vast number of possible game boards (states), the exploration rate (randomness) is set to a very high value so that training visits as many states as possible.
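The expert's depth-limited MiniMax search can be sketched as follows. This is an illustrative sketch only: the board encoding, the `evaluate()` heuristic, and all function names here are assumptions, not the project's actual code, and win detection is omitted for brevity.

```python
# Depth-limited MiniMax sketch for a Connect Four-style board.
# Board: 6x7 list of lists; 0 = empty, 1 = maximizing player, 2 = opponent.
ROWS, COLS = 6, 7

def legal_moves(board):
    """Columns that are not yet full."""
    return [c for c in range(COLS) if board[0][c] == 0]

def drop(board, col, player):
    """Return a new board with player's piece dropped into col."""
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if new[r][col] == 0:
            new[r][col] = player
            break
    return new

def evaluate(board):
    """Placeholder heuristic: favor occupying the center column."""
    return (sum(row[COLS // 2] == 1 for row in board)
            - sum(row[COLS // 2] == 2 for row in board))

def minimax(board, depth, maximizing):
    """Return (score, best_column) searching `depth` plies ahead."""
    moves = legal_moves(board)
    if depth == 0 or not moves:
        return evaluate(board), None
    best_move = None
    if maximizing:
        best = float("-inf")
        for col in moves:
            score, _ = minimax(drop(board, col, 1), depth - 1, False)
            if score > best:
                best, best_move = score, col
        return best, best_move
    best = float("inf")
    for col in moves:
        score, _ = minimax(drop(board, col, 2), depth - 1, True)
        if score < best:
            best, best_move = score, col
    return best, best_move
```

With the 5-ply convention above, the expert would call `minimax(board, 5, True)` on each turn.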
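The novice's tabular Q-Learning update, with the high exploration rate mentioned above, can be sketched as follows. The constant names, the `choose_action`/`update` helpers, and the specific parameter values are hypothetical illustrations, not taken from the project code.

```python
import random
from collections import defaultdict

EPSILON = 0.9   # very high randomness, to encourage exploration (per the writeup)
ALPHA = 0.1     # learning rate (assumed value)
GAMMA = 0.9     # discount factor (assumed value)

# Q maps (state, action) pairs to estimated values; unseen pairs default to 0.
Q = defaultdict(float)

def choose_action(state, legal_actions):
    """Epsilon-greedy: usually explore at random, occasionally exploit."""
    if random.random() < EPSILON:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """Standard Q-Learning backup: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```

After training, the learned heuristic would play greedily (epsilon set to 0) using the stored Q-values.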
The Connect Four board game was completely solved mathematically by James D. Allen in 1988, but that approach used a full game tree expansion. With a game tree up to 42 moves deep (one move per board cell), the solution takes days to compute and is impossible to play against using traditional MiniMax board expansion. However, this complete solution can be achieved by a Q-Learning heuristic that has learned the correct method to force a win. Not only will the heuristic know the perfect game, but its execution time will be small enough that it could actually be played against. The amount of time needed to develop this learned heuristic will be substantial, though.
My project attempts to determine whether a dynamically derived Q-Learning heuristic learns how to win faster than a statically derived one. Using the same number of training runs for both derived learned heuristics, it should be evident which one is better by running them against each other. If a better solution is derived "faster" by the novice vs. novice training, it would eliminate the need for training against the full board expansion. This would dramatically decrease the time taken to derive the "perfect" heuristic.
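The head-to-head comparison described above can be sketched as a simple match harness: two learned (now greedy) policies play repeated games while wins and ties are tallied. The `Game` interface assumed here (`over()`, `play()`, `winner()`) and the function names are hypothetical, not the project's actual classes.

```python
def play_match(policy1, policy2, new_game, n_games=100):
    """Play n_games between two policies; return (wins1, wins2, ties).

    policy1/policy2: callables taking the game and returning a move.
    new_game: factory returning a fresh game object (assumed interface).
    """
    wins1 = wins2 = ties = 0
    for _ in range(n_games):
        game = new_game()
        players = {1: policy1, 2: policy2}
        turn = 1
        while not game.over():
            move = players[turn](game)
            game.play(turn, move)
            turn = 3 - turn  # alternate between players 1 and 2
        if game.winner() == 1:
            wins1 += 1
        elif game.winner() == 2:
            wins2 += 1
        else:
            ties += 1
    return wins1, wins2, ties
```

Running `play_match` with the two learned heuristics over 100 games would produce win/tie tallies like those reported in the evaluation below.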
Technology Used Block Diagram
The entire program is written in Python; the block diagram shows how the training and statistics are derived.
Evaluation of Results
Due to the high randomness and sheer number of game boards to learn, the novice heuristic needs many (~10,000) training runs before it begins to win actual games against the expert. Even so, after 10,000 training runs the novice heuristic wins on average about 7 of the 100 real games that it plays.
For novice vs. novice, the training rate is more difficult to calculate. Each heuristic is learning to play the game at the same time, and the randomness of both learners produces no clear pattern of wins. After 10,000 training runs, the first novice won 47 of 100 real games, the second novice won 44 of 100, and 9 of 100 games were ties.
When the learned heuristics from expert vs. novice (learned 1) and novice vs. novice (learned 2) were placed against each other, the results of 10 test games were: learned 1 won 4, learned 2 won 5, and 1 game was a tie.
It appears that the heuristic developed from novice vs. novice is better at playing the game than the heuristic developed from expert vs. novice.
The results I have achieved are based on 10,000 training runs for each of the learned heuristics. However, the number of possible game boards vastly exceeds 10,000, so the heuristics are not yet fully developed. Over the course of the next week, I will run hundreds of thousands of training games for both learned heuristics, which should provide a much better representation of the training that has taken place.