I write programs to play board game variants sometimes. The basic strategy is standard alpha-beta pruning or similar searches, sometimes augmented by the usual approaches to endgames or openings. I've mostly played around with chess variants, so when it comes time to pick my evaluation function, I use a basic chess evaluation function.
However, now I am writing a program to play a completely new board game. How do I choose a good or even decent evaluation function?
The main challenges are that the same pieces are always on the board, so the usual material count never changes with position, and the game has been played fewer than a thousand times or so, so humans haven't played it well enough yet to offer much insight. (PS. I considered a MoGo approach, but random games are unlikely to terminate.)
Game details: The game is played on a 10-by-10 board with a fixed six pieces per side. The pieces have certain movement rules, and interact in certain ways, but no piece is ever captured. The goal of the game is to have enough of your pieces in certain special squares on the board. The goal of the computer program is to provide a player which is competitive with or better than current human players.
For Tic-Tac-Toe, the evaluation function could be as simple as returning +1 if the computer wins, -1 if the player wins, or 0 otherwise. However, such a simple evaluation function requires a deeper search. A better evaluation function for Tic-Tac-Toe scores each line: +100 for each 3-in-a-line for the computer, +10 for each 2-in-a-line with an empty third cell, +1 for each 1-in-a-line with two empty cells, the corresponding negative scores for the opponent's lines, and 0 otherwise.
An evaluation function looks at a given board position and returns a value that predicts the goodness of the position without considering future possibilities. That is, it evaluates the board statically, taking into account only the current placement of the pieces.
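To make that concrete, here is a minimal sketch of such a static evaluation for Tic-Tac-Toe using the line-scoring scheme above (the board encoding is my own assumption):

    # board: list of 9 cells, each 'X' (computer), 'O' (opponent), or None.
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals
    SCORES = {3: 100, 2: 10, 1: 1}              # n-in-a-line, rest of line empty

    def evaluate(board):
        total = 0
        for line in LINES:
            cells = [board[i] for i in line]
            xs, os = cells.count('X'), cells.count('O')
            if os == 0 and xs > 0:
                total += SCORES[xs]   # line still open for the computer
            elif xs == 0 and os > 0:
                total -= SCORES[os]   # line still open for the opponent
        return total                  # purely static: no look-ahead at all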
Find a few candidates for your evaluation function, like mobility (# of possible moves) minus opponent's mobility, then try to find the optimal weight for each metric. Genetic algorithms seem to work pretty well for optimizing weights in an evaluation function.
Create a population with random weights, fight them against each other with a limited depth and turns, replace the losers with random combinations from the winners, shuffle, and repeat, printing out the population average after every generation. Let it run until you're satisfied with the result, or until you see a need to adjust the range for some of the metrics and try again, if it appears that the optimal value for one metric might be outside your initial range.
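A minimal sketch of that loop, assuming a play_match placeholder for your own limited-depth game (the mutation scheme here is just one plausible choice):

    import random

    N_WEIGHTS, POP_SIZE, GENERATIONS = 4, 20, 100

    def play_match(a, b):
        # Placeholder: pit weight vector a against b at limited depth/turns
        # and return points for a (2 = win, 1 = draw, 0 = loss).
        return random.choice([0, 1, 2])  # stand-in so the sketch runs

    def offspring(a, b):
        # Each weight comes from a random parent, plus a small mutation.
        return [random.choice(pair) + random.gauss(0, 0.05) for pair in zip(a, b)]

    population = [[random.uniform(-1, 1) for _ in range(N_WEIGHTS)]
                  for _ in range(POP_SIZE)]
    for gen in range(GENERATIONS):
        fitness = [sum(play_match(a, b) for b in population if b is not a)
                   for a in population]
        ranked = [a for _, a in sorted(zip(fitness, population), reverse=True)]
        winners = ranked[:POP_SIZE // 2]  # losers get replaced by new offspring
        population = winners + [offspring(random.choice(winners), random.choice(winners))
                                for _ in range(POP_SIZE - len(winners))]
        print(gen, sum(fitness) / len(fitness))  # population average per generation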
Late edit: A more widely accepted, studied, and understood approach that I didn't know about at the time is called "Differential Evolution". Offspring are created from three parents instead of two, in a way that avoids premature convergence towards the population average.
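For reference, a sketch of the classic DE/rand/1/bin step (F and CR are the standard differential-evolution control parameters; everything else is my own framing):

    import random

    def de_offspring(population, target_idx, F=0.8, CR=0.9):
        # Build one trial vector for population[target_idx] from three
        # distinct other parents: mutant = a + F * (b - c).
        others = [p for i, p in enumerate(population) if i != target_idx]
        a, b, c = random.sample(others, 3)
        target = population[target_idx]
        mutant = [a[j] + F * (b[j] - c[j]) for j in range(len(target))]
        forced = random.randrange(len(target))  # keep at least one mutant gene
        return [mutant[j] if (random.random() < CR or j == forced) else target[j]
                for j in range(len(target))]

The trial vector then replaces the target only if it performs better, which is what resists collapsing the population towards its average.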
I will start with some basics and move to harder stuff later.
Basic agent and a testing framework
No matter what approach you take, you need to start with something really simple and dumb. The best dumb agent is a random one (generate all possible moves, select one at random). It will serve as the baseline against which all your other agents are compared. You also need a solid framework for comparison: something that takes several agents, plays some number of games between each pair, and returns a matrix of their performance. From those results you calculate a fitness for each agent. For example, your function tournament(agent1, agent2, agent3, 500) would play 500 games between each pair of agents (alternating who plays first/second) and return something like:
              agent1    agent2    agent3  |  fitness
    agent1      x       -0.01    -1.484   |  -1.485
    agent2     0.01       x      -1.29    |  -1.483
    agent3     1.484     1.29       x     |   2.774
Here, for example, I use a scoring of 2 points for a win and 1 point for a draw, and at the end simply sum each row to get the fitness. This table immediately tells me that agent3 is the best, and that agent1 is not really different from agent2.
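Such a framework can be quite small. A sketch, assuming a play_game helper (stubbed out below) that returns the point differential for its first argument:

    from itertools import combinations
    import random

    def play_game(first, second):
        # Placeholder: play one game and return the point differential for
        # `first` (e.g. win = +2, loss = -2, draw = 0 under 2/1 scoring).
        return random.choice([-2, 0, 2])  # stand-in so the sketch runs

    def tournament(agents, n_games):
        n = len(agents)
        matrix = [[0.0] * n for _ in range(n)]
        for i, j in combinations(range(n), 2):
            diff = 0.0
            for g in range(n_games):
                # Alternate who moves first so neither side keeps the first-move edge.
                if g % 2 == 0:
                    diff += play_game(agents[i], agents[j])
                else:
                    diff -= play_game(agents[j], agents[i])
            matrix[i][j] = diff / n_games   # average differential from i's point of view
            matrix[j][i] = -matrix[i][j]
        fitness = [sum(row) for row in matrix]
        return matrix, fitness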
So once these two important things are set up you are ready to experiment with your evaluation functions.
Let's start with selecting features
First of all, you need to create an evaluation function that is not terrible. By this I mean that the function should correctly identify the three important outcomes (win/draw/loss). This sounds obvious, but I have seen a significant number of bots whose creators were not able to set up these three cases correctly.
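In code this just means checking for terminal positions before any heuristic scoring. A sketch, with winner and heuristic passed in as stand-ins for your own engine's queries:

    WIN, LOSS, DRAW = float('inf'), float('-inf'), 0.0

    def evaluate(state, me, winner, heuristic):
        # winner(state) -> the winning player, 'draw', or None if the game goes on.
        # heuristic(state, me) -> feature-based score for non-terminal positions.
        w = winner(state)
        if w is None:
            return heuristic(state, me)   # non-terminal: fall back to features
        if w == 'draw':
            return DRAW
        return WIN if w == me else LOSS   # terminal values dominate any heuristic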
Then use your human ingenuity to find features of the game state. The first thing to do is to speak with a game expert and ask them how they assess a position.
If you do not have an expert, or you created the rules of your game just 5 minutes ago, do not underestimate a human's ability to search for patterns. Even after playing a couple of games, a smart person can give you ideas about how they should have played (which does not mean they can implement those ideas). Use these ideas as features.
At this point you do not really need to know how these features affect the game. Examples of features: value of the pieces, piece mobility, control of important positions, safety, total number of possible moves, closeness to the finish.
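As an illustration, two such features might be coded like this (legal_moves, piece_squares, and the particular KEY_SQUARES are hypothetical names for queries your engine would provide):

    # Hypothetical set of "important positions" on the 10x10 board.
    KEY_SQUARES = {(0, 4), (4, 4), (9, 5)}

    def mobility(state, me, them, legal_moves):
        # Net mobility: my number of possible moves minus the opponent's.
        return len(legal_moves(state, me)) - len(legal_moves(state, them))

    def key_square_control(state, me, them, piece_squares):
        # Net occupation of the important squares.
        mine = sum(1 for sq in piece_squares(state, me) if sq in KEY_SQUARES)
        theirs = sum(1 for sq in piece_squares(state, them) if sq in KEY_SQUARES)
        return mine - theirs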
After you have coded up these features and used them separately to see what works best (do not rush to discard features that do not perform reasonably on their own; they might be helpful in conjunction with others), you are ready to experiment with combinations.
Build better evaluations by combining and weighting simple features. There are a couple of standard approaches.
Create an uber-function based on various combinations of your features. It can be linear, eval = f_1 * a_1 + ... + f_n * a_n (where the f_i are features and the a_i coefficients), but it can be anything. Then instantiate many agents with completely random weights for this evaluation function and use a genetic algorithm to play them against each other. Compare the results using the testing framework, discard a couple of clear losers, and mutate a couple of winners. Continue the same process. (This is a rough outline; read more about GAs.)
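In code the linear version is just a dot product of features and weights, and "instantiating many agents" means drawing random weight vectors (a sketch):

    import random

    def linear_eval(state, me, them, features, weights):
        # features: list of functions f(state, me, them) -> number.
        return sum(w * f(state, me, them) for w, f in zip(weights, features))

    def random_agents(features, n, lo=-1.0, hi=1.0):
        # Each agent is just one random weight vector over the same feature set.
        return [[random.uniform(lo, hi) for _ in features] for _ in range(n)]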
Use the back-propagation idea from neural networks: propagate the error from the end of the game back through the positions you visited to update the weights of your network. You can read more about how this was done for backgammon (I have not written anything similar, so sorry for the brevity).
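For reference, the backgammon work is Tesauro's TD-Gammon, which used temporal-difference learning. A heavily condensed sketch of the TD(0) idea for a linear evaluation (my own simplification, not the original network):

    def td0_update(weights, episode_features, final_reward, alpha=0.01):
        # episode_features: feature vectors of the positions visited in one game,
        # in order; final_reward: e.g. +1 win, 0 draw, -1 loss (assumed encoding).
        def value(x):
            return sum(w * f for w, f in zip(weights, x))
        for t in range(len(episode_features) - 1):
            # Pull V(s_t) towards V(s_{t+1}).
            error = value(episode_features[t + 1]) - value(episode_features[t])
            for i, f in enumerate(episode_features[t]):
                weights[i] += alpha * error * f
        # Pull the last position's value towards the actual game outcome.
        error = final_reward - value(episode_features[-1])
        for i, f in enumerate(episode_features[-1]):
            weights[i] += alpha * error * f
        return weights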
You can work without an evaluation function! This might sound insane to someone who has only heard about minimax/alpha-beta, but there are methods which do not require an evaluation at all. One of them is called Monte Carlo Tree Search, and as the "Monte Carlo" in the name suggests, it uses a lot of random game playouts (they need not be purely random; you can use your previous good agents) to generate a tree.

This is a huge topic by itself, so I will only give a really high-level explanation. You start with a root and create a frontier, which you try to expand. Once you expand a node, you randomly play out to a leaf (the end of the game). Getting the result from the leaf, you backpropagate it up the tree. Do this many, many times and collect statistics about each child of the current root, then select the best one. There is significant theory about how to balance exploration and exploitation, and a good thing to read there is UCT (Upper Confidence bounds applied to Trees).
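A compressed sketch of the four MCTS phases, with UCB1 as the selection rule (the Node layout and the expand/rollout hooks are my own assumptions, and for brevity it does not flip the result between the two players, which a real two-player implementation must do):

    import math, random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.wins, self.visits = [], 0.0, 0

    def ucb1(node, c=1.4):
        # Exploitation (win rate) plus an exploration bonus for rarely tried children.
        return (node.wins / node.visits
                + c * math.sqrt(math.log(node.parent.visits) / node.visits))

    def mcts(root, n_iters, expand, rollout):
        # expand(node) -> list of child Nodes; rollout(state) -> result in [0, 1].
        # Both are assumed hooks into your game rules.
        for _ in range(n_iters):
            node = root
            # 1. Selection: descend via UCB1 while every child has been tried.
            while node.children and all(ch.visits > 0 for ch in node.children):
                node = max(node.children, key=ucb1)
            # 2. Expansion: grow the frontier below an unexpanded node.
            if not node.children:
                node.children = expand(node)
            if node.children:
                untried = [ch for ch in node.children if ch.visits == 0]
                node = random.choice(untried or node.children)
            # 3. Simulation: play the game out randomly (or with a cheap policy).
            result = rollout(node.state)
            # 4. Backpropagation: update statistics up to the root.
            while node is not None:
                node.visits += 1
                node.wins += result
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits)  # most-visited child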