Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create a good evaluation function for a game?

Tags:

I write programs to play board game variants sometimes. The basic strategy is standard alpha-beta pruning or similar searches, sometimes augmented by the usual approaches to endgames or openings. I've mostly played around with chess variants, so when it comes time to pick my evaluation function, I use a basic chess evaluation function.

However, now I am writing a program to play a completely new board game. How do I choose a good or even decent evaluation function?

The main challenges are that the same pieces are always on the board, so a usual material function won't change based on position, and the game has been played less than a thousand times or so, so humans don't necessarily play it enough well yet to give insight. (PS. I considered a MoGo approach, but random games aren't likely to terminate.)

Game details: The game is played on a 10-by-10 board with a fixed six pieces per side. The pieces have certain movement rules, and interact in certain ways, but no piece is ever captured. The goal of the game is to have enough of your pieces in certain special squares on the board. The goal of the computer program is to provide a player which is competitive with or better than current human players.

like image 294
A. Rex Avatar asked Aug 18 '09 01:08

A. Rex


People also ask

What makes a good evaluation function?

Independence, credibility and use are considered as the basic principles that must be observed in all evaluation functions. In addition, it is agreed that the evaluations should include analysis of relevance, efficiency, effectiveness, sustainability and impact.

How do you come up with an evaluation function?

Evaluating a function means finding the value of f(x) =… or y =… that corresponds to a given value of x. To do this, simply replace all the x variables with whatever x has been assigned. For example, if we are asked to evaluate f(4), then x has been assigned the value of 4.

What would be a good evaluation utility function for the Tic Tac Toe game?

For Tic-Tac-Toe, the function could be as simple as returning +1 if the computer wins, -1 if the player wins, or 0 otherwise. However, simple evaluation function may require deeper search. A better evaluation function for Tic-Tac-Toe is: +100 for EACH 3-in-a-line for computer.

What is chess evaluation function?

Evaluation function looks at a given board position and returns a value that predicts the goodness of the position without considering future possibilities. That is, the evaluation function evaluates a chess board statically by taking into account only the current position of the pieces.


2 Answers

Find a few candidates for your evaluation function, like mobility (# of possible moves) minus opponent's mobility, then try to find the optimal weight for each metric. Genetic algorithms seem to work pretty well for optimizing weights in an evaluation function.

Create a population with random weights, fight them against each other with a limited depth and turns, replace the losers with random combinations from the winners, shuffle, and repeat, printing out the population average after every generation. Let it run until you're satisfied with the result, or until you see a need to adjust the range for some of the metrics and try again, if it appears that the optimal value for one metric might be outside your initial range.

Late edit: A more accepted, studied, understood approach that I didn't know at the time is something called "Differential Evolution". Offspring are created from 3 parents instead of 2, in such a way that avoids the problem of premature convergence towards the average.

like image 38
David Avatar answered Oct 21 '22 16:10

David


I will start with some basics and move to harder stuff later.

Basic agent and a testing framework

No matter what approach you take you need to start with something really simple and dumb. The best approach for a dumb agent is a random one (generate all possible moves, select one at random). This will serve as a starting point to compare all your other agents. You need a strong framework for comparison. Something that takes various agents, allows to play some number of games between them and returns the matrix of the performance. Based on the results, you calculate the fitness for each agent. For example your function tournament(agent1, agent2, agent3, 500) will play 500 games between each pair of agent (playing the first/second) and returns you something like:

  x         -0.01       -1.484   |  -1.485 0.01          x         -1.29    |  -1.483 1.484       1.29          x      |  2.774 

Here for example I use 2 points for a win, 1 point for draw scoring function, and at the end just summing everything to find the fitness. This table immediately tells me that agent3 is the best, and agent1 is not really different from agent2.

So once these two important things are set up you are ready to experiment with your evaluation functions.


Let's start with selecting features

  1. First of all you need to create not a terrible evaluation function. By this I mean that this function should correctly identify 3 important aspects (win/draw/loss). This sounds obvious, but I have seen significant amount of bots, where the creators were not able to correctly set up these 3 aspects.

  2. Then you use your human ingenuity to find some features of the game state. The first thing to do is to speak with a game expert and ask him how he access the position.

  3. If you do not have the expert, or you even just created the rules of your game 5 minutes ago, do not underestimate the human's ability to search for patters. Even after playing a couple of games, a smart person can give you ideas how he should have played (it does not mean that he can implement the ideas). Use these ideas as features.

  4. At this point you do not really need to know how these features affect the game. Example of features: value of the pieces, pieces mobility, control of important positions, safety, total number of possible moves, closeness to a finish.

  5. After you coded up these features and used them separately to see what works best (do not hurry up to discard features that do not perform reasonable by itself, they might be helpful in conjunction with others), you are ready to experiment with combinations.

Building better evaluations by combining and weighting simple features. There are a couple of standard approaches.

  1. Create an uber function based on various combinations of your features. It can be linear eval = f_1 * a_1 + ... f_n * a_n (f_i features, a_i coefficients), but it can be anything. Then instantiate many agents with absolutely random weights for this evaluation function and use genetic algorithm to play them agains each other. Compare the results using the testing framework, discard a couple of clear losers and mutate a couple of winners. Continue the same process. (This is a rough outline, read more about GA)

  2. Use the back-propagation idea from a neural networks to back propagate the error from the end of the game to update the weights of your network. You can read more how it was done with backgammon (I have not written anything similar, so sorry for the shortness).

You can work without evaluation function! This might sound insane for a person who only heard about minimax/alpha-beta, but there are methods which do not require an evaluation at all. One of them is called Monte Carlo Tree Search and as a Monte Carlo in a name suggests it uses a lot of random (it should not be random, it can use your previous good agents) game plays to generate a tree. This is a huge topic by itself, so I will give you mine really high-level explanation. You start with a root, create your frontier, which you try to expand. Once you expand something, you just randomly go to the leaf. Getting the result from the leaf, you backpropagate the result. Do this many many times, and collect the statistics about each child of the current frontier. Select the best one. There is significant theory there which relates to how do you balance between exploration and exploitation and a good thing to read there is UCT (Upper Confidence Bound algorithm)

like image 194
Salvador Dali Avatar answered Oct 21 '22 16:10

Salvador Dali