Programmer Puzzle: Encoding a chess board state throughout a game

The Problem

Chess is played on an 8x8 board, with each player starting with an identical set of 16 pieces: 8 pawns, 2 rooks, 2 knights, 2 bishops, 1 queen and 1 king, as illustrated here:

starting chess position

Positions are generally recorded as a letter for the column followed by the number for the row so White’s queen is at d1. Moves are most often stored in algebraic notation, which is unambiguous and generally only specifies the minimal information necessary. Consider this opening:

e4 e5
Nf3 Nc6
…

which translates to:

White moves king’s pawn from e2 to e4 (it is the only piece that can get to e4 hence “e4”);
Black moves the king’s pawn from e7 to e5;
White moves the knight (N) to f3;
Black moves the knight to c6.
…

The board looks like this:

early opening

An important ability for any programmer is to be able to correctly and unambiguously specify the problem.

So what’s missing or ambiguous? A lot as it turns out.

Board State vs Game State

The first thing you need to determine is whether you’re storing the state of a game or the position of pieces on the board. Encoding simply the positions of the pieces is one thing but the problem says “all subsequent legal moves”. The problem also says nothing about knowing the moves up to this point. That’s actually a problem as I’ll explain.

Castling

The game has proceeded as follows:

e4 e5
Nf3 Nc6
Bb5 a6
Ba4 Bc5

The board looks as follows:

later opening

White has the option of castling. Part of the requirements for this is that the king and the relevant rook have never moved, so whether the king or either rook of each side has moved needs to be stored. Obviously, if they aren’t on their starting squares they have moved; otherwise it needs to be specified.

There are several strategies that can be used for dealing with this problem.

Firstly, we could store an extra 6 bits of information (1 for each rook and king) to indicate whether that piece had moved. We could streamline this by only storing a bit for one of these six squares if the right piece happens to be in it. Alternatively we could treat each unmoved piece as another piece type so instead of 6 piece types on each side (pawn, rook, knight, bishop, queen and king) there are 8 (adding unmoved rook and unmoved king).
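As a rough sketch of the first strategy, the six "has moved" bits could be packed into one small integer. The class name and the bit layout here are my own illustration, not something the problem prescribes, and the check only covers the "never moved" part of the castling rules.

```java
final class CastlingFlags {
    // Hypothetical bit layout: one "has moved" bit per king and per rook.
    static final int WHITE_KING = 1;
    static final int WHITE_ROOK_A = 1 << 1;
    static final int WHITE_ROOK_H = 1 << 2;
    static final int BLACK_KING = 1 << 3;
    static final int BLACK_ROOK_A = 1 << 4;
    static final int BLACK_ROOK_H = 1 << 5;

    /** Returns the flags with the given piece marked as moved. */
    static int markMoved(int flags, int piece) {
        return flags | piece;
    }

    /** True if neither the king nor the chosen rook has moved (no other rule is checked). */
    static boolean mayStillCastle(int flags, int king, int rook) {
        return (flags & (king | rook)) == 0;
    }

    public static void main(String[] args) {
        int flags = 0; // nothing has moved yet
        flags = markMoved(flags, WHITE_ROOK_A);
        System.out.println(mayStillCastle(flags, WHITE_KING, WHITE_ROOK_H)); // true
        System.out.println(mayStillCastle(flags, WHITE_KING, WHITE_ROOK_A)); // false
    }
}
```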

En Passant

Another peculiar and often-neglected rule in Chess is En Passant.

en passant

The game has progressed.

e4 e5
Nf3 Nc6
Bb5 a6
Ba4 Bc5
O-O b5
Bb3 b4
c4

Black’s pawn on b4 now has the option of capturing the White pawn on c4 by moving to c3. This is only allowed at the first opportunity, meaning that if Black passes on the option now he can’t take it next move. So we need to store this.

If we know the previous move we can definitely answer whether En Passant is possible. Alternatively we can store whether each pawn on its 4th rank (counted from its own side) has just moved there with a double move forward. Or we can look at each possible En Passant position on the board and have a flag to indicate whether it’s possible or not.

Promotion

pawn promotion

It is White’s move. If White moves his pawn on h7 to h8 it can be promoted to any other piece (but not the king). 99% of the time it is promoted to a Queen but sometimes it isn’t, typically because that may force a stalemate when otherwise you’d win. This is written as:

h8=Q

This is important in our problem because it means we can’t count on there being a fixed number of pieces on each side. It is entirely possible (but incredibly unlikely) for one side to end up with 9 queens, 10 rooks, 10 bishops or 10 knights if all 8 pawns get promoted.

Stalemate

When in a position from which you cannot win, your best tactic is to try for a stalemate. The most likely variant is where you are not in check but cannot make a legal move (usually because any move would put your king in check). In this case the game is a draw. This one is easy to cater for.

The second variant is by threefold repetition. If the same board position occurs three times in a game (or will occur a third time on the next move), a draw can be claimed. The positions need not occur in any particular order (meaning it doesn’t have to be the same sequence of moves repeated three times). This one greatly complicates the problem because you have to remember every previous board position. If this is a requirement of the problem, the only possible solution is to store every previous move.

Lastly, there is the fifty move rule. A player can claim a draw if no pawn has moved and no piece has been taken in the previous fifty consecutive moves, so we would need to store how many moves have passed since a pawn was moved or a piece taken (the latest of the two). This requires 6 bits (0-63).

Whose Turn Is It?

Of course we also need to know whose turn it is and this is a single bit of information.

Two Problems

Because of the stalemate case, the only feasible or sensible way to store the game state is to store all the moves that led to this position. I’ll tackle that one problem. The board state problem will be simplified to this: store the current position of all pieces on the board ignoring castling, en passant, stalemate conditions and whose turn it is.

Piece layout can be broadly handled in one of two ways: by storing the contents of each square or by storing the position of each piece.

Simple Contents

There are six piece types (pawn, rook, knight, bishop, queen and king). Each piece can be White or Black so a square may contain one of 12 possible pieces or it may be empty, so there are 13 possibilities. 13 possibilities fit in 4 bits (0-15), so the simplest solution is to store 4 bits for each square times 64 squares, or 256 bits of information.
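A minimal sketch of the 4-bits-per-square layout, assuming the board is kept in four 64-bit words. The class name, the square numbering (a1 = 0 through h8 = 63) and the piece codes are assumptions made up for this illustration.

```java
final class NibbleBoard {
    // Hypothetical piece codes: 0 = empty, 1-6 = white P,R,N,B,Q,K, 7-12 = black P,R,N,B,Q,K.
    private final long[] words = new long[4]; // 4 x 64 bits = 256 bits

    /** Reads the 4-bit code stored for a square (0-63). */
    int get(int square) {
        return (int) ((words[square >>> 4] >>> ((square & 15) * 4)) & 0xFL);
    }

    /** Overwrites the 4-bit code stored for a square (0-63). */
    void set(int square, int code) {
        int word = square >>> 4;        // 16 squares per long
        int shift = (square & 15) * 4;  // nibble offset inside that long
        words[word] = (words[word] & ~(0xFL << shift)) | ((long) (code & 0xF) << shift);
    }

    public static void main(String[] args) {
        NibbleBoard board = new NibbleBoard();
        board.set(3, 5);   // white queen on d1
        board.set(59, 11); // black queen on d8
        System.out.println(board.get(3) + " " + board.get(59) + " " + board.get(4)); // 5 11 0
    }
}
```

Reading or writing a square is a shift and a mask, which is what makes this layout so cheap to manipulate.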

The advantage of this method is that manipulation is incredibly easy and fast. This could even be extended by adding 3 more possibilities without increasing the storage requirements: a pawn that has moved 2 spaces on the last turn, a king that hasn’t moved and a rook that hasn’t moved, which will cater for a lot of previously mentioned issues.

But we can do better.

Base 13 Encoding

It is often helpful to think of the board position as a very large number. This is often done in computer science. For example, the halting problem treats a computer program (rightly) as a large number.

The first solution treats the position as a 64 digit base 16 number but as demonstrated there is redundancy in this information (being the 3 unused possibilities per “digit”) so we can reduce the number space to 64 base 13 digits. Of course this can’t be done as efficiently as base 16 can but it will save on storage requirements (and minimizing storage space is our goal).

In base 10 the number 234 is equivalent to 2 x 10² + 3 x 10¹ + 4 x 10⁰.

In base 16 the number 0xA50 is equivalent to 10 x 16² + 5 x 16¹ + 0 x 16⁰ = 2640 (decimal).

So we can encode our position as p₀ x 13⁶³ + p₁ x 13⁶² + ... + p₆₃ x 13⁰ where p_i represents the contents of square i.

2²⁵⁶ equals approximately 1.16e77. 13⁶⁴ equals approximately 1.96e71, which requires 237 bits of storage space. That saving of a mere 7.5% comes at the price of significantly more expensive manipulation.
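One way the base 13 number could be built and taken apart again, sketched with java.math.BigInteger. The class and method names are hypothetical, and the square codes 0-12 are assumed to be assigned however you like.

```java
import java.math.BigInteger;
import java.util.Arrays;

final class Base13Board {
    private static final BigInteger BASE = BigInteger.valueOf(13);

    /** Folds 64 square codes (each 0-12) into one number, the first code being the most significant digit. */
    static BigInteger encode(int[] squares) {
        BigInteger value = BigInteger.ZERO;
        for (int code : squares) {
            value = value.multiply(BASE).add(BigInteger.valueOf(code));
        }
        return value;
    }

    /** Reverses encode() by peeling digits off the least significant end. */
    static int[] decode(BigInteger value) {
        int[] squares = new int[64];
        for (int i = 63; i >= 0; i--) {
            BigInteger[] quotientAndRemainder = value.divideAndRemainder(BASE);
            squares[i] = quotientAndRemainder[1].intValue();
            value = quotientAndRemainder[0];
        }
        return squares;
    }

    public static void main(String[] args) {
        // Largest possible value: every square holds code 12.
        int[] worst = new int[64];
        Arrays.fill(worst, 12);
        System.out.println(encode(worst).bitLength()); // 237
    }
}
```

Every read of a single square now costs a run of big-number divisions, which is the manipulation cost mentioned above.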

Variable Base Encoding

In legal boards certain pieces can’t appear in certain squares. For example, pawns cannot occur in the first or eighth ranks, reducing the possibilities for those squares to 11. That reduces the possible boards to 11¹⁶ x 13⁴⁸ = 1.35e70 (approximately), requiring 233 bits of storage space.

Actually encoding and decoding such values to and from decimal (or binary) is a little more convoluted but it can be done reliably and is left as an exercise to the reader.
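For what it's worth, here is one possible take on that exercise: a mixed-radix encoder where each square brings its own base. All names are hypothetical, squares are assumed to be numbered 0-63 with the first and eighth ranks at 0-7 and 56-63, and the digits are assumed to be already mapped into the range 0 to radix-1 for their square.

```java
import java.math.BigInteger;

final class MixedRadix {
    /** Encodes digits[i] in [0, radix[i]) as a single number, first digit most significant. */
    static BigInteger encode(int[] digits, int[] radix) {
        BigInteger value = BigInteger.ZERO;
        for (int i = 0; i < digits.length; i++) {
            if (digits[i] < 0 || digits[i] >= radix[i]) {
                throw new IllegalArgumentException("digit out of range at index " + i);
            }
            value = value.multiply(BigInteger.valueOf(radix[i])).add(BigInteger.valueOf(digits[i]));
        }
        return value;
    }

    /** Reverses encode() given the same radix array. */
    static int[] decode(BigInteger value, int[] radix) {
        int[] digits = new int[radix.length];
        for (int i = radix.length - 1; i >= 0; i--) {
            BigInteger[] quotientAndRemainder = value.divideAndRemainder(BigInteger.valueOf(radix[i]));
            digits[i] = quotientAndRemainder[1].intValue();
            value = quotientAndRemainder[0];
        }
        return digits;
    }

    /** Radix per square: 11 on the first and eighth ranks (no pawns), 13 elsewhere. */
    static int[] boardRadix() {
        int[] radix = new int[64];
        for (int square = 0; square < 64; square++) {
            radix[square] = (square < 8 || square >= 56) ? 11 : 13;
        }
        return radix;
    }

    public static void main(String[] args) {
        int[] radix = boardRadix();
        int[] worst = new int[64];
        for (int i = 0; i < 64; i++) {
            worst[i] = radix[i] - 1; // largest digit allowed on each square
        }
        System.out.println(encode(worst, radix).bitLength()); // 233
    }
}
```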

Variable Width Alphabets

The previous two methods can both be described as fixed-width alphabetic encoding. Each of the 11, 13 or 16 members of the alphabet is substituted for another value. Each “character” is the same width but the efficiency can be improved when you consider that each character is not equally likely.

morse code

Consider Morse code (pictured above). Characters in a message are encoded as a sequence of dashes and dots. Those dashes and dots are transferred over radio (typically) with a pause between them to delimit them.

Notice how the letter E (the most common letter in English) is a single dot, the shortest possible sequence, whereas Z (the least frequent) is two dashes and two dots.

Such a scheme can significantly reduce the size of an expected message but comes at the cost of increasing the size of a random character sequence.

It should be noted that Morse code has another inbuilt feature: dashes are as long as three dots so the above code is created with this in mind to minimize the use of dashes. Since 1s and 0s (our building blocks) don’t have this problem, it’s not a feature we need to replicate.

Lastly, there are two kinds of rests in Morse code. A short rest (the length of a dot) is used to distinguish between dots and dashes. A longer gap (the length of a dash) is used to delimit characters.

So how does this apply to our problem?

Huffman Coding

There is an algorithm for dealing with variable length codes called Huffman coding. Huffman coding creates a variable length code substitution, typically using the expected frequency of the symbols to assign shorter values to the more common symbols.

In the above tree, the letter E is encoded as 000 (or left-left-left) and S is 1011. It should be clear that this encoding scheme is unambiguous.

This is an important distinction from Morse code. Morse code has the character separator so it can do otherwise ambiguous substitution (eg 4 dots can be H or 2 Is) but we only have 1s and 0s so we choose an unambiguous substitution instead.

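Below is a simple implementation. The fragments are assumed to sit inside a single enclosing class, with import java.util.*; at the top of the file: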

private static class Node {
  private final Node left;
  private final Node right;
  private final String label;
  private final int weight;

  private Node(String label, int weight) {
    this.left = null;
    this.right = null;
    this.label = label;
    this.weight = weight;
  }

  public Node(Node left, Node right) {
    this.left = left;
    this.right = right;
    label = "";
    weight = left.weight + right.weight;
  }

  public boolean isLeaf() { return left == null && right == null; }

  public Node getLeft() { return left; }

  public Node getRight() { return right; }

  public String getLabel() { return label; }

  public int getWeight() { return weight; }
}

with static data:

private final static List<String> COLOURS;
private final static Map<String, Integer> WEIGHTS;

static {
  List<String> list = new ArrayList<String>();
  list.add("White");
  list.add("Black");
  COLOURS = Collections.unmodifiableList(list);
  // Each weight is the number of squares the symbol occupies in the starting position.
  Map<String, Integer> map = new HashMap<String, Integer>();
  for (String colour : COLOURS) {
    map.put(colour + " " + "King", 1);
    map.put(colour + " " + "Queen", 1);
    map.put(colour + " " + "Rook", 2);
    map.put(colour + " " + "Knight", 2);
    map.put(colour + " " + "Bishop", 2);
    map.put(colour + " " + "Pawn", 8);
  }
  map.put("Empty", 32);
  WEIGHTS = Collections.unmodifiableMap(map);
}

and:

private static class WeightComparator implements Comparator<Node> {
  @Override
  public int compare(Node o1, Node o2) {
    if (o1.getWeight() == o2.getWeight()) {
      return 0;
    } else {
      return o1.getWeight() < o2.getWeight() ? -1 : 1;
    }
  }
}

private static class PathComparator implements Comparator<String> {
  @Override
  public int compare(String o1, String o2) {
    if (o1 == null) {
      return o2 == null ? 0 : -1;
    } else if (o2 == null) {
      return 1;
    } else {
      int length1 = o1.length();
      int length2 = o2.length();
      if (length1 == length2) {
        return o1.compareTo(o2);
      } else {
        return length1 < length2 ? -1 : 1;
      }
    }
  }
}

public static void main(String args[]) {
  PriorityQueue<Node> queue = new PriorityQueue<Node>(WEIGHTS.size(),
      new WeightComparator());
  for (Map.Entry<String, Integer> entry : WEIGHTS.entrySet()) {
    queue.add(new Node(entry.getKey(), entry.getValue()));
  }
  // Repeatedly merge the two lightest nodes until a single tree remains.
  while (queue.size() > 1) {
    Node first = queue.poll();
    Node second = queue.poll();
    queue.add(new Node(first, second));
  }
  Map<String, Node> nodes = new TreeMap<String, Node>(new PathComparator());
  addLeaves(nodes, queue.peek(), "");
  for (Map.Entry<String, Node> entry : nodes.entrySet()) {
    System.out.printf("%s %s%n", entry.getKey(), entry.getValue().getLabel());
  }
}

public static void addLeaves(Map<String, Node> nodes, Node node, String prefix) {
  if (node != null) {
    addLeaves(nodes, node.getLeft(), prefix + "0");
    addLeaves(nodes, node.getRight(), prefix + "1");
    if (node.isLeaf()) {
      nodes.put(prefix, node);
    }
  }
}

One possible output is:

         White    Black
Empty          0 
Pawn       110      100
Rook     11111    11110
Knight   10110    10101
Bishop   10100    11100
Queen   111010   111011
King    101110   101111

For a starting position this equates to 32 x 1 + 16 x 3 + 12 x 5 + 4 x 6 = 164 bits.
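To check that figure, here is a small sketch that encodes the starting position with the codes from the sample output above. The one-letter labels (upper case for White, lower case for Black, '.' for an empty square) and the class name are assumptions of this example only.

```java
import java.util.HashMap;
import java.util.Map;

final class HuffmanBoard {
    // Codes copied from the sample output above; the character labels are hypothetical.
    static final Map<Character, String> CODES = new HashMap<>();
    static {
        CODES.put('.', "0");
        CODES.put('P', "110");
        CODES.put('p', "100");
        CODES.put('R', "11111");
        CODES.put('r', "11110");
        CODES.put('N', "10110");
        CODES.put('n', "10101");
        CODES.put('B', "10100");
        CODES.put('b', "11100");
        CODES.put('Q', "111010");
        CODES.put('q', "111011");
        CODES.put('K', "101110");
        CODES.put('k', "101111");
    }

    /** Concatenates the code of every square, in the order the squares are given. */
    static String encode(String squares) {
        StringBuilder bits = new StringBuilder();
        for (char c : squares.toCharArray()) {
            bits.append(CODES.get(c));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String start = "RNBQKBNR" + "PPPPPPPP"
                + "........" + "........" + "........" + "........"
                + "pppppppp" + "rnbqkbnr";
        System.out.println(encode(start).length()); // 164
    }
}
```

Because no code is a prefix of another, a decoder can walk the bit string from left to right without any separators.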

State Difference

Another possible approach is to combine the very first approach with Huffman coding. This is based on the assumption that most expected Chess boards (rather than randomly generated ones) are more likely than not to, at least in part, resemble a starting position.

So what you do is XOR the 256 bit current board position with a 256 bit starting position and then encode that (using Huffman coding or, say, some method of run length encoding). Obviously this will be very efficient to start with (64 0s probably corresponding to 64 bits) but increase in storage required as the game progresses.
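A minimal sketch of the XOR step, assuming both boards use the 4-bits-per-square layout held in four longs. The names and the toy data are made up for the illustration; the follow-up compression stage is not shown.

```java
final class BoardDiff {
    /** XORs two 256-bit boards, each stored as four longs. */
    static long[] xor(long[] current, long[] start) {
        long[] diff = new long[4];
        for (int i = 0; i < 4; i++) {
            diff[i] = current[i] ^ start[i];
        }
        return diff;
    }

    /** Counts the squares whose 4-bit code differs, i.e. the non-zero nibbles of the diff. */
    static int changedSquares(long[] diff) {
        int count = 0;
        for (long word : diff) {
            for (int shift = 0; shift < 64; shift += 4) {
                if (((word >>> shift) & 0xFL) != 0) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Toy data: a piece with code 1 moves from square 1 to square 2.
        long[] start = {0x10L, 0L, 0L, 0L};
        long[] current = {0x100L, 0L, 0L, 0L};
        System.out.println(changedSquares(xor(current, start))); // 2
    }
}
```

A quiet move touches only two squares, so early in the game the diff is almost entirely zeros, which is exactly what a run length or Huffman stage likes.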

Piece Position

As mentioned, another way of attacking this problem is to instead store the position of each piece a player has. This works particularly well with endgame positions where most squares will be empty (but in the Huffman coding approach empty squares only use 1 bit anyway).

Each side will have a king and 0-15 other pieces. Because of promotion the exact make up of those pieces can vary enough that you can’t assume the numbers based on the starting positions are maxima.

The logical way to divide this up is store a Position consisting of two Sides (White and Black). Each Side has:

A king: 6 bits for the location;
Has pawns: 1 (yes), 0 (no);
If yes, number of pawns: 3 bits (0-7+1 = 1-8);
If yes, the location of each pawn is encoded: 45 bits (see below);
Number of non-pawns: 4 bits (0-15);
For each piece: type (2 bits for queen, rook, knight, bishop) and location (6 bits)

As for the pawn location, the pawns can only be on 48 possible squares (not 64 like the others). As such, it is better not to waste the extra 16 values that using 6 bits per pawn would use. So if you have 8 pawns there are 48⁸ possibilities, equalling 28,179,280,429,056. You need 45 bits to encode that many values.

That’s 6 + 1 + 3 + 45 + 4 + 7 x 8 = 115 bits per side or 230 bits total for the starting position. The starting position is the worst case for this method however and it will get substantially better as you remove pieces.

It should be pointed out that there are fewer than 48⁸ possibilities because the pawns can’t all be in the same square. The first has 48 possibilities, the second 47 and so on. 48 x 47 x … x 41 = 1.52e13, which needs 44 bits of storage.

You can further improve this by eliminating the squares that are occupied by other pieces (including the other side) so you could first place the white non-pawns then the black non-pawns, then the white pawns and lastly the black pawns. On a starting position this reduces the storage requirements to 44 bits for White and 42 bits for Black.
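The pawn figures above can be double-checked with a few lines of BigInteger arithmetic. The helper names are hypothetical; the sketch only counts possibilities and does not perform the actual encoding.

```java
import java.math.BigInteger;

final class PawnCounting {
    /** Ways to put `pawns` pawns, placed one after another, on `free` squares with one pawn per square. */
    static BigInteger placements(int free, int pawns) {
        BigInteger ways = BigInteger.ONE;
        for (int i = 0; i < pawns; i++) {
            ways = ways.multiply(BigInteger.valueOf(free - i));
        }
        return ways;
    }

    /** Bits needed to index `count` distinct values (0 to count - 1). */
    static int bitsFor(BigInteger count) {
        return count.subtract(BigInteger.ONE).bitLength();
    }

    public static void main(String[] args) {
        System.out.println(bitsFor(BigInteger.valueOf(48).pow(8))); // 45: any of 48 squares per pawn
        System.out.println(bitsFor(placements(48, 8)));             // 44: White's pawns, no shared squares
        System.out.println(bitsFor(placements(40, 8)));             // 42: Black's pawns, placed after White's
    }
}
```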

Combined Approaches

Each of these approaches has its strengths and weaknesses, which suggests another possible optimization. You could, say, pick the best 4 and then encode a scheme selector in the first two bits, followed by the scheme-specific storage.

With the overhead that small, this will by far be the best approach.

Game State

I return to the problem of storing a game rather than a position. Because of the threefold repetition we have to store the list of moves that have occurred to this point.

Annotations

One thing you have to determine is are you simply storing a list of moves or are you annotating the game? Chess games are often annotated, for example:

Bb5!! Nc4?

White’s move is marked by two exclamation points as brilliant whereas Black’s is viewed as a mistake. See Chess punctuation.

Additionally you could also need to store free text as the moves are described.

I am assuming that the moves are sufficient so there will be no annotations.

Algebraic Notation

We could simply store the text of the move here (“e4”, “Bxb5”, etc). Including a terminating byte you’re looking at about 6 bytes (48 bits) per move (worst case). That’s not particularly efficient.

The second thing to try is to store the starting location (6 bits) and end location (6 bits) so 12 bits per move. That is significantly better.
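A minimal sketch of the 12-bit from/to encoding. The class name and the square numbering (a1 = 0 through h8 = 63) are assumptions for this example, and a promotion would still need its piece choice stored somewhere else.

```java
final class PackedMove {
    /** Packs a from square and a to square (each 0-63) into the low 12 bits of an int. */
    static int pack(int from, int to) {
        return (from << 6) | to;
    }

    /** Extracts the from square (upper 6 of the 12 bits). */
    static int from(int move) {
        return (move >>> 6) & 0x3F;
    }

    /** Extracts the to square (lower 6 bits). */
    static int to(int move) {
        return move & 0x3F;
    }

    /** Converts a square name such as "e2" to an index, with a1 = 0 and h8 = 63. */
    static int square(String name) {
        return (name.charAt(1) - '1') * 8 + (name.charAt(0) - 'a');
    }

    public static void main(String[] args) {
        int move = pack(square("e2"), square("e4")); // White's first move in the opening above
        System.out.println(from(move) + " -> " + to(move)); // 12 -> 28
    }
}
```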

Alternatively we can determine all the legal moves from the current position in a predictable and deterministic way and state which we’ve chosen. This then goes back to the variable base encoding mentioned above. White and Black have 20 possible moves each on their first move, more on the second and so on.

Conclusion

There is no absolutely right answer to this question. There are many possible approaches of which the above are just a few.

What I like about this and similar problems is that it demands abilities important to any programmer like considering the usage pattern, accurately determining requirements and thinking about corner cases.

Chess positions taken as screenshots from Chess Position Trainer.

It's best just to store chess games in a human-readable, standard format.

The Portable Game Notation assumes a standard starting position (although it doesn't have to) and just lists the moves, turn by turn. A compact, human-readable, standard format.

E.g.

[Event "F/S Return Match"]
[Site "Belgrade, Serbia Yugoslavia|JUG"]
[Date "1992.11.04"]
[Round "29"]
[White "Fischer, Robert J."]
[Black "Spassky, Boris V."]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 {This opening is called the Ruy Lopez.} 3... a6
4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Nb8  10. d4 Nbd7
11. c4 c6 12. cxb5 axb5 13. Nc3 Bb7 14. Bg5 b4 15. Nb1 h6 16. Bh4 c5 17. dxe5
Nxe4 18. Bxe7 Qxe7 19. exd6 Qf6 20. Nbd2 Nxd6 21. Nc4 Nxc4 22. Bxc4 Nb6
23. Ne5 Rae8 24. Bxf7+ Rxf7 25. Nxf7 Rxe1+ 26. Qxe1 Kxf7 27. Qe3 Qg5 28. Qxg5
hxg5 29. b3 Ke6 30. a3 Kd6 31. axb4 cxb4 32. Ra5 Nd5 33. f3 Bc8 34. Kf2 Bf5
35. Ra7 g6 36. Ra6+ Kc5 37. Ke1 Nf4 38. g3 Nxh3 39. Kd2 Kb5 40. Rd6 Kc5 41. Ra6
Nf2 42. g4 Bd3 43. Re6 1/2-1/2

If you want to make it smaller, then just zip it. Job done!

Great puzzle!

I see that most people are storing the position of each piece. How about taking a more simple-minded approach, and storing the contents of each square? That takes care of promotion and captured pieces automatically.

And it allows for Huffman encoding. Actually, the initial frequency of pieces on the board is nearly perfect for this: half of the squares are empty, half of the remaining squares are pawns, etcetera.

Considering the frequency of each piece, I constructed a Huffman tree on paper, which I won't repeat here. The result, where c stands for the colour (white = 0, black = 1):

0 for empty squares
1c0 for pawn
1c100 for rook
1c101 for knight
1c110 for bishop
1c1110 for queen
1c1111 for king

For the entire board in its initial situation, we have

empty squares: 32 * 1 bit = 32 bits
pawns: 16 * 3 bits = 48 bits
rooks/knights/bishops: 12 * 5 bits = 60 bits
queens/kings: 4 * 6 bits = 24 bits

Total: 164 bits for the initial board state. Significantly less than the 235 bits of the currently highest voted answer. And it's only going to get smaller as the game progresses (except after a promotion).

I looked only at the position of the pieces on the board; additional state (whose turn, who has castled, en passant, repeating moves, etc.) will have to be encoded separately. Maybe another 16 bits at most, so 180 bits for the entire game state. Possible optimizations:

Leaving out the less frequent pieces, and storing their position separately. But that won't help... replacing king and queen by an empty square saves 5 bits, which are exactly the 5 bits you need to encode their position in another way.
"No pawns on the back row" could easily be encoded by using a different Huffman table for the back rows, but I doubt it helps much. You'd probably still end up with the same Huffman tree.
"One white, one black bishop" can be encoded by introducing extra symbols that don't have the c bit, which can then be deduced from the square that the bishop is on. (Pawns promoted to bishops disrupt this scheme...)
Repetitions of empty squares could be run-length encoded by introducing extra symbols for, say, "2 empty squares in a row" and "4 empty squares in a row". But it is not so easy to estimate the frequency of those, and if you get it wrong, it's going to hurt rather than help.

The really big lookup table approach

Position - 18 bytes
Estimated number of legal positions is 10⁴³
Simply enumerate them all and the position can be stored in just 143 bits. 1 more bit is required to indicate which side is to play next

The enumeration is not practical of course, but this shows that at least 144 bits are required.

Moves - 1 byte
There are usually around 30-40 legal moves for each position but the number may be as high as 218. Let's enumerate all the legal moves for each position. Now each move can be encoded into one byte.

We still have plenty of room for special moves such as 0xFF to represent resigning.

It'd add interest to optimize for average-case size for typical games played by humans, instead of the worst case. (The problem statement doesn't say which; most responses assume worst-case.)

For the move sequence, have a good chess engine generate moves from each position; it'll produce a list of k possible moves, ordered by its ranking of their quality. People generally pick good moves more often than random moves, so we need to learn a mapping from each position in the list to the probability that people pick a move that 'good'. Using these probabilities (based on a corpus of games from some internet chess database), encode the moves with arithmetic coding. (The decoder must use the same chess engine and mapping.)

For the starting position, ralu's approach would work. We could refine it with arithmetic coding there as well, if we had some way to weight the choices by probability — e.g. pieces often appear in configurations defending each other, not at random. It's harder to see an easy way to incorporate that knowledge. One idea: fall back on the above move encoding instead, starting from the standard opening position and finding a sequence that ends in the desired board. (You might try A* search with a heuristic distance equaling the sum of the distances of pieces from their final positions, or something along those lines.) This trades some inefficiency from overspecifying the move sequence vs. efficiency from taking advantage of chess-playing knowledge. (You can claw back some of the inefficiency by eliminating move choices that would lead to a previously-explored position in the A* search: these can get weight 0 in the arithmetic code.)

It's also kind of hard to estimate how much savings this would buy you in average-case complexity, without gathering some statistics from an actual corpus. But the starting point with all moves equally probable I think would already beat most of the proposals here: the arithmetic coding doesn't need an integer number of bits per move.
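To put a rough number on that last point, here is a sketch comparing whole-bit storage per move with the ideal cost when all legal moves are equally likely. It is not an arithmetic coder, just the bit accounting (a real coder adds a small overhead), and the names are hypothetical. The only data used is the 20 legal first moves each side has.

```java
final class MoveBits {
    /** Total bits if every move index is stored separately in a whole number of bits. */
    static int wholeBits(int[] legalMoveCounts) {
        int bits = 0;
        for (int n : legalMoveCounts) {
            bits += 32 - Integer.numberOfLeadingZeros(n - 1); // ceil(log2(n)) for n >= 1
        }
        return bits;
    }

    /** Ideal total for equally likely moves: the sum of log2(n), rounded up once at the end. */
    static int fractionalBits(int[] legalMoveCounts) {
        double bits = 0;
        for (int n : legalMoveCounts) {
            bits += Math.log(n) / Math.log(2);
        }
        return (int) Math.ceil(bits);
    }

    public static void main(String[] args) {
        int[] firstMoves = {20, 20}; // White's and Black's legal first moves
        System.out.println(wholeBits(firstMoves));      // 10
        System.out.println(fractionalBits(firstMoves)); // 9
    }
}
```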

Attacking a subproblem of encoding the steps after an initial position has been encoded. The approach is to create a "linked list" of steps.

Each step in the game is encoded as the "old position->new position" pair. You know the initial position in the beginning of the chess game; by traversing the linked list of steps, you can get to the state after X moves.

For encoding each step, you need 6 bits for the starting position (64 squares on the 8x8 board) and 6 bits for the end position: 12 bits for one move by one side.

The amount of space that encoding a given game takes is then proportional to the number of moves:

12 x (number of white moves + number of black moves) bits.

UPDATE: potential complication with promoted pawns. Need to be able to state what the pawn is promoted to - may need special bits (would use gray code for this to save space, as pawn promotion is extremely rare).

UPDATE 2: You don't have to encode the end position's full coordinates. In most cases, the piece that's being moved can move to no more than X places. For example, a pawn has at most 4 non-promoting move options at any given point (see below). By working out the maximum number of moves for each piece type, we can save bits on the encoding of the "destination".

Pawn: 
   - 2 options for movement (e2e3 or e2e4) + 2 options for taking = 4 options to encode
   - 12 options for promotions - 4 promotions (knight, bishop, rook, queen) times 3 squares (because you can take a piece on the last row and promote the pawn at the same time)
   - Total of 16 options, 4 bits
Knight: 8 options, 3 bits
Bishop: 4 bits
Rook: 4 bits
King: 3 bits
Queen: 5 bits

So the spatial complexity per move of black or white becomes

6 bits for the initial position + (variable number of bits based upon the type of the thing that's moved).
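As a sketch, the per-piece sizes from the list above could be turned into a lookup like this. The enum and method names are hypothetical, and the bit counts are simply the ones listed; note that a king also has up to two castling moves on top of its 8 ordinary ones, which this table does not account for.

```java
final class DestinationBits {
    enum Piece { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING }

    /** Destination bits per piece type, taken from the list above. */
    static int destinationBits(Piece piece) {
        switch (piece) {
            case PAWN:   return 4;
            case KNIGHT: return 3;
            case BISHOP: return 4;
            case ROOK:   return 4;
            case QUEEN:  return 5;
            case KING:   return 3;
            default:     throw new IllegalArgumentException("unknown piece");
        }
    }

    /** 6 bits for the origin square plus the piece-dependent destination bits. */
    static int moveBits(Piece piece) {
        return 6 + destinationBits(piece);
    }

    public static void main(String[] args) {
        for (Piece piece : Piece.values()) {
            System.out.println(piece + ": " + moveBits(piece) + " bits");
        }
    }
}
```

The decoder reads the 6-bit origin first, looks up which piece stands there, and only then knows how many destination bits follow.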
