 

Sudoku backtracking algorithm

First of all, I'll state that this is a university assignment, so I'm not asking for someone to write the code for me; I just need to be pointed in the right direction. :)

Ok, so I need to write an algorithm to solve any (solvable) sudoku board of arbitrary size. I've written a recursive function that can solve any 9x9 board quickly (~1ms), but it struggles with larger boards (16x16) that are hard to solve. I've had one test running for 20 minutes and it still hasn't solved it. It can solve easy 16x16 puzzles, or even a blank 16x16 board, so I don't think the dimensions are the problem. I think it's more likely to be the algorithm.

Anyway, this is the basic logic of my program:

  • I have a 3D vector that stores the possible values of every square
  • When a value is placed in a square, it is removed from the possible values of the other squares in the same box, row and column

Then my solve function is basically:

bool solve() {

    if (there are no unfilled squares)
        return true

    if (the board is unsolvable - there are empty squares that have no possible values)
        return false

    while (there are empty squares)
    {
        int squaresFilled = fillSquaresWithOnlyOneChoice(); //this method updates the possible results vector whenever it fills a square

        if (squaresFilled == 0)
            break;
    }

    //exhausted all of the 'easy' squares (squares with only one possible choice), need to make a guess

    while (there are empty squares that have choices left) {

        find the square with the least number of choices

        if (the square with the least number of choices has 0 choices)
            return false; //not solvable.

        pick one of that square's choices and remove it from the 3D vector (vector that has the choices for each square)
        make a copy of the board and the 3D choices vector

        fill the square with the choice

        if (solve())
            return true; //we're done

        restore the board and choices vector 
        //the guess didn't work so keep looping and make a new guess with the restored board and choices -- the choice we just made has been removed though so it won't get made again.

    }

    return false; //can't go any further
}

Is there anything inefficient about this? Is there any way I could get it to work better? I'm guessing that a 16x16 board takes so long because the decision tree is so large for a board that isn't filled in very much. It's weird though, because a 9x9 board solves really fast.

Any ideas or suggestions would be absolutely awesome. If there's any information I've missed let me know too!

Sam asked Oct 08 '11 09:10



2 Answers

A fast algorithm for solving sudoku is Algorithm X by Donald Knuth. You represent the sudoku as an exact cover problem and then use Algorithm X to solve that exact cover problem. DLX (Dancing Links) is an efficient implementation of Algorithm X.

There is a great explanation on Wikipedia of how to apply exact cover to solving sudoku.
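As a rough sketch of that reduction (the function name and the column numbering are assumptions made for illustration, not taken from any library): on an n x n board with box size b, where n = b * b, each candidate "digit d in row r, column c" becomes one row of the exact cover matrix, and that row covers exactly four of the 4 * n * n constraint columns.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: maps the candidate "digit d at (r, c)" to the four
// exact-cover columns it satisfies. All indices are 0-based and n = b * b.
std::array<int, 4> coverColumns(int b, int r, int c, int d) {
    const int n = b * b;
    assert(r >= 0 && r < n && c >= 0 && c < n && d >= 0 && d < n);
    const int box = (r / b) * b + (c / b);
    return {
        0 * n * n + r * n + c,    // cell (r, c) holds exactly one digit
        1 * n * n + r * n + d,    // row r contains digit d exactly once
        2 * n * n + c * n + d,    // column c contains digit d exactly once
        3 * n * n + box * n + d   // the box contains digit d exactly once
    };
}
```

With this layout a 9x9 board gives 729 candidate rows over 324 columns, and a 16x16 board gives 4096 rows over 1024 columns. The givens of a puzzle simply pre-select their rows before the search starts.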

I can tell you that DLX is extremely fast for solving sudoku and is commonly used in the fastest solvers.

http://www.setbb.com/phpbb/index.php?mforum=sudoku is a great forum with probably the best sudoku programmers.

Luka Rahne answered Sep 21 '22 14:09



Between filling the squares with only one choice and going full recursive on the board, there are more advanced actions you can take. Let's say that a "region" is one row, one column, or one square box (3x3 on a 9x9 board, 4x4 on a 16x16 board).

Tactic 1

If there are K squares in a region that can take only the same K numbers (for instance two squares that can take only 2 and 5, or three squares that can take only 1, 7 and 8), then no other square in that region can take those numbers. You need to iterate over each region to weed out the "taken" numbers, so that you can find a square with only one logical choice (for instance, a third square with 2, 4 and 5 can logically take only 4, or a fourth square with 1, 3, 7 and 8 can logically take only 3).

This has to be solved by iteration, as the following example shows. A region has squares with these possible numbers:

A: 1 2 3
B: 2 3
C: 2 3 4 5
D: 4 5
E: 4 5

The algorithm should detect that squares D and E hold numbers 4 and 5, so 4 and 5 are excluded from the other squares in the region. That reduces C to 2 and 3, so the algorithm then detects that squares B and C hold numbers 2 and 3, and excludes those from the other squares. This leaves square A with only number 1.
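A minimal sketch of Tactic 1, assuming each square's candidates are stored as a bitmask (bit d set means digit d is still possible) and that the region vector holds only the unfilled squares. The names `Mask`, `maskOf` and `eliminateIdenticalSets` are made up for this example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

using Mask = unsigned;  // bit d set => digit d is still a candidate

// Builds a candidate mask from a list of digits, e.g. maskOf({2, 3}).
Mask maskOf(std::initializer_list<int> digits) {
    Mask m = 0;
    for (int d : digits) {
        assert(d >= 0 && d < 32);
        m |= Mask{1} << d;
    }
    return m;
}

int countBits(Mask m) { return static_cast<int>(std::bitset<32>(m).count()); }

// Whenever exactly K squares share the same K-candidate set, remove those
// candidates from every other square. Repeats until a full pass changes
// nothing. Returns true if any candidate was removed.
bool eliminateIdenticalSets(std::vector<Mask>& region) {
    bool changedAny = false;
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < region.size(); ++i) {
            const Mask set = region[i];
            if (set == 0) continue;  // contradiction, left for the caller
            int sameCount = 0;
            for (Mask m : region) {
                if (m == set) ++sameCount;
            }
            if (sameCount != countBits(set)) continue;
            for (Mask& m : region) {
                if (m != set && (m & set) != 0) {
                    m &= ~set;
                    changed = true;
                    changedAny = true;
                }
            }
        }
    }
    return changedAny;
}
```

Run on the five-square example above, the first pass strips 4 and 5 from C, and the second pass strips 2 and 3 from A, which is why a single pass is not enough.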

Tactic 2

If a number occurs in only one square of a region, then logically that square holds that number.
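Tactic 2 could be sketched like this, again assuming bitmask candidates (bit d for digit d, digits numbered 1 to n) and a region vector that holds only the unfilled squares. `fillHiddenSingles` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = unsigned;  // bit d set => digit d is still a candidate

// For each digit 1..n, count the squares of the region that can still take
// it. If exactly one square can, that square's candidates collapse to that
// digit. Returns true if any square was narrowed down.
bool fillHiddenSingles(std::vector<Mask>& region, int n) {
    assert(n >= 1 && n < 32);
    bool changed = false;
    for (int d = 1; d <= n; ++d) {
        const Mask bit = Mask{1} << d;
        int count = 0;
        std::size_t where = 0;
        for (std::size_t i = 0; i < region.size(); ++i) {
            if (region[i] & bit) {
                ++count;
                where = i;
            }
        }
        // count == 0 means the digit is already placed in this region.
        if (count == 1 && region[where] != bit) {
            region[where] = bit;
            changed = true;
        }
    }
    return changed;
}
```

For a region with candidates {1,2,3}, {2,3}, {2,3,4}, {2,3,4}, digit 1 fits only the first square, so that square is reduced to 1 while the others are left alone.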

Tactic 3

Tactics 1 and 2 are only special cases of Tactic 3. Tactic 1 requires K squares that each hold the identical K numbers, but that is stricter than necessary: given K squares and a set of K numbers, each of those K squares can hold any subset of those K numbers. Consider the following example of a region:

A: 1 2
B: 2 3
C: 1 3
D: 1 2 3 4

Squares A, B and C can hold only numbers 1, 2 and 3. That's K squares for K numbers, so no other square can logically hold these numbers, which leaves square D with only number 4.

Tactic 2 is a special case of Tactic 3 with K = N - 1, where N is the number of empty squares in the region.
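One brute-force way to sketch Tactic 3 is to try every subset of the region's squares and check whether the union of their candidates has exactly as many numbers as the subset has squares. This assumes bitmask candidates and a region vector of unfilled squares only; the names are hypothetical. It costs 2^N subset checks per region, which is tolerable for N up to 16 but is not meant as an optimised implementation.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = unsigned;  // bit d set => digit d is still a candidate

int countBits(Mask m) { return static_cast<int>(std::bitset<32>(m).count()); }

// For every proper subset of K squares whose candidates together span
// exactly K numbers, remove those numbers from the squares outside the
// subset. Returns true if any candidate was removed.
bool eliminateNakedSubsets(std::vector<Mask>& region) {
    const std::size_t n = region.size();
    assert(n < 20);  // keeps the 2^n enumeration small
    bool changed = false;
    // subset is a bitmask over positions in the region; the full set is
    // skipped because it has no outside squares to prune.
    for (unsigned subset = 1; subset + 1 < (1u << n); ++subset) {
        Mask unionMask = 0;
        int k = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if ((subset >> i) & 1u) {
                unionMask |= region[i];
                ++k;
            }
        }
        if (countBits(unionMask) != k) continue;
        for (std::size_t i = 0; i < n; ++i) {
            if (!((subset >> i) & 1u) && (region[i] & unionMask) != 0) {
                region[i] &= ~unionMask;
                changed = true;
            }
        }
    }
    return changed;
}
```

On the A to D example above, the subset {A, B, C} has the union {1, 2, 3}, so those three numbers are removed from D and only 4 remains there.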

Tactic 4

Take advantage of region overlap. Suppose that some number can exist only in certain squares of a region. If all those squares also belong to another, overlapping region, then that number should be excluded from all the other squares of that other region.
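A possible sketch of Tactic 4, assuming the whole board's candidates sit in one flat vector indexed by cell and each region is a list of cell indices. The layout and the name `excludeByOverlap` are assumptions for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Mask = unsigned;  // bit d set => digit d is still a candidate

// If every square of regionA that can still take digit d also lies in
// regionB, remove d from the squares of regionB that are not in regionA.
// Returns true if any candidate was removed.
bool excludeByOverlap(std::vector<Mask>& cand,
                      const std::vector<int>& regionA,
                      const std::vector<int>& regionB, int d) {
    assert(d >= 0 && d < 32);
    const Mask bit = Mask{1} << d;
    auto inRegion = [](const std::vector<int>& region, int cell) {
        return std::find(region.begin(), region.end(), cell) != region.end();
    };
    bool anyCandidate = false;
    for (int cell : regionA) {
        if (cand[cell] & bit) {
            if (!inRegion(regionB, cell)) return false;  // d can escape the overlap
            anyCandidate = true;
        }
    }
    if (!anyCandidate) return false;  // d is already placed in regionA
    bool changed = false;
    for (int cell : regionB) {
        if (!inRegion(regionA, cell) && (cand[cell] & bit)) {
            cand[cell] &= ~bit;
            changed = true;
        }
    }
    return changed;
}
```

For example, if digit 1 in the top row can only go in squares that belong to the top-left box, then 1 can be removed from the rest of that box. The same function covers the reverse direction by swapping the two region arguments.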

Tactic 5

Cache results. Every region should have a "dirty" flag that denotes that something in the region has changed since the last time the region was processed. You don't have to process a region whose flag is not set.
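The dirty-flag idea might be driven by a loop like the one below. It is only a sketch: `processRegion` is a hypothetical callback that applies the tactics to one region and returns the indices of the regions whose squares it changed, and it is assumed to eventually return nothing once the board stops changing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Processes regions until no region is dirty. Returns how many times a
// region was processed, which makes the saving from the flags visible.
int runUntilClean(
    std::size_t regionCount,
    const std::function<std::vector<std::size_t>(std::size_t)>& processRegion) {
    std::vector<bool> dirty(regionCount, true);  // nothing processed yet
    int processed = 0;
    bool anyDirty = true;
    while (anyDirty) {
        anyDirty = false;
        for (std::size_t r = 0; r < regionCount; ++r) {
            if (!dirty[r]) continue;  // unchanged since last time, skip it
            dirty[r] = false;
            ++processed;
            for (std::size_t touched : processRegion(r)) {
                assert(touched < regionCount);
                dirty[touched] = true;  // its squares changed, revisit later
                anyDirty = true;
            }
        }
    }
    return processed;
}
```

With three regions where only region 2 changes something in region 0, the loop processes four regions in total instead of running full passes over all three until nothing changes.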


Human beings use all of these tactics and really hate to guess a number, because backtracking is a real pain. In fact, the difficulty of a board is measured by the minimum number of guesses one has to make to solve it. For most "extreme" boards, one good guess is enough.

Dialecticus answered Sep 24 '22 14:09
