Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing the backtracking algorithm solving Sudoku

I'm hoping to optimize my backtracking algorithm for my Sudoku Solver.


What it does now:

The recursive solver function takes a sudoku puzzle with various given values.

I will scour through all the empty slots in the puzzle, looking for the slot that has the least possibilities, and get the list of values.

From the list of values, I will loop through it by placing one of the values from the list in the slot, and recursively solve it, until the entire grid is filled.


This implementation still takes incredibly long for some puzzles and I hope to further optimize this. Does anyone have any ideas how I might be able to further optimize this?


Here's my code in Java if you're interested.

public int[][] Solve(int[][] slots) {
    // recursive solve v2 : optimization revision

    int[] least = new int[3];
    least[2] = Integer.MAX_VALUE;
    PuzzleGenerator value_generator = new PuzzleGenerator();
    LinkedList<Integer> least_values = null;

    // 1: find a slot with the least possible solutions
    // 2: recursively solve.

    // 1 - scour through all slots.
    int i = 0;
    int j = 0;
    while (i < 9) {
        j = 0;
        while (j < 9) {
            if (slots[i][j] == 0) {
                int[] grid_posi = { i, j };
                LinkedList<Integer> possible_values = value_generator
                        .possibleValuesInGrid(grid_posi, slots);
                if ((possible_values.size() < least[2])
                        && (possible_values.size() != 0)) {
                    least[0] = i;
                    least[1] = j;
                    least[2] = possible_values.size();
                    least_values = possible_values;
                }
            }
            j++;
        }
        i++;
    }

    // 2 - work on the slot
    if (least_values != null) {
        for (int x : least_values) {
            int[][] tempslot = new int[9][9];
            ArrayDeepCopy(slots, tempslot);
            tempslot[least[0]][least[1]] = x;

            /*ConsoleInterface printer = new gameplay.ConsoleInterface();
            printer.printGrid(tempslot);*/

            int[][] possible_sltn = Solve(tempslot);
            if (noEmptySlots(possible_sltn)) {
                System.out.println("Solved");
                return possible_sltn;
            }
        }
    }
    if (this.noEmptySlots(slots)) {
        System.out.println("Solved");
        return slots;
    }
    slots[0][0] = 0;
    return slots;
}
like image 845
nubela Avatar asked Oct 05 '09 05:10

nubela


People also ask

How the backtracking approach is working to solve the Sudoku problem?

A Sudoku (top) being solved by backtracking. Each cell is tested for a valid number, moving "back" when there is a violation, and moving forward again until the puzzle is solved. A Sudoku designed to work against the brute force algorithm.

Is backtracking optimized?

Backtracking Search Optimization Algorithm (BSA) ] is one of the new population based evolutionary algorithms. It is based on an iterative process which tries to minimize the objective function. BSA consists of five evolutionary mechanisms: initialization, selection-I, mutation, crossover, and selection-II.

Why backtracking is used in Sudoku?

Backtracking is a technique to solve problems where multiple choices are there and we don't know the correct choice and hence we solve problem with trial and error i.e. trying each option until goal is achieved.


2 Answers

I had an assignment to do just that: build the fastest sudoku solver in Java. I ended up winning the contest with a time of 0.3 millisecond.

I didn't use the dancing links algorithm and didn't compare with it, but some contestants must have tried it, yet my closest competitor took about 15 milliseconds.

I simply used a recursive backtracking algorithm, augmented it with 4 "rules" (which made backtracking unnecessary for almost every puzzle) and kept a bit field as a list of legal values for each position.

I wrote a blog post about it : http://byteauthor.com/2010/08/sudoku-solver/

And posted the code here : https://github.com/stonkie/SudokuSolverV1

like image 80
Kevin Coulombe Avatar answered Oct 14 '22 19:10

Kevin Coulombe


I recently wrote a program in Python that can solve a Sudoku puzzle. It is basically a backtracking algorithm which brute forces the search space. I have posted more details on the actual algorithm in this thread.

Here however I would like to focus more on the optimization process. To be more precise, I have explored different approaches to minimize the solve time and the number of iterations. And this is more about the algorithmic improvements that can be made, rather than the programming ones.

So having thought about it, there aren't many things in a backtracking brute force algorithm that can be optimized (happy to be proven wrong here). The two real improvements that can be made are: first, the method by which the next blank cell is chosen and second, the method by which the next possible digit is chosen. These two choices can make the difference between going down a dead-end searching path or going down a searching path that ends with a solution.

Next, I sat down and tried to come up with different methods for the aforementioned two choices. Here's what I came up with.

The next blank cell can be chosen in the following ways:

  • A - the first cell from left to right, from top to bottom
  • B - the first cell from right to left, from bottom to top
  • C - a randomly chosen cell
  • D - the closest cell to the center of the grid
  • E - the cell that currently has the fewest choices available (choice here means a digit from 1 to 9)
  • F - the cell that currently has the most choices available
  • G - the cell that has the fewest blank related cells (a related cells is one from the same row, from the same column or from the same 3x3 quadrant)
  • H - the cell that has the most blank related cells
  • I - the cell that is closest to all filled cells (as measured from cell center point to cell center point)
  • J - the cell that is furthest from all filled cells
  • K - the cell whose related blank cells have the fewest available choices
  • L - the cell whose related blank cells have the most available choices

And the next digit can be chosen in the following ways:

  • 0 - the lowest digit
  • 1 - the highest digit
  • 2 - a randomly chosen digit
  • 3 - heuristically, the least used digit across the board
  • 4 - heuristically, the most used digit across the board
  • 5 - the digit that will cause related blank cells to have the least number of choices available
  • 6 - the digit that will cause related blank cells to have the most number of choices available
  • 7 - the digit that is the least common available choice among related blank cells
  • 8 - the digit that is the most common available choice among related blank cells
  • 9 - the digit that is the least common available choice across the board
  • a - the digit that is the most common available choice across the board

So I have programmed the above methods into the program. The preceding digits and letters can be passed as parameters to the program and it will use the corresponding optimization method. What's more, because sometimes two and more cells can have the same score, there is an option to provide a second sorting parameter. For example parameter "EC" would mean choose a random cell from all the cells that have the fewest choices available.

The first function will assign weights multiplied by 1000 and the second function will add new weights multiplied by 1. Thus, if for example from the first function three cells have the same weight, e.g. 3000, 3000 3000, then the second function will add its own weights. e.g. 3111, 3256, 3025. The sorting will always choose the lowest weight. And if the opposite is needed, then the weight functions are called with -1000 amd -1, but the sorting still chooses the lowest weight.

Before proceeding it's worth mentioning that the program will always choose a blank cell (not a filled one) and will always choose a digit that is within the current Sudoku constraints of the cell (doing otherwise is just so unreasonable).

Having the above, I then decided to run the program with every possible combination of parameters and see what happens, which ones perform the best - basically to brute force the brute force :) There are 12 methods for cell choosing and 11 methods for digit choosing so in theory there are 17,424 combinations to try, but I removed some unnecessary ones (such as "AA", "BB", etc., and also excluded the random methods as they are all terribly inefficient), so the number of combinations in the end was 12,100. Each run was done on the same Sudoku puzzle, which is an easy one:

0,3,0,0,9,0,6,1,0
6,0,8,5,0,3,4,9,7
0,9,0,6,7,0,0,0,3
0,5,0,8,0,4,0,0,1
1,6,0,3,0,0,9,8,2
0,0,2,9,6,0,3,0,0
0,8,0,1,3,0,2,0,6
3,0,5,0,4,6,0,7,9
0,4,6,0,8,0,1,0,0

...and the search space is 36,691,771,392. This is just a simple product of the number of choices for each blank cell of the given puzzle. It is an overstatement because as soon as one cell is filled this reduces the number of choices for other cells, but it is the fastest and easiest score I could come up with.

I wrote a short script (in Python of course) that automated the whole process of testing - it ran the solver for each set of parameters, recorded the completion time and dumped everything into a file. Also, I decided to do 20 runs of each because I was getting some 0 times from the time.time() function for single runs. And also, if any combination took more than 10 seconds to complete, the script would stop and move to the next one.

The script completed in 13:04:31 hours on a laptop with Intel Core i7-4712MQ 2.30GHz, no more than 2 out of 8 cores were being used and the average CPU load was about 12%. 8,652 out of the 12,100 combinations completed in under 10 seconds.

And the winners are: (* numbers adjusted back for single run times/iterations)

1) Fastest time of 1.55 ms: "A0" and "A1" with 84 iterations and 46 backtrack iterations and "B0", "B01", "B1", "B10", "BA01", "BA1", "BD01", "BD1" and "BD10" with 65 iterations and 27 backtrack iterations The fastest methods are the simplest ones like A, B and D. Another method does not appear until ranking position 308, and that is "E0".

2) Fewest iterations of 38 and 0 backtrack iterations: Surprisingly many methods managed to achieve this, the fastest ones being "B17", "B6", "B7", "BA16", "BA60", "BA7", "BD17" and "BD70" with time of 2.3 ms and the slowest ones being "IK91", "JK91", "KI91", "KJ91", "KJ9a", "IK9a", "JK9a" and "KI9a" with time of about 107 ms. Also surprisingly, method F has a few good positions here such as "FB6" with 7 ms (???)

Overall A, B, D, E, G and K seemed to perform significantly better than C, F, H, and L, and I and J are kinda in between. Also, the choice of digit didn't seem to matter much.

And finally, let's see how these winner methods handle the world's hardest Sudoku puzzle, as claimed by this article http://www.telegraph.co.uk/news/science/science-news/9359579/Worlds-hardest-sudoku-can-you-crack-it.html * Having in mind that algorithms are not universally fast, maybe some algorithms do better on some Sudoku puzzles, but not on others... The puzzle is:

8,0,0,0,0,0,0,0,0
0,0,3,6,0,0,0,0,0
0,7,0,0,9,0,2,0,0
0,5,0,0,0,7,0,0,0
0,0,0,0,4,5,7,0,0
0,0,0,1,0,0,0,3,0
0,0,1,0,0,0,0,6,8
0,0,8,5,0,0,0,1,0
0,9,0,0,0,0,4,0,0

...and the search space is 95,865,912,019,648,512 x 10^20.

The winner is "A0" finishing in 1092 ms with 49,559 iterations and 49,498 backtrack iterations. Most of the other ones didn't do very well. "A0", "A1", "B0", "B01", "B1", "B10", "BA01", "BA1", "BD01', "BD1" and "BD10" finished in about 2500 ms and 91k iterations, and the rest 30+ seconds, 400k+ iterations.

But that's not enough so I ran a full test of all set of parameters for the hardest Sudoku too. This time doing a single run not 20, and also a cut-off time of 2.5 seconds. The script completed in 8:23:30 hours. 149 out of the 12,100 combinations completed in under 2.5 seconds. The winners in both categories are "E36", "E37", "EA36" and "EA37" with time of 109 ms, 362 iterations and 301 backtrack iterations. Also, the first 38 positions were dominated with by a beginning "E".

Overall E tops the charts, no doubt about it just by looking at the summary spreadsheet. A, B, I and J have a few rankings but nothing much and the rest did not even make it once under 2.5 seconds.

In conclusion, I think it is safe to say that if the Sudoku puzzle is an easy one then brute force it with the least complex algorithm, but if the Sudoku puzzle is a hard one then it's worth spending the overhead of the choosing methods.

Hope this helps :)

like image 25
svinec Avatar answered Oct 14 '22 20:10

svinec