Fast Sudoku Solver in Haskell #2: A 200x Faster Solution

In the first part of this series of posts, we wrote a simple Sudoku solver in Haskell. It used a constraint satisfaction algorithm with backtracking. The solution worked well but was very slow. In this post, we are going to improve it and make it fast.

This is the second post in a series of posts:

  1. Fast Sudoku Solver in Haskell #1: A Simple Solution
  2. Fast Sudoku Solver in Haskell #2: A 200x Faster Solution
  3. Fast Sudoku Solver in Haskell #3: Picking the Right Data Structures

Discuss this post on r/haskell.

Quick Recap

Sudoku is a number placement puzzle. It consists of a 9x9 grid which is to be filled with digits from 1 to 9 such that each row, each column and each of the nine 3x3 sub-grids contain all the digits. Some of the cells of the grid come pre-filled and the player has to fill the rest.

In the previous post, we implemented a simple Sudoku solver without paying much attention to its performance characteristics. We ran1 some of 17-clue puzzles2 through our program to see how fast it was:

$ head -n100 sudoku17.txt | time stack exec sudoku
... output omitted ...
      116.70 real       198.09 user        94.46 sys

So, it took about 117 seconds to solve one hundred puzzles. At this speed, it would take about 16 hours to solve all the 49151 puzzles contained in the file. This is way too slow. We need to find ways to make it faster. Let’s go back to the drawing board.

Constraints and Corollaries

In a Sudoku puzzle, we have a partially filled 9x9 grid which we have to fill completely while following the constraints of the game.

+-------+-------+-------+
| . . . | . . . | . 1 . |
| 4 . . | . . . | . . . |
| . 2 . | . . . | . . . |
+-------+-------+-------+
| . . . | . 5 . | 4 . 7 |
| . . 8 | . . . | 3 . . |
| . . 1 | . 9 . | . . . |
+-------+-------+-------+
| 3 . . | 4 . . | 2 . . |
| . 5 . | 1 . . | . . . |
| . . . | 8 . 6 | . . . |
+-------+-------+-------+
    A sample puzzle

+-------+-------+-------+
| 6 9 3 | 7 8 4 | 5 1 2 |
| 4 8 7 | 5 1 2 | 9 3 6 |
| 1 2 5 | 9 6 3 | 8 7 4 |
+-------+-------+-------+
| 9 3 2 | 6 5 1 | 4 8 7 |
| 5 6 8 | 2 4 7 | 3 9 1 |
| 7 4 1 | 3 9 8 | 6 2 5 |
+-------+-------+-------+
| 3 1 9 | 4 7 5 | 2 6 8 |
| 8 5 6 | 1 2 9 | 7 4 3 |
| 2 7 4 | 8 3 6 | 1 5 9 |
+-------+-------+-------+
    and its solution

Earlier, we followed a simple pruning algorithm which removed all the solved (or fixed) digits from neighbours of the fixed cells. We repeated the pruning till the fixed and non-fixed values in the grid stopped changing (or the grid settled). Here’s an example of a grid before pruning:

+-------------------------------------+-------------------------------------+-------------------------------------+
| [123456789] [123456789] [123456789] | [123456789] [123456789] [123456789] | [123456789] 1           [123456789] |
| 4           [123456789] [123456789] | [123456789] [123456789] [123456789] | [123456789] [123456789] [123456789] |
| [123456789] 2           [123456789] | [123456789] [123456789] [123456789] | [123456789] [123456789] [123456789] |
+-------------------------------------+-------------------------------------+-------------------------------------+
| [123456789] [123456789] [123456789] | [123456789] 5           [123456789] | 4           [123456789] 7           |
| [123456789] [123456789] 8           | [123456789] [123456789] [123456789] | 3           [123456789] [123456789] |
| [123456789] [123456789] 1           | [123456789] 9           [123456789] | [123456789] [123456789] [123456789] |
+-------------------------------------+-------------------------------------+-------------------------------------+
| 3           [123456789] [123456789] | 4           [123456789] [123456789] | 2           [123456789] [123456789] |
| [123456789] 5           [123456789] | 1           [123456789] [123456789] | [123456789] [123456789] [123456789] |
| [123456789] [123456789] [123456789] | 8           [123456789] 6           | [123456789] [123456789] [123456789] |
+-------------------------------------+-------------------------------------+-------------------------------------+

And here’s the same grid when it settles after repeated pruning:

+-------------------------------------+-------------------------------------+-------------------------------------+
| [    56789] [  3  6789] [  3 567 9] | [ 23 567 9] [ 234 6 8 ] [ 2345 789] | [    56789] 1           [ 23456 89] |
| 4           [1 3  6789] [  3 567 9] | [ 23 567 9] [123  6 8 ] [123 5 789] | [    56789] [ 23 56789] [ 23 56 89] |
| [1   56789] 2           [  3 567 9] | [  3 567 9] [1 34 6 8 ] [1 345 789] | [    56789] [  3456789] [  3456 89] |
+-------------------------------------+-------------------------------------+-------------------------------------+
| [ 2   6  9] [  3  6  9] [ 23  6  9] | [ 23  6   ] 5           [123    8 ] | 4           [ 2   6 89] 7           |
| [ 2  567 9] [   4 67 9] 8           | [ 2   67  ] [12 4 6   ] [12 4  7  ] | 3           [ 2  56  9] [12  56  9] |
| [ 2  567  ] [  34 67  ] 1           | [ 23  67  ] 9           [ 234  78 ] | [    56 8 ] [ 2  56 8 ] [ 2  56 8 ] |
+-------------------------------------+-------------------------------------+-------------------------------------+
| 3           [1    6 89] [     6  9] | 4           7           [    5   9] | 2           [    56 89] [1   56 89] |
| [ 2   6789] 5           [ 2 4 67 9] | 1           [ 23      ] [ 23     9] | [     6789] [  34 6789] [  34 6 89] |
| [12    7 9] [1  4  7 9] [ 2 4  7 9] | 8           [ 23      ] 6           | [1   5 7 9] [  345 7 9] [1 345   9] |
+-------------------------------------+-------------------------------------+-------------------------------------+

We see how the possibilities conflicting with the fixed values are removed. We also see how some of the non-fixed cells turn into fixed ones as all their other possible values get eliminated.

This simple strategy follows directly from the constraints of Sudoku. But, are there more complex strategies which are implied indirectly?

Singles, Twins and Triplets

Let’s have a look at this sample row captured from a solution in progress:

+-------------------------------------+-------------------------------------+-------------------------------------+
| 4           [ 2   6 89] 7           | 3           [ 2  56  9] [12  56  9] | [    56 8 ] [ 2  56 8 ] [ 2  56 8 ] |
+-------------------------------------+-------------------------------------+-------------------------------------+

Notice how the sixth cell is the only one with 1 as a possibility in it. It is obvious that we should fix the sixth cell to 1 as we cannot place 1 in any other cell in the row. Let’s call this the Singles3 scenario.

But, our current solution will not fix the sixth cell to 1 till one of these cases arise:

  1. all other possibilities of the cell are pruned away, or,
  2. the cell is chosen as pivot in the nextGrids function and 1 is chosen as the value to fix.

This may take very long and lead to a longer solution time. Let’s assume that we recognize the Singles scenario while pruning cells and fix the cell to 1 right then. That would cut down the search tree by a lot and make the solution much faster.

It turns out, we can generalize this pattern. Let’s check out this sample row from middle of a solution:

+-------------------------------------+-------------------------------------+-------------------------------------+
| [1  4    9] 3           [1  4567 9] | [1  4   89] [1  4 6 89] [1  4 6 89] | [1  4   89] 2           [1  456789] |
+-------------------------------------+-------------------------------------+-------------------------------------+

It is a bit difficult to notice with the naked eye but there’s something special here too. The digits 5 and 7 occur only in the third and the ninth cells. Though they are accompanied by other digits in those cells, they are not present in any other cells. This means, we can place 5 and 7 either in the third or the ninth cell and no other cells. This implies that we can prune the third and ninth cells to have only 5 and 7 like this:

+-------------------------------------+-------------------------------------+-------------------------------------+
| [1  4    9] 3           [    5 7  ] | [1  4   89] [1  4 6 89] [1  4 6 89] | [1  4   89] 2           [    5 7  ] |
+-------------------------------------+-------------------------------------+-------------------------------------+

This is the Twins scenario. As we can imagine, this pattern extends to groups of three digits and beyond. When three digits can be found only in three cells in a block, it is the Triplets scenario, as in the example below:

+-------------------------------------+-------------------------------------+-------------------------------------+
| [   45 7  ] [   45 7  ] [    5 7  ] | 2           [  3 5  89] 6           | 1           [  34   89] [  34   89] |
+-------------------------------------+-------------------------------------+-------------------------------------+

In this case, the triplet digits are 3, 8 and 9. And as before, we can prune the block by fixing these digits in their cells:

+-------------------------------------+-------------------------------------+-------------------------------------+
| [   45 7  ] [   45 7  ] [    5 7  ] | 2           [  3    89] 6           | 1           [  3    89] [  3    89] |
+-------------------------------------+-------------------------------------+-------------------------------------+

Let’s call these three scenarios Exclusives in general.

We can extend this to Quadruplets scenario and further. But such scenarios occur rarely in a 9x9 Sudoku puzzle. Trying to find them may end up being more computationally expensive than the benefit we may get in solution time speedup by finding them.

Now that we have discovered these new strategies to prune cells, let’s implement them in Haskell.

A Little Forward, a Little Backward

We can implement the three new strategies to prune cells as one function for each. However, we can actually implement all these strategies in a single function. But, this function is a bit more complex than the previous pruning function. So first, let’s try to understand its working using tables. Let’s take this sample row:

+-------------------------------------+-------------------------------------+-------------------------------------+
| [   4 6  9] 1           5           | [     6  9] 7           [ 23  6 89] | [     6  9] [ 23  6 89] [ 23  6 89] |
+-------------------------------------+-------------------------------------+-------------------------------------+

First, we make a table mapping the digits to the cells in which they occur, excluding the fixed cells:

Digit Cells
2 6, 8, 9
3 6, 8, 9
4 1
6 1, 4, 6, 7, 8, 9
8 6, 8, 9
9 1, 4, 6, 7, 8, 9

Then, we flip this table and collect all the digits that occur in the same set of cells:

Cells Digits
1 4
6, 8, 9 2, 3, 8
1, 4, 6, 7, 8, 9 6, 9

And finally, we remove the rows of the table in which the count of the cells is not the same as the count of the digits:

Cells Digits
1 4
6, 8, 9 2, 3, 8

Voilà! We have found a Single 4 and a set of Triplets 2, 3 and 8. You can go over the puzzle row and verify that this indeed is the case.

Translating this logic to Haskell is quite easy now:

We extract the isPossible function to the top level from the nextGrids function for reuse. Then we write the exclusivePossibilities function which finds the Exclusives in the input row. This function is written using the reverse application operator (&)4 instead of the usual ($) operator so that we can read it from top to bottom. We also show the intermediate values for a sample input after every step in the function chain.

The nub of the function lies in step 3 (pun intended). We do a nested fold over all the non-fixed cells and all the possible digits in them to compute the map5 which represents the first table. Thereafter, we filter the map to keep only the entries with length less than four (step 4). Then we flip it to create a new map which represents the second table (step 5). Finally, we filter the flipped map for the entries where the cell count is same as the digit count (step 6) to arrive at the final table. The step 7 just gets the values in the map which is the list of all the Exclusives in the input row.

Pruning the Cells, Exclusively

To start with, we extract some reusable code from the previous pruneCells function and rename it to pruneCellsByFixed:

Now we write the pruneCellsByExclusives function which uses the exclusivePossibilities function to prune the cells:

pruneCellsByExclusives works exactly as shown in the examples above. We first find the list of Exclusives in the given cells. If there are no Exclusives, there’s nothing to do and we just return the cells. If we find any Exclusives, we traverse the cells, pruning each cell to only the intersection of the possible digits in the cell and Exclusive digits. That’s it! We reuse the makeCell function to create a new cell with the intersection.

As the final step, we rewrite the pruneCells function by combining both the functions.

We have extracted fixM as a top level function from the pruneGrid function. Just like the pruneGrid' function, we need to use monadic bind (>>=) to chain the two pruning steps. We also use fixM to apply each step repeatedly till the pruned cells settle6.

No further code changes are required. It is time to check out the improvements.

Faster than a Speeding Bullet!

Let’s build the program and run the exact same number of puzzles as before:

$ head -n100 sudoku17.txt | time stack exec sudoku
... output omitted ...
      0.53 real         0.58 user         0.23 sys

Woah! It is way faster than before. Let’s solve all the puzzles now:

$ cat sudoku17.txt | time stack exec sudoku > /dev/null
      282.98 real       407.25 user       109.27 sys

So it is took about 283 seconds to solve all the 49151 puzzles. The speedup is about 200x7. That’s about 5.8 milliseconds per puzzle.

Let’s do a quick profiling to see where the time is going:

$ stack build --profile
$ head -n1000 sudoku17.txt | stack exec -- sudoku +RTS -p > /dev/null

This generates a file named sudoku.prof with the profiling results. Here are the top five most time-taking functions (cleaned for brevity):

Cost Center Source %time %alloc
exclusivePossibilities (49,1)-(62,26) 17.6 11.4
pruneCellsByFixed.pruneCell (75,5)-(76,36) 16.9 30.8
exclusivePossibilities.\.\ 55:38-70 12.2 20.3
fixM.\ 13:27-65 10.0 0.0
== 15:56-57 7.2 0.0

Looking at the report, my guess is that a lot of time is going into list operations. Lists are known to be inefficient in Haskell so maybe we should switch to some other data structures?

Update

As per the comment below by Chris Casinghino, I ran both the versions of code without the -threaded, -rtsopts and -with-rtsopts=-N options. The time for previous post’s code:

$ head -n100 sudoku17.txt | time stack exec sudoku
... output omitted ...
       96.54 real        95.90 user         0.66 sys

And the time for this post’s code:

$ cat sudoku17.txt | time stack exec sudoku > /dev/null
      258.97 real       257.34 user         1.52 sys

So, both the versions run about 10% faster without the threading options. I suspect this has something to do with GHC’s parallel GC as described in this post. So for now, I’ll keep threading disabled.

Conclusion

In this post, we improved upon our simple Sudoku solution from the last time. We discovered and implemented a new strategy to prune cells, and we achieved a 200x speedup. But profiling shows that we still have many possibilities for improvements. We’ll work on that and more in the upcoming posts in this series. The code till now is available here. Discuss this post on r/haskell or leave a comment.


  1. All the runs were done on my MacBook Pro from 2014 with 2.2 GHz Intel Core i7 CPU and 16 GB memory.

  2. At least 17 cells must be pre-filled in a Sudoku puzzle for it to have a unique solution. So 17-clue puzzles are the most difficult of all puzzles. This paper by McGuire, Tugemann and Civario gives the proof of the same.

  3. “Single” as in “Single child”

  4. Reverse application operation is not used much in Haskell. But it is the preferred way of function chaining in some other functional programming languages like Clojure, FSharp, and Elixir.

  5. We use Data.Map.Strict as the map imple­mentation.

  6. We need to run pruneCellsByFixed and pruneCellsByExclusives repeatedly using fixM because an unsettled row can lead to wrong solutions.

    Imagine a row which just got a 9 fixed because of pruneCellsByFixed. If we don’t run the function again, the row may be left with one non-fixed cell with a 9. When we run this row through pruneCellsByExclusives, it’ll consider the 9 in the non-fixed cell as a Single and fix it. This will lead to two 9s in the same row, causing the solution to fail.

  7. Speedup calculation: 116.7 / 100 * 49151 / 282.98 = 202.7

Posted by Post history
Cancel Reply

Got suggestions, corrections, or thoughts? Post a comment!

Markdown is allowed
Email is used just to show an avatar image and is not displayed or stored
Comments are moderated. They will appear below after they are approved.

10 comments

Chris Casinghino

July 11, 2018

Out of curiosity, what kind of machine did you use for your experiments? I just ran your code on a 32 core Xeon E5-2630 and it took about 3x as long.

Chris Casinghino

July 11, 2018

(To be clear, I said “32 cores”, but should have said “with 32 hardware threads”)

July 11, 2018

@Chris, I updated the post with the info. It is a MacBook Pro from 2014 with 2.2 GHz Intel Core i7 CPU and 16 GB memory. The program is single threaded so number of cores doesn’t really matter.

Chris Casinghino

July 11, 2018

@Abhinav On my desktop, the program is using all the hardware threads (I watched it in htop). This definitely surprised me too, since there’s no parallelism explicit in the source. Perhaps the -threaded option in the package.yaml file is causing GHC to do some smart automated threading around the forM_?

Chris Casinghino

July 11, 2018

Additional interesting facts: If I remove the “-threaded”, “-rtsopts” and “-with-rtsopts=-N” from the package.yaml file, the program runs on a single thread. It actually finishes faster (about twice as fast) than it did when it used all 32 threads. It’s still a bit slower for me than it was for you, but now reasonably so.

July 11, 2018

@Chris, you are right that the program is threaded. The usage in htop could be because of GHC’s parallel GC kicking in. This is an interesting point. I’ll disable the parallel GC and see if it impacts the performance.

Brian Jones

July 12, 2018

Sometimes the context switching when threading is turned on adds more overhead that the single threaded version itself, which is probably why you are seeing the slow down.

Dαvid

July 11, 2018

Thanks Abhinav, I like your posts and wish you will continue make things like that

Samuel Chase

July 12, 2018

Excellent post Abhinav, I attempted to write a sudoku solver once, but I was not successful. You inspire to me to tackle this problem again.

August 20, 2018

Thanks for the encouragement Samuel. I hope you’ll write a post about your Sudoku solver too :-)

5 mentions