Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this Haskell code so slow?

I'm kind of new to Haskell and tried making a scrabble solver. It takes in the letters you currently have, finds all permutations of them and filters out those that are dictionary words. The code's pretty simple:

import Data.List

main = do
    dict    <- readFile "words"
    letters <- getLine
    let dictWords = words dict
    let perms = permutations letters
    print [x | x <- perms, x `elem` dictWords]

However it's incredibly slow, compared to a very similar implementation I have with Python. Is there something fundamental I'm doing wrong?

*edit: Here's my Python code:

from itertools import permutations

letters = raw_input("please enter your letters (without spaces): ")

d = open('words')
dictionary = [line.rstrip('\n') for line in d.readlines()]
d.close()

perms = ["".join(p) for p in permutations(letters)]

validWords = []

for p in perms:
    if p in dictionary: validWords.append(p)


for validWord in validWords:
    print validWord

I didn't time them precisely, but roughly it feels like the Python implementation is about 2x as fast as the Haskell one. Perhaps I should't have said the Haskell code was "incredibly slow" in comparison, but since Haskell is statically typed I guess I just thought that it should've been much faster, and not slower than Python at all.

like image 885
nilcit Avatar asked Sep 02 '16 01:09

nilcit


2 Answers

I'm kind of new to Haskell and tried making a scrabble solver.

You can substantially improve things by using a better algorithm.

Instead of testing every permutation of the input letters, if you sort them first you can make only one dictionary lookup and get all of the possible words (anagrams) which may be formed from them (using all of them).

Here is code which creates that dictionary as a Data.Map. There is a start-up cost to creating the Map, but after the first query subsequent lookups are very fast.

import Data.List
import qualified Data.Map.Strict as Map
import Control.Monad
import System.IO

main = do
  contents <- readFile "words"
  let pairs = [ (sort w, [w]) | w <- words contents ]
      dict = foldl' (\m (k,v) -> Map.insertWith (++) k v m) Map.empty pairs
      -- dict = foldr (\(k,v) m -> Map.insertWith (++) k v m) Map.empty pairs
  forever $ do
    putStr "Enter letters: " >> hFlush stdout
    letters <- getLine
    case Map.lookup (sort letters) dict of
      Nothing -> putStrLn "No words."
      Just ws -> putStrLn $ "Words: " ++ show ws

Map creation time for a word file of 236K words (2.5 MB) is about 4-5 seconds. Better performance is likely possible by using ByteStrings or Text instead of Strings.

Some good letter combinations to try:

steer rat tuna lapse groan neat

Note: Using GHC 7.10.2 I found this code performed the best without compiling with -O2.

like image 170
ErikR Avatar answered Nov 06 '22 19:11

ErikR


Checking if x is an element of dictWords is likely to be very slow. I would assume that your similar python implementation stores dictWords in a set or sorted vector (using binary search in the latter case)? Seems like you probably want to do the same here.

Using this word list and the code below, the Python version runs in about 30 seconds, and the Haskell version takes 1.5 minutes. So Haskell is slower (perhaps because it's using a linked list, which all things being equal, is slower to iterate over), but I wouldn't call it "incredibly slow" compared to Python. Switching to use a set in either version reduces the time to under 1 second.

from itertools import permutations
f = open('twl06.txt')
words = f.read().split()

print [''.join(p) for p in permutations('apricot') if ''.join(p) in words]

And here's the set-based Haskell code:

import Data.Set
import Data.List

main = do
    dict    <- readFile "twl06.txt"
    let letters = "apricot"
    let dictWords = Data.Set.fromList $ words dict
    let perms = permutations letters
    print [x | x <- perms, member x dictWords]
like image 6
happydave Avatar answered Nov 06 '22 19:11

happydave