I'm trying to optimize the execution speed of my program, and I ran into some interesting results that I'm hoping someone can answer. It seems that making small changes in one of my list comprehensions drastically changes the execution speed, but I don't know why.
Here's my program as it is right now.
import Data.Ord
import Control.Monad
import Data.Array
import Data.Ix
import qualified Data.Map as M
import qualified Data.Set as S
import Data.List (minimumBy, foldl')
arrayMatrix lists = let rlen = length lists
clen = length $ head lists
r = ((1,1), (rlen, clen))
in array r . zip (range r) $ concat lists
a_star start goal h m = search S.empty (S.singleton start)
(M.singleton start (m ! start))
$ M.singleton start (m ! start + h ! start)
where neighbors (r,c) = filter (inRange $ bounds m) [ (r-1,c), (r,c+1), (r+1,c) , (r,c-1)]
search closed open gs fs
| S.null open = 0
| current == goal = gs M.! goal
| otherwise = let open' = S.delete current open
closed' = S.insert current closed
neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
, let ts = gs M.! current + m ! n ]
actionable = filter (\(n,ts) -> S.notMember n open' || ts < (gs M.! n)) neighbs
(op',gs',fs') = foldl' (\(o,ng,nf) (n,ts) -> (S.insert n o, M.insert n ts ng, M.insert n (ts + h ! n) nf)) (open',gs,fs) actionable
in search closed' op' gs' fs'
where current = minimumBy (comparing (fs M.!)) $ S.toList open
main = do
matrix <- liftM (arrayMatrix . map (read . ('[':) . (++"]")) . lines)
$ readFile "matrix.txt"
let bds = bounds matrix
ulim = snd bds
heuristic = let m = minimum $ elems matrix
in listArray bds . map (\(r,c) -> (uncurry (+) ulim)-r-c) $ range bds
print $ a_star (1,1) ulim heuristic matrix
Right now the program runs on my computer ~350ms (compiled with GHC 7.8.2 -O2) with the matrix.txt supplied by Project Euler.
If I change neighbs from
neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
, let ts = gs M.! current + m ! n ]
to
neighbs = [(n, gs M.! current + m ! n) | n <- neighbors current, S.notMember n closed]
the execution time increases to over 1sec.
Other minor changes like moving the filter on the next line into the list comprehension yields the same result: ~1sec.
Can anyone explain why this happens?
EDIT: It seems this doesn't happen on earlier versions of GHC. I tried GHC 7.6.3 and each of these performed about the same.
I've included the dumps from running ghc -O2 -ddump-simpl -dsuppress-all
as suggested by cdk. I don't really know what I'm looking at, so if anyone is able to interpret, that would be a big help, thanks.
Link to both dumps
EDIT2 (Response to Priyatham): I don't think that's the case. I changed
neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
, let ts = gs M.! current + m ! n ]
actionable = filter ((n,ts) -> S.notMember n open' || ts < (gs M.! n)) neighbs
to
neighbs = [(n, gs M.! current + m ! n) | n <- neighbors current, S.notMember n closed ]
actionable = filter ((n,!ts) -> S.notMember n open' || ts < (gs M.! n)) neighbs
using BangPatterns, and that still runs at a little over a second. In fact, modifying neigbs from
neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
, let ts = gs M.! current + m ! n ]
to
neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
, let !ts = gs M.! current + m ! n ] -- Added bang before ts
increases the runtime to over 1sec as well.
Here's one guess about what happened with let ts =
vs. let !ts =
. I
got it from looking at the output of -ddump-stranal
(which dumps
strictness analysis annotations), and reading Demand analyser in
GHC.
The difference between let !ts =
and let ts =
is that if ts
is
bottom (i.e., undefined
), then n
will not be evaluated at all
because ts
will be evaluated first and the evaluation will stop. It
appears that the difference between two programs is that the pair of
integers n
is strict and unboxed in one version, but not in the
other (see the output of -ddump-stranal
and -ddump-simpl
; the link
above describes the output).
How can !ts
or not !ts
affect strictness of n
? I think that if
ts
is bottom then the program must fail before evaluating n
or
any of its elements (I am not sure if it's n :: (Int, Int)
itself or
its elements). So ghc seems to do the right thing to keep n
non-strict when ts
is required to be strict, because evaluating n
first and possibly failing in a different place might be a mistake.
Next, how do you force !ts
to have no impact on n
? Note that ts
cannot be bottom without n
being bottom if either gs
, current
,
or m
are known to not be bottom (these are all the elements of the expression except for n
) and have already been evaluated (I think M.!
and !
might
never be bottom without evaluating their arguments first). So we need to impose the condition "ts
is bottom implies
n
is bottom and has already been evaluated", so that ghc knows that it is safe to evaluate n
first.
My solution: add bang patterns to current
, gs
and m
. With my ghc
7.8.2, this seems to solve the problem. It also appears that only current
needs to be forced.
I'm not too sure about the original question about moving the expression of
ts
into the tuple, but the same solution seems to work.
P.S. Note that
filter (\x -> x > 5) [x | x <- [1..10]] == [x | x <- [1..10], x > 5]
so in your lists neighbs
and
actionable
it would be cleaner to bring the filter predicate into
the list comprehension itself like so:
[(n, ts)
| n <- neighbors current
, S.notMember n closed
, let ts = gs M.! current + m ! n
, S.notMember n open' || ts < (gs M.! n)
]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With