Slower execution when using an infinite list

Question

I'm beginning to try and get my head round haskell performance, and what makes things fast and slow, and I'm a little confused by this.

I have two implementations of a function that generates a list of primes up to a certain value. The first is straight off the Haskell wiki:

primesTo :: (Ord a, Num a, Enum a) => a -> [a]
primesTo m = eratos [2..m]  where
   eratos []     = []
   eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])

The second is the same, but using an infinite list internally:

primes2 :: (Ord a, Num a, Enum a) => a -> [a]
primes2 m = takeWhile (<= m) (eratos [2..])  where
   eratos []     = []
   eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..])

In both cases, the minus function is:

minus :: (Ord a) => [a] -> [a] -> [a]
minus (x:xs) (y:ys) = case (compare x y) of 
           LT -> x : minus  xs  (y:ys)
           EQ ->     minus  xs     ys 
           GT ->     minus (x:xs)  ys
minus  xs _ = xs

The latter implementation is significantly (~100x) slower than the former, and I don't get why. I would have thought that haskell's lazy evalutation would make them fairly equivalent under the hood.

This is obviously a reduced test case for the purposes of the question - in real life the optimisation would be no problem (although I don't understand why it is needed), but to me a function that just generates an infinite list of primes is more generically useful than a finite list, but appears slower to work with.

J. Abrahamson · Accepted Answer

Looks like to me that there's a big difference between

(xs `minus` [p*p, p*p+p..m])  -- primesTo
(xs `minus` [p*p, p*p+p..])   -- primes2

The function minus steps through lists pairwise and terminates when one list reaches the end. In the first minus expression above, this occurs in no more than (m-p*p)/p steps when the latter list is exhausted. In the second one, it will always take steps on the order of length xs.

So your infinite lists have disabled at least one meaningful optimization.

hugomg · Answer

One difference is that in the second case you need to generate one extra prime. You need to generate the first prime greater than m before takeWhile knows its time to stop.

Additionally, the [..m] bounds on both the list to filter and the lists of multiples help reduce the number of calculations. Whenever one of these lists gets empty minus immediately returns via its secons clause while in the infinite case the minus gets stuck in the first case. You can explore this a bit better if you also test the cases where only one of the lists is infinite:

--this is also slow
primes3 :: (Ord a, Num a, Enum a) => a -> [a]
primes3 m = takeWhile (<= m) (eratos [2..m])  where
   eratos []     = []
   eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..])

--this fast
primes4 :: (Ord a, Num a, Enum a) => a -> [a]
primes4 m = takeWhile (<= m) (eratos [2..])  where
   eratos []     = []
   eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])

Slower execution when using an infinite list

Tags:

performance

haskell

WillW

2 Answers

J. Abrahamson

hugomg

Recent Activity

Donate For Us

Slower execution when using an infinite list

Tags:

performance

haskell

WillW

2 Answers

J. Abrahamson

hugomg

Related questions

Recent Activity

Donate For Us