How does GHC know how to cache one function but not the others?

Question

I'm reading Learn You a Haskell (loving it so far) and it teaches how to implement elem in terms of foldl, using a lambda. The lambda solution seemed a bit ugly to me so I tried to think of alternative implementations (all using foldl):

import qualified Data.Set as Set
import qualified Data.List as List

-- LYAH implementation
elem1 :: (Eq a) => a -> [a] -> Bool
y `elem1` ys = 
    foldl (\acc x -> if x == y then True else acc) False ys

-- When I thought about stripping duplicates from a list
-- the first thing that came to my mind was the mathematical set
elem2 :: (Eq a) => a -> [a] -> Bool
y `elem2` ys = 
    head $ Set.toList $ Set.fromList $ filter (==True) $ map (==y) ys

-- Then I discovered `nub` which seems to be highly optimized: 
elem3 :: (Eq a) => a -> [a] -> Bool
y `elem3` ys = 
    head $ List.nub $ filter (==True) $ map (==y) ys

I loaded these functions in GHCi and did :set +s and then evaluated a small benchmark:

3 `elem1` [1..1000000] --  => (0.24 secs, 160,075,192 bytes)
3 `elem2` [1..1000000] --  => (0.51 secs, 168,078,424 bytes)
3 `elem3` [1..1000000] --  => (0.01 secs, 77,272 bytes)

I then tried to do the same on a (much) bigger list:

3 `elem3` [1..10000000000000000000000000000000000000000000000000000000000000000000000000]

elem1 and elem2 took a very long time, while elem3 was instantaneous (almost identical to the first benchmark).
I think this is because GHC knows that 3 is a member of [1..1000000]. The big number I used in the second benchmark is larger than 1000000, so 3 is also a member of [1..bigNumber], and GHC doesn't have to compute the expression at all.
But how is it able to automatically cache (or memoize, a term that Land of Lisp taught me) elem3 but not the two other ones?

Willem Van Onsem · Accepted Answer

Short answer: this has nothing to do with caching. The first two implementations force Haskell to iterate over all the elements of the list.

foldl works left to right, but it keeps iterating until the list is exhausted, even after a match has been found.

You are therefore better off using foldr. From the moment it finds a 3 in the list, it cuts off the search.

This is because foldr is defined as:

foldr f z [x1, x2, x3] = f x1 (f x2 (f x3 z))

whereas foldl is implemented as:

foldl f z [x1, x2, x3] = f (f (f z x1) x2) x3

Note that the outermost f is applied to x3, the last element. foldl therefore has to walk to the end of the list before that outermost application even exists, so even if laziness means the first operand (the accumulator) is never evaluated, you still need to iterate to the end of the list.
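The difference between the two folds can be sketched with hand-written, list-only versions. The names myFoldr, myFoldl, elemR and elemL are illustrative and not part of the original question or answer; the real Prelude folds are more general, so treat this as a minimal sketch of the evaluation order rather than the library's implementation.

```haskell
-- List-only folds mirroring the two expansions shown above.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)   -- f is applied first; the recursion is just an argument

myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl _ z []     = z
myFoldl f z (x:xs) = myFoldl f (f z x) xs   -- recursion comes first; nothing returns before []

-- (||) ignores its second argument when the first is True,
-- so the recursive call is never demanded after a match.
elemR :: Eq a => a -> [a] -> Bool
elemR y = myFoldr (\x acc -> x == y || acc) False

-- Same test with the left fold: the whole list is always traversed.
elemL :: Eq a => a -> [a] -> Bool
elemL y = myFoldl (\acc x -> acc || x == y) False
```

Under these definitions, 3 `elemR` [1..] returns True even though the list is infinite, while 3 `elemL` [1..] never returns, because myFoldl only produces a result once it reaches the empty list.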

If we implement both the foldl and the foldr versions, we get:

y `elem1l` ys = foldl (\acc x -> if x == y then True else acc) False ys
y `elem1r` ys = foldr (\x acc -> if x == y then True else acc) False ys

We then get:

Prelude> 3 `elem1l` [1..1000000]
True
(0.25 secs, 112,067,000 bytes)
Prelude> 3 `elem1r` [1..1000000]
True
(0.03 secs, 68,128 bytes)

Stripping the duplicates from the list will not improve the efficiency. What improves the efficiency here is that you use map, which works left to right and produces its result lazily. Note furthermore that nub is lazy as well, so nub is effectively a no-op here: since you are only interested in the head, Haskell does not need to perform membership checks against the elements it has already seen.

The performance is almost identical:

Prelude List> 3 `elem3` [1..1000000]
True
(0.03 secs, 68,296 bytes)
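The laziness of the map/filter/nub pipeline can be checked in isolation. The sketch below splits elem3 into two stages; the names hits and elem3' are illustrative and do not appear in the original code.

```haskell
import Data.List (nub)

-- Stage 1: lazily yields one True for every element equal to y.
hits :: Eq a => a -> [a] -> [Bool]
hits y = filter id . map (== y)

-- Stage 2: head demands only the first element that nub produces,
-- and nub can emit its first element without looking at the rest.
elem3' :: Eq a => a -> [a] -> Bool
elem3' y = head . nub . hits y
```

With these definitions, 3 `elem3'` [1..] returns True on an infinite list, which would be impossible if any stage had to consume its whole input. Note that, like the original elem3, this sketch calls head on an empty list when no element matches, so it is only meant to illustrate the evaluation order.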

In case you work with a Set, however, uniqueness is not computed lazily: you first load all the elements into the set, so again you iterate over all the elements and do not cut off the search after the first hit.
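One way to observe this difference is to feed both approaches a list whose tail is undefined: anything that returns a value cannot have inspected the tail. The names probe, viaNub and viaSet below are hypothetical helpers for this illustration, and the sketch assumes the containers package (which provides Data.Set) is available.

```haskell
import qualified Data.Set as Set
import Data.List (nub)

-- First element is fine; touching the rest of the list raises an error.
probe :: [Bool]
probe = True : undefined

-- nub emits the first element without inspecting the tail,
-- so head gets its answer immediately.
viaNub :: Bool
viaNub = head (nub probe)

-- Set.fromList must walk the entire list to build the set,
-- so evaluating this value hits the undefined tail and throws.
viaSet :: Bool
viaSet = head (Set.toList (Set.fromList probe))
```

Evaluating viaNub in GHCi yields True, whereas evaluating viaSet raises the undefined exception. On a real list such as [1..1000000], the same behaviour shows up as a full traversal instead of an error.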

Tags:

benchmarking

haskell

memoization

ohmree
