I have two Haskell functions, both of which seem very similar to me. But the first one FAILS against infinite lists, and the second one SUCCEEDS against infinite lists. I have been trying for hours to nail down exactly why that is, but to no avail.
Both snippets are a re-implementation of the "words" function in Prelude. Both work fine against finite lists.
Here's the version that does NOT handle infinite lists:
myWords_FailsOnInfiniteList :: String -> [String]
myWords_FailsOnInfiniteList string = foldr step [] (dropWhile charIsSpace string)
where
step space ([]:xs) | charIsSpace space = []:xs
step space (x:xs) | charIsSpace space = []:x:xs
step space [] | charIsSpace space = []
step char (x:xs) = (char : x) : xs
step char [] = [[char]]
Here's the version that DOES handle infinite lists:
myWords_anotherReader :: String -> [String]
myWords_anotherReader xs = foldr step [""] xs
where
step x result | not . charIsSpace $ x = [x:(head result)]++tail result
| otherwise = []:result
Note: "charIsSpace" is merely a renaming of Char.isSpace.
The following interpreter session illustrates that the first one fails against an infinite list while the second one succeeds.
*Main> take 5 (myWords_FailsOnInfiniteList (cycle "why "))
*** Exception: stack overflow
*Main> take 5 (myWords_anotherReader (cycle "why "))
["why","why","why","why","why"]
EDIT: Thanks to the responses below, I believe I understand now. Here are my conclusions and the revised code:
Conclusions:
So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.
myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
where
step space acc | charIsSpace space = "":acc
step char (x:xs) = (char:x):xs
step _ [] = error "this should be impossible"
This correctly works against infinite lists. Note there's no head, tail or (++) operator in sight.
Now, for an important caveat: When I first wrote the corrected code, I did not have the 3rd equation, which matches against "step _ []". As a result, I received the warning about non-exhaustive pattern matches. Obviously, it is a good idea to avoid that warning.
But I thought I was going to have a problem. I already mentioned above that it is not OK to pattern match the second arg against []. But I would have to do so in order to get rid of the warning.
However, when I added the "step _ []" equation, everything was fine! There was still no problem with infinite lists!. Why?
Because the 3rd equation in the corrected code IS NEVER REACHED!
In fact, consider the following BROKEN version. It is EXACTLY the SAME as the correct code, except that I have moved the pattern for empty list up above the other patterns:
myWords_brokenAgain :: String -> [String]
myWords_brokenAgain string = foldr step [""] (dropWhile charIsSpace string)
where
step _ [] = error "this should be impossible"
step space acc | charIsSpace space = "":acc
step char (x:xs) = (char:x):xs
We're back to stack overflow, because the first thing that happens when step is called is that the interpreter checks to see if equation number one is a match. To do so, it must see if the second arg is []. To do that, it must evaluate the second arg.
Moving the equation down BELOW the other equations ensures that the 3rd equation is never attempted, because either the first or the second pattern always matches. The 3rd equation is merely there to dispense with the non-exhaustive pattern warning.
This has been a great learning experience. Thanks to everyone for your help.
As you all know, lists in Haskell can be infinite. This capability is a natural consequence of Haskell's lazy evaluation. An infinite list in Haskell is no problem since Haskell will lazily generate only as much of the list as is needed, when it is needed.
In Haskell, lists are a homogenous data structure. It stores several elements of the same type. That means that we can have a list of integers or a list of characters but we can't have a list that has a few integers and then a few characters.
No. However, you can have functions that return a trivial value.
Termination of Haskell Functions Intuitively a specific function evaluation (where the value of every argument is supplied) is terminating if the Haskell evaluation strategy needs finitely many steps to compute the result completely.
Others have pointed out the problem, which is that step always evaluates its second argument before producing any output at all, yet its second argument will ultimately depend on the result of another invocation of step when the foldr is applied to an infinite list.
It doesn't have to be written this way, but your second version is kind of ugly because it relies on the initial argument to step having a particular format and it's quite hard to see that the head/tail will never go wrong. (I'm not even 100% certain that they won't!)
What you should do is restructure the first version so it produces output without depending on the input list in at least some situations. In particular we can see that when the character is not a space, there's always at least one element in the output list. So delay the pattern-matching on the second argument until after producing that first element. The case where the character is a space will still be dependent on the list, but that's fine because the only way that case can infinitely recurse is if you pass in an infinite list of spaces, in which case not producing any output and going into a loop is the expected behaviour for words (what else could it do?)
Try expanding the expression by hand:
take 5 (myWords_FailsOnInfiniteList (cycle "why "))
take 5 (foldr step [] (dropWhile charIsSpace (cycle "why ")))
take 5 (foldr step [] (dropWhile charIsSpace ("why " ++ cycle "why ")))
take 5 (foldr step [] ("why " ++ cycle "why "))
take 5 (step 'w' (foldr step [] ("hy " ++ cycle "why ")))
take 5 (step 'w' (step 'h' (foldr step [] ("y " ++ cycle "why "))))
What's the next expansion? You should see that in order to pattern match for step
, you need to know whether it's the empty list or not. In order to find that out, you have to evaluate it, at least a little bit. But that second term happens to be a foldr
reduction by the very function you're pattern matching for. In other words, the step function cannot look at its arguments without calling itself, and so you have an infinite recursion.
Contrast that with an expansion of your second function:
myWords_anotherReader (cycle "why ")
foldr step [""] (cycle "why ")
foldr step [""] ("why " ++ cycle "why ")
step 'w' (foldr step [""] ("hy " ++ cycle "why ")
let result = foldr step [""] ("hy " ++ cycle "why ") in
['w':(head result)] ++ tail result
let result = step 'h' (foldr step [""] ("y " ++ cycle "why ") in
['w':(head result)] ++ tail result
You can probably see that this expansion will continue until a space is reached. Once a space is reached, "head result" will obtain a value, and you will have produced the first element of the answer.
I suspect that this second function will overflow for infinite strings that don't contain any spaces. Can you see why?
The second version does not actually evaluate result
until after it has started producing part of its own answer. The first version evaluates result
immediately by pattern matching on it.
The key with these infinite lists is that you have to produce something before you start demanding list elements so that the output can always "stay ahead" of the input.
(I feel like this explanation is not very clear, but it's the best I can do.)
The library function foldr
has this implementation (or similar):
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f k (x:xs) = f x (foldr f k xs)
foldr _ k _ = k
The result of myWords_FailsOnInfiniteList
depends on the result of foldr
which depends on the result of step
which depends on the result of the inner foldr
which depends on ... and so on an infinite list, myWords_FailsOnInfiniteList
will use up an infinite amount of space and time before producing its first word.
The step
function in myWords_anotherReader
does not require the result of the inner foldr
until after it has produced the first letter of the first word. Unfortunately, as Apocalisp says, it uses O(length of first word) space before it produces the next word, because as the first word is being produced, the tail thunk keeps growing tail ([...] ++ tail ([...] ++ tail (...)))
.
In contrast, compare to
myWords :: String -> [String]
myWords = myWords' . dropWhile isSpace where
myWords' [] = []
myWords' string =
let (part1, part2) = break isSpace string
in part1 : myWords part2
using library functions which may be defined as
break :: (a -> Bool) -> [a] -> ([a], [a])
break p = span $ not . p
span :: (a -> Bool) -> [a] -> ([a], [a])
span p xs = (takeWhile p xs, dropWhile p xs)
takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p (x:xs) | p x = x : takeWhile p xs
takeWhile _ _ = []
dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p (x:xs) | p x = dropWhile p xs
dropWhile _ xs = xs
Notice that producing the intermediate results is never held up by future computation, and only O(1) space is needed as each element of the result is made available for consumption.
So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.
myWords :: String -> [String] myWords string = foldr step [""] (dropWhile charIsSpace string) where step space acc | charIsSpace space = "":acc step char (x:xs) = (char:x):xs step _ [] = error "this should be impossible"
(Aside: You may not care, but the words "" == []
from the library, but your myWords "" = [""]
. Similar issue with trailing spaces.)
Looks much-improved over myWords_anotherReader
, and is pretty good for a foldr
-based solution.
\n -> tail $ myWords $ replicate n 'a' ++ " b"
It's not possible to do better than O(n) time, but both myWords_anotherReader
and myWords
take O(n) space here. This may be inevitable given the use of foldr
.
Worse,
\n -> head $ head $ myWords $ replicate n 'a' ++ " b"
myWords_anotherReader
was O(1) but the new myWords
is O(n), because pattern matching (x:xs)
requires the further result.
You can work around this with
myWords :: String -> [String]
myWords = foldr step [""] . dropWhile isSpace
where
step space acc | isSpace space = "":acc
step char ~(x:xs) = (char:x):xs
The ~
introduces an "irrefutable pattern". Irrefutable patterns never fail and do not force immediate evaluation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With