Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is implementing the words function possible without a postprocessing step after folding?

Real World Haskell, chapter 4, page 98 of the printed version asks if words can be implemented using folds, and this is my question too:

Is it possible? If not, why? If it is, how?

I came up with the following, which is based on the idea that each non-space should be prepended to the last word in the output list (this happens in the otherwise guard), and that a space should trigger the appending of an emtpy word to the output list if there is not one already (this is handled in the if-then-else).

myWords :: String -> [String]
myWords = foldr step [[]]
  where
    step x yss@(y:ys)
      | x == ' ' = if y == "" then yss else "":yss
      | otherwise = (x:y):ys

Clearly this solution is wrong, since leading spaces in the input string result in one leading empty string in the output list of strings.

At the link above, I've looked into several of the proposed solutions for other readers, and many of them work similarly to my solution, but they generally "post-process" the output of the fold, for instance by tailing it if there is an empty leading string.

Other approaches use tuples (actually just pairs), so that the fold deals with the pair and can well handle the leading/trailing spaces.

In all these approaches, foldr (or another fold, fwiw) is not the function that provides the final output out of the box; there's always something else with has to adjust the output somehow.

Therefore I go back to the initial question and ask if it is actually possible to implement words (in a way that it correctly handles trailing/leading/repeated spaces) using folds. By using folds I mean that the folding function has to be the outermost function:

myWords :: String -> [String]
myWords input = foldr step seed input
like image 541
Enlico Avatar asked Apr 29 '20 07:04

Enlico


3 Answers

If I understand correctly, your requirements include

(1) words "a b c" == words " a b c" == ["a", "b", "c"]
(2) words "xa b c" == ["xa", "b", "c"] /= ["x", "a", "b", "c"] == words "x a b c"

This implies that we can not have

words = foldr step base

for any step and base.

Indeed, if we had that, then

words "xa b c"
= def words and foldr
step 'x' (words "a b c")
= (1)
step 'x' (words " a b c")
= def words and foldr
words "x a b c"

and this contradicts (2).

You definitely need some post-processing after the foldr.

like image 144
chi Avatar answered Oct 13 '22 07:10

chi


@chi has a wonderful argument that you cannot implement words using "a" fold, but you did say using folds.

words = filterNull . words1
    where
    filterNull = foldr (\xs -> if null xs then id else (xs:)) []
    words1 = foldr (\c -> if c == ' ' then ([]:) else consHead c) []
    consHead c []       = [[c]]
    consHead c (xs:xss) = (c:xs):xss

Both the outermost and innermost function are folds. ;-)

like image 45
luqui Avatar answered Oct 13 '22 07:10

luqui


Yes. Eventhough it's a little tricky you may still do this job properly by using a single foldr and nothing else if you dwell into CPS (Continuation Passing Style). I had shown a special kind of chunksOf function previously.

In this kinds of folds our accumulator, hence the result of the fold is a function and we have to apply it to an identity kind of input so that we have the final result. So this may count as a final processing stage or not since we are using a single fold here and the type of it includes the function. Open to debate :)

ws :: String -> [String]
ws str = foldr go sf str $ ""
         where
         sf :: String -> [String]
         sf s = if s == " " then [""] else [s]
         go :: Char -> (String -> [String]) -> (String -> [String])
         go c f = \pc -> let (s:ss) = f [c]
                         in case pc of
                            ""        -> dropWhile (== "") (s:ss)
                            otherwise -> case (pc == " ", s == "") of
                                         (True, False)  -> "":s:ss
                                         (True, True)   -> s:ss
                                         otherwise      -> (pc++s):ss

λ> ws "   a  b    c   "
["a","b","c"]

sf : The initial function value to start with.

go : The iterator function

We are actually not fully utilizing the power of the CPS here since we have both the previous character pc and the currect character c at hand in every turn. It was very useful in the chunksOf function mentioned above while chunking a [Int] into [[Int]] every time an ascending sequence of elements were broken.

like image 3
Redu Avatar answered Oct 13 '22 09:10

Redu