
Haskell: breaking up words at the first space

Tags: string, haskell

Note that this is not the same as using the words function.

I would like to convert this:

"The quick brown fox jumped over the lazy dogs."

into this:

["The"," quick"," brown"," fox"," jumped"," over"," the"," lazy"," dogs."]

Note that each break comes at the first space after a word, and that space is kept at the start of the following piece.

The best I could come up with is this:

import Data.Char (isSpace)

parts :: String -> [String]
parts "" = []
parts s  = if null a then (c ++ e):parts f else a:parts b
    where
    (a, b) = break isSpace s
    (c, d) = span isSpace s
    (e, f) = break isSpace d

It just looks a little inelegant. Can anyone think of a better way to express this?
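One possible simplification, offered as a sketch: the two branches of parts can be merged. When s starts with a non-space, span isSpace s returns ("", s), so c ++ e is just a and f is just b. Taking the leading run of spaces first and then the following word therefore covers both cases with a single equation. The name splitKeepSpaces is hypothetical.

```haskell
import Data.Char (isSpace)

-- Split a string so that every piece after the first starts with the run
-- of whitespace that preceded it, e.g. "a  b c" -> ["a","  b"," c"].
-- Behaves the same as parts above, including on leading/trailing spaces.
splitKeepSpaces :: String -> [String]
splitKeepSpaces "" = []
splitKeepSpaces s  = (spaces ++ word) : splitKeepSpaces rest
  where
    (spaces, afterSpaces) = span isSpace s          -- leading whitespace (may be "")
    (word, rest)          = break isSpace afterSpaces -- the word that follows it
```

In GHCi, splitKeepSpaces "a  b c d" gives ["a","  b"," c"," d"].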

asked Aug 16 '11 at 03:08 by Snoqual


3 Answers

Edit: sorry, I misread the question at first. Hopefully this new answer does what you want.

> import qualified Data.List as List
> List.groupBy (\x y -> y /= ' ') "The quick brown fox jumped over the lazy dogs."
["The"," quick"," brown"," fox"," jumped"," over"," the"," lazy"," dogs."]

The library function groupBy takes a predicate that decides whether the next element, y, is added to the current group (whose first element is x) or starts a new group.

In this case, we don't care what the current group started with; we only want to start a new group (i.e. make the predicate return False) when the next element, y, is a space.
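A self-contained version of the one-liner, as a sketch (splitBeforeSpaces is a hypothetical name). The second test input also shows a limitation with consecutive spaces: Data.List.groupBy compares each character with the first character of the current group, so two spaces in a row end up in separate groups.

```haskell
import Data.List (groupBy)

-- Start a new group exactly when the next character is a space.
-- The first argument (the group's first element) is ignored.
splitBeforeSpaces :: String -> [String]
splitBeforeSpaces = groupBy (\_ y -> y /= ' ')
```

For example, splitBeforeSpaces "a  b" gives ["a"," "," b"] rather than ["a","  b"].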

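Edit:

n.m. points out that multiple consecutive spaces are not handled correctly: because Data.List.groupBy compares y with the first element of the group, "a  b" comes out as ["a"," "," b"]. In that case you can switch to groupBy from Data.List.HT (in the utility-ht package), which has the semantics you want.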

> import qualified Data.List.HT as HT
> HT.groupBy (\x y -> y /= ' ' || x == ' ') "a  b c d"
["a","  b"," c"," d"]

The difference in semantics that makes this work is that, in HT.groupBy, x is the last element of the current group (the one y would be appended to), rather than its first element.
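If you would rather not depend on utility-ht, the adjacent-element grouping described above can be sketched with a right fold. groupByAdjacent is a hypothetical helper written to mimic that behaviour; it is not the library's implementation.

```haskell
-- Hypothetical helper: group elements by comparing each one with its
-- immediate neighbour, rather than with the first element of the group.
groupByAdjacent :: (a -> a -> Bool) -> [a] -> [[a]]
groupByAdjacent p = foldr step []
  where
    -- x sits directly before y, the first element of the group to its right.
    step x ((y:ys):gs) | p x y = (x : y : ys) : gs
    step x gs                  = [x] : gs

-- Keep going while the next char is not a space, or the previous one was.
splitKeepRuns :: String -> [String]
splitKeepRuns = groupByAdjacent (\x y -> y /= ' ' || x == ' ')
```

With this, splitKeepRuns "a  b c d" gives ["a","  b"," c"," d"], matching the HT.groupBy result.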

answered Nov 11 '22 at 23:11 by gatoatigrado


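If you're doing lots of slightly different kinds of splits, have a look at the split package. It lets you define this split as split (keepDelimsL (onSublist " ")), which attaches each space to the start of the following chunk; wrap it in condense if runs of spaces should stay together.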
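A sketch using Data.List.Split, assuming the split package is installed. keepDelimsL keeps each delimiter and attaches it to the start of the chunk that follows it, which matches the output in the question; condense additionally merges consecutive delimiters into one. The function names are hypothetical.

```haskell
import Data.List.Split (condense, keepDelimsL, onSublist, split)

-- Split before every single space, keeping the space on the next chunk.
splitOnSpaces :: String -> [String]
splitOnSpaces = split (keepDelimsL (onSublist " "))

-- Same, but a run of spaces is treated as one delimiter.
splitOnSpaceRuns :: String -> [String]
splitOnSpaceRuns = split (condense (keepDelimsL (onSublist " ")))
```

Without condense, splitOnSpaces "a  b" gives ["a"," "," b"]; with it, splitOnSpaceRuns "a  b" gives ["a","  b"].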

answered Nov 11 '22 at 22:11 by John L


words2 :: String -> [String]
words2 xs = head w : (map (' ':) $ tail w)
  where w = words xs

And here is a version using arrows and applicative (not recommended for practical use):

import Control.Arrow ((>>>))

words3 :: String -> [String]
words3 = words >>> (:) <$> head <*> (map (' ':) . tail)
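To see how the point-free words3 works: in the function applicative, (g <$> f1 <*> f2) w is g (f1 w) (f2 w), so the right-hand side applies head and map (' ':) . tail to the same word list and conses the results. The sketch below puts the original next to a hypothetical spelled-out version, words3', for comparison.

```haskell
import Control.Arrow ((>>>))

-- Point-free version: split into words, then rebuild with leading spaces.
words3 :: String -> [String]
words3 = words >>> (:) <$> head <*> (map (' ':) . tail)

-- The same function with the applicative plumbing written out by hand.
-- Like words3, it fails on empty input because of head.
words3' :: String -> [String]
words3' s = let w = words s in head w : map (' ':) (tail w)
```

Both give ["The"," quick"," brown"] for "The quick  brown", which is why the extra space is lost.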

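Edit: my first solution is wrong, because it collapses runs of spaces into one (it also fails on empty input, since head is applied to an empty list). Here's a correct one:

-- Fold from the right: a character joins the chunk to its right if it is a
-- space, or if that chunk is empty or does not start with a space;
-- otherwise it starts a new chunk. (Empty input gives [""].)
words4 :: String -> [String]
words4 = foldr (\x acc -> if x == ' ' || head acc == "" || (head $ head acc) /= ' '
                             then (x : head acc) : tail acc
                             else [x] : acc) [""]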
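The same right fold can be sketched with pattern matching instead of head and tail. words5 is a hypothetical name; it differs from words4 only in starting from [] instead of [""], so empty input gives [] rather than [""].

```haskell
-- x is prepended to the chunk on its right when x is a space or that chunk
-- does not begin with a space; otherwise x starts a new chunk.
words5 :: String -> [String]
words5 = foldr step []
  where
    step x (c@(h:_) : cs)
      | x == ' ' || h /= ' ' = (x : c) : cs
    step x cs = [x] : cs
```

For example, words5 "a  b c d" gives ["a","  b"," c"," d"].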
answered Nov 11 '22 at 21:11 by Vagif Verdi