Sorry, I don't quite get FP yet. I want to split a sequence of lines into a sequence of sequences of lines, treating an empty line as the paragraph separator. I could do it in Python like this:
def get_paragraphs(lines):
    paragraphs = []
    paragraph = []
    for line in lines:
        if line == "":  # I know it could also be "if not line:"
            paragraphs.append(paragraph)
            paragraph = []
        else:
            paragraph.append(line)
    paragraphs.append(paragraph)  # keep the last paragraph, too
    return paragraphs
How would you go about doing it in Erlang or Haskell?
I'm only a beginner Haskell programmer (and the little Haskell I learnt was five years ago), but as a start I'd write the natural translation of your function, passing the accumulator (the "current paragraph") around. I've added type signatures, just for clarity:
type Line = String
type Para = [Line]
-- Takes a list of lines, and returns a list of paragraphs
paragraphs :: [Line] -> [Para]
paragraphs ls = paragraphs2 ls []
-- Helper function: takes a list of lines, and the "current paragraph"
paragraphs2 :: [Line] -> Para -> [Para]
paragraphs2 [] para = [para]
paragraphs2 ("":ls) para = para : (paragraphs2 ls [])
paragraphs2 (l:ls) para = paragraphs2 ls (para++[l])
This works:
*Main> paragraphs ["Line 1", "Line 2", "", "Line 3", "Line 4"]
[["Line 1","Line 2"],["Line 3","Line 4"]]
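One small variation, my own suggestion rather than part of the answer: para ++ [l] copies the whole accumulated paragraph for every line, so a long paragraph costs quadratic time. A common fix is to keep the accumulator in reverse and reverse it once when the paragraph ends. The names paragraphsR and go are made up for this sketch; it is meant to behave exactly like paragraphs/paragraphs2 above.

```haskell
type Line = String
type Para = [Line]

-- Same recursion as paragraphs2, but the current paragraph is stored
-- reversed: adding a line is O(1) (l : acc) instead of O(n) (para ++ [l]),
-- and each finished paragraph is reversed exactly once.
paragraphsR :: [Line] -> [Para]
paragraphsR ls = go ls []
  where
    go :: [Line] -> [Line] -> [Para]
    go []        acc = [reverse acc]             -- end of input: emit last paragraph
    go ("" : xs) acc = reverse acc : go xs []    -- empty line: close the paragraph
    go (l : xs)  acc = go xs (l : acc)           -- ordinary line: prepend it
```

Like paragraphs2, it keeps an empty paragraph between two consecutive empty lines, so paragraphsR ["a", "", "", "b"] gives [["a"],[],["b"]].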
So that's a solution. But then, Haskell experience suggests that there are almost always library functions for doing things like this :) One related function is called groupBy, and it almost works:
import Data.List (groupBy)

paragraphs3 :: [Line] -> [Para]
paragraphs3 ls = groupBy (\x y -> y /= "") ls
*Main> paragraphs3 ["Line 1", "Line 2", "", "Line 3", "Line 4"]
[["Line 1","Line 2"],["","Line 3","Line 4"]]
Oops. What we really need is a "splitBy", which isn't in the standard libraries, but we can filter out the stray empty strings ourselves:
paragraphs4 :: [Line] -> [Para]
paragraphs4 ls = map (filter (/= "")) (groupBy (\x y -> y /= "") ls)
or, if you want to be cool, you can drop the argument and write it point-free (the "pointless" style):
paragraphs5 = map (filter (/= "")) . groupBy (\x y -> y /= "")
I'm sure there is an even shorter way. :-)
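One thing worth checking with the groupBy versions: with two or more consecutive empty lines, each extra empty line starts a group of its own, so paragraphs4 ["a", "", "", "b"] yields [["a"],[],["b"]]. That matches what the question's Python loop does, but if you would rather treat any run of empty lines as a single separator, an extra filter on the outer list should be enough. paragraphs6 is just a name for this sketch:

```haskell
import Data.List (groupBy)

type Line = String
type Para = [Line]

-- groupBy version from above, followed by an extra pass that discards the
-- empty paragraphs produced by runs of consecutive empty lines.
paragraphs6 :: [Line] -> [Para]
paragraphs6 = filter (not . null)          -- drop empty paragraphs
            . map (filter (not . null))    -- drop the "" that starts each group
            . groupBy (\_ y -> not (null y))
```

With this, paragraphs6 ["a", "", "", "b"] gives [["a"],["b"]], and leading or trailing empty lines disappear as well.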
Edit: ephemient points out that (not . null) is cleaner than (/= ""), so we can write:
paragraphs :: [Line] -> [Para]
paragraphs = map (filter $ not . null) . groupBy (const $ not . null)
The repeated (not . null) is a strong hint that this should be abstracted into a function of its own, and that is what the Data.List.Split module (from the split package) does, as pointed out in another answer.
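The abstraction in question is roughly "split the list wherever an element satisfies a predicate, dropping those elements". Data.List.Split provides splitWhen for this, so with the split package installed, splitWhen null should do the job. To keep this sketch free of extra dependencies, here is a small hand-rolled stand-in; splitWhen' and paragraphsSplit are my own names, and I'm assuming it matches splitWhen closely enough for this use (check the package documentation before relying on edge cases).

```haskell
type Line = String
type Para = [Line]

-- Hand-rolled stand-in for splitWhen: break the list at every element
-- satisfying p, and drop those separator elements.
splitWhen' :: (a -> Bool) -> [a] -> [[a]]
splitWhen' p xs = case break p xs of
  (chunk, [])       -> [chunk]                    -- no more separators
  (chunk, _ : rest) -> chunk : splitWhen' p rest  -- skip the separator

-- Paragraphs are the chunks between empty lines.
paragraphsSplit :: [Line] -> [Para]
paragraphsSplit = splitWhen' null
```

Like paragraphs2, this keeps an empty paragraph between consecutive empty lines and returns [[]] for an empty input.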
I'm also trying to learn Haskell. A solution for this question could be:
paragraphs :: [String] -> [[String]]
paragraphs [] = []
paragraphs lines
  | null p    = []   -- only empty lines were left
  | otherwise = p : paragraphs rest
  where (p, rest) = span (/= "") (dropWhile (== "") lines)
The functions I'm using, span and dropWhile, are documented in Data.List, but they are also exported by the Prelude, so no import is needed.
The idea is to find the first paragraph using span (/= ""). This returns the paragraph together with the lines that follow it. We then recurse on that smaller list of lines, which I call rest.
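To see what the two building blocks do on their own, here is a tiny helper that performs a single step of the recursion (firstParagraph is a hypothetical name, not part of the answer). Note that the separating empty line stays at the front of rest, which is why the next step has to drop it:

```haskell
-- One step of the recursion: skip leading empty lines, then split
-- at the next empty line into (paragraph, remaining lines).
firstParagraph :: [String] -> ([String], [String])
firstParagraph ls = span (/= "") (dropWhile (== "") ls)
```

For example:
*Main> firstParagraph ["", "foo", "bar", "", "hehe"]
(["foo","bar"],["","hehe"])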
Before splitting off the first paragraph, we drop any leading empty lines using dropWhile (== ""). This is what consumes the empty line(s) separating the paragraphs. My first attempt was this:
paragraphs :: [String] -> [[String]]
paragraphs [] = []
paragraphs lines = p : (paragraphs $ tail rest)
  where (p, rest) = span (/= "") lines
but this fails when we reach the final paragraph, since rest is then the empty list and tail [] throws an exception:
*Main> paragraphs ["foo", "bar", "", "hehe", "", "bla", "bla"]
[["foo","bar"],["hehe"],["bla","bla"]*** Exception: Prelude.tail: empty list
Dropping empty lines solves this, and it also makes the code treat any number of empty lines as a paragraph separator, which is what I would expect as a user.
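For comparison, the crash on its own could also be avoided by replacing tail with drop 1, which returns [] for an empty list instead of failing. This is a sketch of that alternative (paragraphsDrop is my own name for it); it fixes the exception, but unlike the dropWhile version it still turns every extra empty line into an empty paragraph:

```haskell
-- The first attempt with tail replaced by drop 1: drop 1 [] == [],
-- so reaching the final paragraph no longer crashes.
paragraphsDrop :: [String] -> [[String]]
paragraphsDrop [] = []
paragraphsDrop ls = p : paragraphsDrop (drop 1 rest)  -- drop 1 skips one separator
  where (p, rest) = span (/= "") ls
```

So paragraphsDrop ["a", "", "", "b"] gives [["a"],[],["b"]], whereas the dropWhile version gives [["a"],["b"]].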