Haskell - Splitting a string by delimiter

Tags:

haskell

I am trying to write a program in Haskell to split a string by delimiter.

I have studied various examples posted by other users, such as the code below.

split :: String -> [String]
split [] = [""]
split (c:cs)
   | c == ','  = "" : rest
   | otherwise = (c : head rest) : tail rest
 where
   rest = split cs

Sample Input: "1,2,3". Sample Output: ["1","2","3"].

I have been trying to modify the code so that the output also includes the delimiter, e.g. ["1", ",", "2", ",", "3"], but I have not succeeded.

For example, I changed the line:

   | c == ','  = "" : rest

into:

   | c == ','  = "," : rest

But the result becomes ["1,","2,","3"].
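A minimal runnable sketch that reproduces this, with the modified function renamed to the hypothetical splitBad so both versions can live in one file:

```haskell
-- The original splitter from the question.
split :: String -> [String]
split [] = [""]
split (c:cs)
  | c == ','  = "" : rest
  | otherwise = (c : head rest) : tail rest
  where
    rest = split cs

-- The attempted modification (hypothetical name splitBad).
-- Hand trace: splitBad "2,3" == ["2,","3"], so splitBad ",2,3" == [",","2,","3"];
-- the otherwise guard then prepends '1' to the head ",", giving "1,".
splitBad :: String -> [String]
splitBad [] = [""]
splitBad (c:cs)
  | c == ','  = "," : rest
  | otherwise = (c : head rest) : tail rest
  where
    rest = splitBad cs

main :: IO ()
main = do
  print (split "1,2,3")     -- ["1","2","3"]
  print (splitBad "1,2,3")  -- ["1,","2,","3"]
```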

What is the problem, and where is my misunderstanding?

Peter.PP asked Oct 05 '17 08:10


2 Answers

If you're trying to write this function "for real" instead of writing the character-by-character recursion for practice, I think a clearer method is to use the break function (exported by the Prelude, and also by Data.List). The following expression:

break (==',') str

splits the string into a pair (a,b), where a is the initial "comma-free" part of the string and b is the rest: either starting with the comma, or empty if there is no comma left.
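A quick sketch of what break returns on a few inputs (only the Prelude is needed):

```haskell
-- break p xs splits xs at the first element satisfying p:
-- fst is the longest prefix with no match, snd is the remainder
-- (starting with the matching element, or empty if none matched).
main :: IO ()
main = do
  print (break (== ',') "1,2,3")  -- ("1",",2,3")
  print (break (== ',') "3")      -- ("3","")
  print (break (== ',') "")       -- ("","")
```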

This makes the definition of split clear and straightforward:

split str = case break (==',') str of
                (a, ',':b) -> a : split b
                (a, "")    -> [a]

You can verify that this handles split "" (which returns [""]), so there's no need to treat that as a special case.
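A small check of a few edge cases, also comparing against the question's character-by-character version (renamed splitRec here for illustration):

```haskell
-- The question's recursive version, renamed so both can coexist.
splitRec :: String -> [String]
splitRec [] = [""]
splitRec (c:cs)
  | c == ','  = "" : rest
  | otherwise = (c : head rest) : tail rest
  where
    rest = splitRec cs

-- The break-based version from this answer.
split :: String -> [String]
split str = case break (== ',') str of
  (a, ',':b) -> a : split b
  (a, "")    -> [a]

main :: IO ()
main = do
  print (split "")      -- [""]
  print (split "1,,2")  -- ["1","","2"]
  print (split ",1,")   -- ["","1",""]
  -- Both versions agree on these inputs.
  print (all (\s -> split s == splitRec s) ["", "1,2,3", "1,,2", ",1,"])  -- True
```

Empty fields between, before, and after commas are preserved, just as in the recursive version.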

This version has the added benefit that the modification to include the delimiter is also easy to understand:

split2 str = case break (==',') str of
                (a, ',':b) -> a : "," : split2 b
                (a, "")    -> [a]
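Running split2 on the question's sample input, plus a trailing-comma case:

```haskell
-- Delimiter-keeping variant: emit "," as its own element after each field.
split2 :: String -> [String]
split2 str = case break (== ',') str of
  (a, ',':b) -> a : "," : split2 b
  (a, "")    -> [a]

main :: IO ()
main = do
  print (split2 "1,2,3")  -- ["1",",","2",",","3"]
  print (split2 "1,")     -- ["1",",",""]
```

Note that a trailing comma still yields a final empty field, the same way split does.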

Note that I've written the patterns in these functions in more detail than necessary, to make it absolutely clear what's going on; this also means that Haskell checks each comma twice (once inside break and again in the ',':b pattern). For this reason, some people might prefer:

split str = case break (==',') str of
                (a, _:b) -> a : split b
                (a, _)   -> [a]

or, if they still wanted to document exactly what they were expecting in each case branch:

split str = case break (==',') str of
                (a, _comma:b) -> a : split b
                (a, _empty)   -> [a]
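The same shape generalizes to any delimiter character by passing it as a parameter. This is a sketch with a hypothetical name, splitOnChar; it is not a standard library function:

```haskell
-- Hypothetical helper: split on an arbitrary delimiter character.
splitOnChar :: Char -> String -> [String]
splitOnChar d str = case break (== d) str of
  (a, _delim:b) -> a : splitOnChar d b  -- break already found d, so just skip it
  (a, _empty)   -> [a]                  -- no delimiter left

main :: IO ()
main = do
  print (splitOnChar ',' "1,2,3")   -- ["1","2","3"]
  print (splitOnChar ';' "a;b;;c")  -- ["a","b","","c"]
```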

K. A. Buhr answered Oct 02 '22 06:10


Instead of altering code in the hope that it matches your expectations, it is usually better to understand the code fragment first.

split :: String -> [String]
split [] = [""]
split (c:cs) | c == ','  = "" : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs

First, let's analyze what split does. The first clause says: "The split of an empty string is a list with one element, the empty string." This seems reasonable. The first guard of the second clause says: "If the head of the string is a comma, produce a list whose first element is an empty string, followed by the split of the rest of the string." The last guard says: "If the first character of the string is not a comma, prepend that character to the first item of the split of the remaining string, followed by the remaining items of that split." Note that split returns a list of strings, so head rest is a string.
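The recursion described above is a right fold: each character is combined with the already-split remainder of the string. As a sketch (an equivalent rewrite with the hypothetical name splitFold, not code from the question), the same function can be written with foldr, which makes the "prepend to the first item" step explicit:

```haskell
-- Same behaviour as the recursive split: [""] for the empty string,
-- and each character is combined with the split of the rest.
splitFold :: String -> [String]
splitFold = foldr step [""]
  where
    -- A comma starts a new, empty item in front of the rest.
    step ',' acc = "" : acc
    -- Any other character is prepended to the first item.
    step c (f:fs) = (c : f) : fs
    -- Unreachable: acc starts as [""] and every step keeps it non-empty.
    step c []     = [[c]]

main :: IO ()
main = print (splitFold "1,2,3")  -- ["1","2","3"]
```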

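So if we want to add the delimiter to the output, we need to add it as a separate string in the output of split. Where? In the first guard. We should not return "," : rest, because one level up in the recursion the otherwise guard prepends the preceding character to the head of that list, so "," would become "1,". Instead, we keep the empty string as the head (so it receives the prepended characters) and put "," after it as a separate string. So the result is: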

split :: String -> [String]
split [] = [""]
split (c:cs) | c == ','  = "" : "," : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs
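A small runnable check of the fixed version on the sample input and on a trailing comma:

```haskell
-- The fixed splitter: the empty string collects the characters before the
-- comma, and "," follows it as a separate element.
split :: String -> [String]
split [] = [""]
split (c:cs)
  | c == ','  = "" : "," : rest
  | otherwise = (c : head rest) : tail rest
  where
    rest = split cs

main :: IO ()
main = do
  print (split "1,2,3")  -- ["1",",","2",",","3"]
  print (split "1,2,")   -- ["1",",","2",",",""]
```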
Willem Van Onsem answered Oct 02 '22 06:10