I am trying to write a program in Haskell to split a string by delimiter.
I have studied different examples provided by other users. One example is the code posted below.
split :: String -> [String]
split [] = [""]
split (c:cs)
  | c == ',' = "" : rest
  | otherwise = (c : head rest) : tail rest
  where
    rest = split cs
Sample input: "1,2,3"
Sample output: ["1","2","3"]
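To reproduce the sample, here is a minimal self-contained program around the function above. Only the layout is changed, so that the guards and the where clause parse. The main is an addition for illustration, not part of the original code.

```haskell
-- The split function from the question, with a main that reproduces
-- the sample input and output.
split :: String -> [String]
split [] = [""]
split (c:cs)
  | c == ',' = "" : rest                      -- a comma starts a new, empty field
  | otherwise = (c : head rest) : tail rest   -- any other char joins the first field
  where
    rest = split cs

main :: IO ()
main = print (split "1,2,3")  -- prints ["1","2","3"]
```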
I have been trying to modify the code so that the output also includes the delimiter, something like ["1", ",", "2", ",", "3"], but I just cannot get it to work.
For example, I changed the line:
| c == ',' = "" : rest
into:
| c == ',' = "," : rest
but then the result becomes ["1,","2,","3"].
What is the problem, and where is my misunderstanding?
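To make the symptom concrete, the following sketch prints the modified function's result on every suffix of "1,2,3", shortest first, which is the order in which the recursion actually builds the answer. The name splitModified is hypothetical; it is just the question's modified version renamed so it is not confused with the original.

```haskell
import Data.List (tails)

-- The question's modified version: "," is placed at the front of the
-- result list, so the recursion treats it as the first field.
splitModified :: String -> [String]
splitModified [] = [""]
splitModified (c:cs)
  | c == ',' = "," : rest
  | otherwise = (c : head rest) : tail rest
  where
    rest = splitModified cs

-- Show the result for each suffix, from "" up to the full input.
main :: IO ()
main = mapM_ report (reverse (tails "1,2,3"))
  where
    report s = putStrLn (show s ++ " -> " ++ show (splitModified s))
```

The step from ",3" (giving [",","3"]) to "2,3" (giving ["2,","3"]) is where it goes wrong: '2' is prepended to the head of the list, and that head is now "," rather than an empty field.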
If you're trying to write this function "for real", rather than writing the character-by-character recursion for practice, I think a clearer method is to use the break function (available in the Prelude and also exported by Data.List). The following expression:
break (==',') str
splits the string into a tuple (a, b), where a is the initial "comma-free" part and b is either the rest of the string, starting with the first comma, or the empty string if there is no comma.
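A quick way to see what break returns in each situation is to print a few cases; this small program is only an illustration of the behaviour described above.

```haskell
-- break p xs splits xs at the first element satisfying p:
-- the prefix before it, and the remainder starting with that element.
main :: IO ()
main = do
  print (break (== ',') "1,2,3")  -- ("1",",2,3")  comma found
  print (break (== ',') "3")      -- ("3","")      no comma left
  print (break (== ',') ",3")     -- ("",",3")     comma at the very start
```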
This makes the definition of split clear and straightforward:
split str = case break (==',') str of
  (a, ',':b) -> a : split b
  (a, "")    -> [a]
You can verify that this handles split "" (which returns [""]), so there's no need to treat that as a special case.
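Here is a self-contained sketch of that break-based definition with a few checks, including the empty-string case and an empty field between two commas.

```haskell
-- split via break (Prelude; also exported by Data.List).
split :: String -> [String]
split str = case break (== ',') str of
  (a, ',':b) -> a : split b  -- found a comma: emit the field, continue after it
  (a, "")    -> [a]          -- no comma left: the rest is the last field

main :: IO ()
main = do
  print (split "1,2,3")  -- ["1","2","3"]
  print (split "")       -- [""]
  print (split "1,,2")   -- ["1","","2"]
```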
This version has the added benefit that the modification to include the delimiter is also easy to understand:
split2 str = case break (==',') str of
  (a, ',':b) -> a : "," : split2 b
  (a, "")    -> [a]
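As a runnable check of split2, this sketch prints the result for the question's input and for an input ending in a comma, where the final field is empty.

```haskell
-- split2: like split, but each comma is kept as its own element.
split2 :: String -> [String]
split2 str = case break (== ',') str of
  (a, ',':b) -> a : "," : split2 b  -- emit the field, then the comma itself
  (a, "")    -> [a]

main :: IO ()
main = do
  print (split2 "1,2,3")  -- ["1",",","2",",","3"]
  print (split2 "1,")     -- ["1",",",""]
```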
Note that I've written the patterns in these functions in more detail than necessary, to make it absolutely clear what's going on; this also means that Haskell checks each comma twice (once inside break and again in the ',':b pattern). For this reason, some people might prefer:
split str = case break (==',') str of
  (a, _:b) -> a : split b
  (a, _)   -> [a]
or, if they still wanted to document exactly what they were expecting in each case branch:
split str = case break (==',') str of
  (a, _comma:b) -> a : split b
  (a, _empty)   -> [a]
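The same wildcard style also makes it easy to pass the delimiter in as a parameter. The name splitKeep and its signature below are hypothetical, chosen for illustration only; it keeps each delimiter as its own element, like split2.

```haskell
-- Hypothetical helper (name and signature are illustrative): split on
-- any delimiter character, keeping each delimiter as a separate element.
splitKeep :: Char -> String -> [String]
splitKeep d str = case break (== d) str of
  (a, delim:b) -> a : [delim] : splitKeep d b  -- delim is the delimiter just found
  (a, _empty)  -> [a]                          -- no delimiter left

main :: IO ()
main = do
  print (splitKeep ',' "1,2,3")  -- ["1",",","2",",","3"]
  print (splitKeep ';' "a;b")    -- ["a",";","b"]
```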
Instead of altering code in the hope that it matches your expectations, it is usually better to understand the code fragment first.
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : rest
             | otherwise = (c : head rest) : tail rest
  where rest = split cs
First of all, let's analyze what split does. The first equation simply says "the split of an empty string is a list with one element, the empty string". This seems reasonable. The first guard of the second equation says "if the head of the string is a comma, produce a list whose first element is the empty string, followed by the split of the rest of the string". The last guard says "if the first character of the string is not a comma, prepend that character to the first item of the split of the remaining string, followed by the remaining items of that split". Note that split returns a list of strings, so head rest is a string.
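To see those clauses in action, this sketch evaluates the original split on successively longer suffixes of "1,2"; the comments note which clause fires at each step.

```haskell
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : rest
             | otherwise = (c : head rest) : tail rest
  where rest = split cs

main :: IO ()
main = do
  print (split "")     -- first equation: [""]
  print (split "2")    -- otherwise guard: '2' prepended to the head of [""]
  print (split ",2")   -- comma guard: "" placed in front of ["2"]
  print (split "1,2")  -- otherwise guard: '1' prepended to the head of ["","2"]
```

The empty string produced by the comma guard is what the characters before the comma get prepended to.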
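So if we want to add the delimiter to the output, we need to add it as a separate string in the output of split. Where? In the first guard. We should not return "," : rest, because the characters before the comma are, by recursion, prepended to the head of the result, so they would be glued onto the ",". Instead we keep the empty string as the head, to collect those characters, and insert "," as a separate string right after it. So the result is: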
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : "," : rest
             | otherwise = (c : head rest) : tail rest
  where rest = split cs
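A self-contained version of the corrected function, with a main added for illustration, confirms the expected output, including a leading comma:

```haskell
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : "," : rest              -- empty field for the chars before the comma, then the comma
             | otherwise = (c : head rest) : tail rest -- other chars still join the first field
  where rest = split cs

main :: IO ()
main = do
  print (split "1,2,3")  -- ["1",",","2",",","3"]
  print (split ",1")     -- ["",",","1"]
```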