My goal was to write a function to parse string of nested parentheses into a corresponding list:
parseParens "()" --> []
parseParens "(())" --> [[]]
parseParens "((()()))" --> [[[],[]]]
First off I discovered that I can't specify easily define a type of the return value. I could do something like:
parseParens :: String -> [[[[t]]]]
But how do I say that it's infinitely nested? I guess Haskell doesn't allow that.
I came up with my own data type:
data InfiniteList = EmptyList | Cons InfiniteList InfiniteList deriving (Show)
And a parser function that uses this:
parseParens :: String -> InfiniteList
parseParens ('(':xs) =
if remainder == ""
then result
else error "Unbalanced parenthesis"
where (result, remainder) = parseToClose EmptyList xs
parseParens _ = error "Unbalanced parenthesis"
parseToClose :: InfiniteList -> String -> (InfiniteList, String)
parseToClose acc "" = error "Unbalanced parenthesis!"
parseToClose acc (')':xs) = (acc, xs)
parseToClose acc ('(':xs) = parseToClose (concatInfLists acc (Cons result EmptyList)) remainder
where (result, remainder) = parseToClose EmptyList xs
concatInfLists :: InfiniteList -> InfiniteList -> InfiniteList
concatInfLists EmptyList ys = ys
concatInfLists (Cons x xs) ys = Cons x (concatInfLists xs ys)
Working like so:
parseParens "()" --> EmptyList
parseParens "(())" --> Cons EmptyList EmptyList
parseParens "((()()))" --> Cons (Cons EmptyList (Cons EmptyList EmptyList)) EmptyList
There surely must be a better way to do this. Perhaps there's even a way to use the built-in List data type for this?
Edit: Fixed my mischaracterization of Benjamin's answer.
While the answer in @Benjamin Hodgson's comment:
data Nested a = Flat a | Nested (Nested [a]) deriving (Show)
gives a good way to represent a homogeneous list of arbitrary nesting depth (i.e., sort of like a sum type of [a]
plus [[a]]
plus [[[a]]]
plus all the rest), it seems like an unusual representation for your problem, particularly in a case like:
parseParens "(()(()))"
where the nesting depth of the "child nodes" differs. This would be represented as:
Nested (Nested (Nested (Flat [[],[[]]]))) :: Nested a
so it literally allows you to represent the result of the parse as the desired list, given enough Nested
constructors, but it has some odd properties. For example, the innermost empty lists actually have different types: the first is of type [[a]]
while the second is of type [a]
.
As a alternative approach, I think the data type you actually want is probably just:
data Nested = N [Nested] deriving (Show)
where each node N
is a (possibly empty) list of nodes. Then, you'll get:
> parseParens "()"
N []
> parseParens "(())"
N [N []]
> parseParens "((()()))"
N [N [N [],N []]]
> parseParens "(()(()))"
N [N [],N [N []]]
If you just ignore the N
constructors in these results, the first three of these match your "corresponding list" test cases from the beginning of your question.
As a side note: the Nested
data type above is actually a "rose tree" containing no data, equivalent to Tree ()
using the Tree
data type from Data.Tree
in the containers
package.
Finally, I can't emphasize enough how helpful it is to learn and use a monadic parsing library, even for simple parsing jobs. Using the parsec
library, for example, you can write a parser for your grammar in one line:
nested = N <$> between (char '(') (char ')') (many nested)
My full code for parseParens
is:
import Data.Tree
import Text.Parsec
import Text.Parsec.String
data Nested = N [Nested] deriving (Show)
nested :: Parser Nested
nested = N <$> between (char '(') (char ')') (many nested)
parseParens :: String -> Nested
parseParens str =
let Right result = parse (nested <* eof) "" str
in result
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With