Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing String of parenthesis to nested List in Haskell

My goal was to write a function to parse string of nested parentheses into a corresponding list:

parseParens "()" --> []
parseParens "(())" --> [[]]
parseParens "((()()))" --> [[[],[]]]

First off I discovered that I can't specify easily define a type of the return value. I could do something like:

parseParens :: String -> [[[[t]]]]

But how do I say that it's infinitely nested? I guess Haskell doesn't allow that.

My solution

I came up with my own data type:

data InfiniteList = EmptyList | Cons InfiniteList InfiniteList deriving (Show)

And a parser function that uses this:

parseParens :: String -> InfiniteList
parseParens ('(':xs) =
    if remainder == ""
        then result
        else error "Unbalanced parenthesis"
    where (result, remainder) = parseToClose EmptyList xs
parseParens _ = error "Unbalanced parenthesis"

parseToClose :: InfiniteList -> String -> (InfiniteList, String)
parseToClose acc "" = error "Unbalanced parenthesis!"
parseToClose acc (')':xs) = (acc, xs)
parseToClose acc ('(':xs) = parseToClose (concatInfLists acc (Cons result EmptyList)) remainder
    where (result, remainder) = parseToClose EmptyList xs

concatInfLists :: InfiniteList -> InfiniteList -> InfiniteList
concatInfLists EmptyList ys = ys
concatInfLists (Cons x xs) ys = Cons x (concatInfLists xs ys)

Working like so:

parseParens "()" --> EmptyList
parseParens "(())" --> Cons EmptyList EmptyList
parseParens "((()()))" --> Cons (Cons EmptyList (Cons EmptyList EmptyList)) EmptyList

How to improve?

There surely must be a better way to do this. Perhaps there's even a way to use the built-in List data type for this?

like image 206
Rene Saarsoo Avatar asked Jan 30 '23 02:01

Rene Saarsoo


1 Answers

Edit: Fixed my mischaracterization of Benjamin's answer.

While the answer in @Benjamin Hodgson's comment:

data Nested a = Flat a | Nested (Nested [a]) deriving (Show)

gives a good way to represent a homogeneous list of arbitrary nesting depth (i.e., sort of like a sum type of [a] plus [[a]] plus [[[a]]] plus all the rest), it seems like an unusual representation for your problem, particularly in a case like:

parseParens "(()(()))"

where the nesting depth of the "child nodes" differs. This would be represented as:

Nested (Nested (Nested (Flat [[],[[]]]))) :: Nested a

so it literally allows you to represent the result of the parse as the desired list, given enough Nested constructors, but it has some odd properties. For example, the innermost empty lists actually have different types: the first is of type [[a]] while the second is of type [a].

As a alternative approach, I think the data type you actually want is probably just:

data Nested = N [Nested] deriving (Show)

where each node N is a (possibly empty) list of nodes. Then, you'll get:

> parseParens "()"
N []
> parseParens "(())"
N [N []]
> parseParens "((()()))"
N [N [N [],N []]]
> parseParens "(()(()))"
N [N [],N [N []]]

If you just ignore the N constructors in these results, the first three of these match your "corresponding list" test cases from the beginning of your question.

As a side note: the Nested data type above is actually a "rose tree" containing no data, equivalent to Tree () using the Tree data type from Data.Tree in the containers package.

Finally, I can't emphasize enough how helpful it is to learn and use a monadic parsing library, even for simple parsing jobs. Using the parsec library, for example, you can write a parser for your grammar in one line:

nested = N <$> between (char '(') (char ')') (many nested)

My full code for parseParens is:

import Data.Tree
import Text.Parsec
import Text.Parsec.String

data Nested = N [Nested] deriving (Show)

nested :: Parser Nested
nested = N <$> between (char '(') (char ')') (many nested)

parseParens :: String -> Nested
parseParens str =
  let Right result = parse (nested <* eof) "" str
  in  result
like image 65
K. A. Buhr Avatar answered Feb 08 '23 23:02

K. A. Buhr