Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

parsec using between to parse parens

If I wanted to parse a string with multiple parenthesized groups into a list of strings holding each group, for example

"((a b c) a b c)"

into

["((a b c) a b c)","( a b c)"]

How would I do that using parsec? The use of between looks nice but it does not seem possible to separate with a beginning and end value.

like image 937
user2150839 Avatar asked Apr 24 '13 03:04

user2150839


2 Answers

I'd use a recursive parser:

data Expr = List [Expr] | Term String

expr :: Parsec String () Expr
expr = recurse <|> terminal

where terminal is your primitives, in this case these seem to be strings of characters so

 where terminal = Term <$> many1 letter

and recurse is

       recurse  = List <$>
                  (between `on` char) '(' ')' (expr `sepBy1` char ' ')

Now we have a nice tree of Exprs which we can gather with

collect r@(List ts) = r : concatMap collect ts
collect _           = []
like image 194
Daniel Gratzer Avatar answered Sep 30 '22 17:09

Daniel Gratzer


While jozefg's solution is almost identical to what I came up with (and I completely agree to all his suggestions), there are some small differences that made me think that I should post a second answer:

  1. Due to the expected result of the initial example, it is not necessary to treat space-separated parts as individual subtrees.
  2. Further it might be interesting to see the part that actually computes the expected results (i.e., a list of strings).

So here is my version. As already suggested by jozefg, split the task into several sub-tasks. Those are:

  1. Parse a string into an algebraic data type representing some kind of tree.
  2. Collect the (desired) subtrees of this tree.
  3. Turn trees into strings.

Concerning 1, we first need a tree data type

import Text.Parsec
import Text.Parsec.String
import Control.Applicative ((<$>))

data Tree = Leaf String | Node [Tree]

and then a function that can parse strings into values of this type.

parseTree :: Parser Tree
parseTree = node <|> leaf
  where
    node = Node <$> between (char '(') (char ')') (many parseTree)
    leaf = Leaf <$> many1 (noneOf "()")

In my version I do consider the hole string between parenthesis as a Leaf node (i.e., I do not split at white-spaces).

Now we need to collect the subtrees of a tree we are interested in:

nodes :: Tree -> [Tree]
nodes (Leaf _) = []
nodes t@(Node ts) = t : concatMap nodes ts

Finally, a Show-instance for Trees allows us to turn them into strings.

instance Show Tree where
  showsPrec d (Leaf x) = showString x
  showsPrec d (Node xs) = showString "(" . showList xs . showString ")"
    where
      showList [] = id
      showList (x:xs) = shows x . showList xs

Then the original task can be solved, e.g., by:

parseGroups :: Parser [String]
parseGroups = map show . nodes <$> parseTree

> parseTest parseGroups "((a b c) a b c)"
["((a b c) a b c)","(a b c)"]
like image 23
chris Avatar answered Sep 30 '22 16:09

chris