Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing `read` for a left-associative tree in Haskell

Tags:

haskell

I'm having a hard time implementing Read for a tree structure. I want to take a left-associative string (with parens) like ABC(DE)F and convert it into a tree. That particular example corresponds to the tree

tree.

Here's the data type I'm using (though I'm open to suggestions):

data Tree = Branch Tree Tree | Leaf Char deriving (Eq)

That particular tree would be, in Haskell:

example = Branch (Branch (Branch (Branch (Leaf 'A')
                                         (Leaf 'B'))
                                 (Leaf 'C'))
                         (Branch (Leaf 'D')
                                 (Leaf 'E')))
                 (Leaf 'F')

My show function looks like:

instance Show Tree where
    show (Branch l r@(Branch _ _)) = show l ++ "(" ++ show r ++ ")"
    show (Branch l r) = show l ++ show r
    show (Leaf x) = [x]

I want to make a read function so that

read "ABC(DE)F" == example
like image 666
Snowball Avatar asked May 02 '12 05:05

Snowball


3 Answers

This is a situation where using a parsing library makes the code amazingly short and extremely expressive. (I was amazed that it was so neat when I was experimenting to answer this!)

I'm going to use Parsec (that article provides some links for more information), and using it in "applicative mode" (rather than monadic), since we don't need the extra power/foot-shooting-ability of monads.

Code

First the various imports and definitions:

import Text.Parsec

import Control.Applicative ((<*), (<$>))

data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

paren, tree, unit :: Parsec String st Tree

Now, the basic unit of the tree is either a single character (that's not a parenthesis) or a parenthesised tree. The parenthesised tree is just a normal tree between ( and ). And a normal tree is just units put into branches left-associatedly (it's extremely self-recursive). In Haskell with Parsec:

-- parenthesised tree or `Leaf <character>`
unit = paren <|> (Leaf <$> noneOf "()") <?> "group or literal"

-- normal tree between ( and )
paren = between (char '(') (char ')') tree  

-- all the units connected up left-associatedly
tree = foldl1 Branch <$> many1 unit

-- attempt to parse the whole input (don't short-circuit on the first error)
onlyTree = tree <* eof

(Yes, that's the entire parser!)

If we wanted to, we could do without paren and unit but the code above is very expressive, so we can leave it as is.

As a brief explanation (I've provided links to the documentation):

  • (<|>) basically means "left parser or right parser";
  • (<?>) allows you to make nicer error messages;
  • noneOf will parse anything that's not in the given list of characters;
  • between takes three parsers, and returns the value of the third parser as long as it is delimited by the first and second ones;
  • char parses its argument literally.
  • many1 parses one or more of its argument into a list (it appears that the empty string is invalid hence many1, rather than many which parses zero or more);
  • eof matches the end of the input.

We can use the parse function to run the parser (it returns Either ParseError Tree, Left is an error and Right is a correct parse).

As read

Using it as a read like function could be something like:

read' str = case parse onlyTree "" str of
   Right tr -> tr
   Left er -> error (show er)

(I've used read' to avoid conflicting with Prelude.read; if you want a Read instance you'll have to do a bit more work to implement readPrec (or whatever is required) but it shouldn't be too hard with the actual parsing already complete.)

Examples

Some basic examples:

*Tree> read' "A"
Leaf 'A'

*Tree> read' "AB"
Branch (Leaf 'A') (Leaf 'B')

*Tree> read' "ABC"
Branch (Branch (Leaf 'A') (Leaf 'B')) (Leaf 'C')

*Tree> read' "A(BC)"
Branch (Leaf 'A') (Branch (Leaf 'B') (Leaf 'C'))

*Tree> read' "ABC(DE)F" == example
True

*Tree> read' "ABC(DEF)" == example
False

*Tree> read' "ABCDEF" == example
False

Demonstrating errors:

*Tree> read' ""
***Exception: (line 1, column 1):
unexpected end of input
expecting group or literal

*Tree> read' "A(B"
***Exception: (line 1, column 4):
unexpected end of input
expecting group or literal or ")"

And finally, the difference between tree and onlyTree:

*Tree> parse tree "" "AB)CD"     -- success: ignores ")CD"
Right (Branch (Leaf 'A') (Leaf 'B'))

*Tree> parse onlyTree "" "AB)CD" -- fail: can't parse the ")"
Left (line 1, column 3):
unexpected ')'
expecting group or literal or end of input

Conclusion

Parsec is amazing! This answer might be long but the core of it is just 5 or 6 lines of code which do all the work.

like image 72
huon Avatar answered Nov 13 '22 17:11

huon


This very-much looks like a stack structure. When you encounter your input string "ABC(DE)F", you Leaf any atom you find (non-parenthesis) and put it in an accumulator list. When you have 2 items in the list, you Branch them together. This could be done with something like (note, untested, just including to give an idea):

read' [r,l] str  = read' [Branch l r] str
read' acc (c:cs) 
   -- read the inner parenthesis
   | c == '('  = let (result, rest) = read' [] cs 
                 in read' (result : acc) rest
   -- close parenthesis, return result, should be singleton
   | c == ')'  = (acc, cs) 
   -- otherwise, add a leaf
   | otherwise = read' (Leaf c : acc) cs
read' [result] [] = (result, [])
read' _ _  = error "invalid input"

This may require some modification, but I think its enough to get you on the right track.

like image 27
ScottWest Avatar answered Nov 13 '22 15:11

ScottWest


The parsec answer by dbaupp is very easy to understand. As an example of a "low-level" approach, here is a hand written parser which uses a success continuation to handle the left-associative tree building:

instance Read Tree where readsPrec _prec s = maybeToList (readTree s)

type TreeCont = (Tree,String) -> Maybe (Tree,String)

readTree :: String -> Maybe (Tree,String)
readTree = read'top Just where
  valid ')' = False
  valid '(' = False
  valid _ = True

  read'top :: TreeCont -> String -> Maybe (Tree,String)
  read'top acc s@(x:ys) | valid x =
    case ys of
      [] -> acc (Leaf x,[])
      (y:zs) -> read'branch acc s
  read'top _ _ = Nothing

  -- The next three are mutually recursive

  read'branch :: TreeCont -> String -> Maybe (Tree,String)
  read'branch acc (x:y:zs) | valid x = read'right (combine (Leaf x) >=> acc) y zs
  read'branch _ _ = Nothing

  read'right :: TreeCont -> Char -> String -> Maybe (Tree,String)
  read'right acc y ys | valid y = acc (Leaf y,ys)
  read'right acc '(' ys = read'branch (drop'close >=> acc) ys
     where drop'close (b,')':zs) = Just (b,zs)
           drop'close _ = Nothing
  read'right _ _ _ = Nothing  -- assert y==')' here

  combine :: Tree -> TreeCont
  combine build (t, []) = Just (Branch build t,"")
  combine build (t, ys@(')':_)) = Just (Branch build t,ys)  -- stop when lookahead shows ')'
  combine build (t, y:zs) = read'right (combine (Branch build t)) y zs
like image 40
Chris Kuklewicz Avatar answered Nov 13 '22 15:11

Chris Kuklewicz