Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing an expression grammar having function application with parser combinators (left-recursion)

As a simplified subproblem of a parser for a real language, I am trying to implement a parser for expressions of a fictional language which looks similar to standard imperative languages (like Python, JavaScript, and so). Its syntax features the following construct:

  • integer numbers
  • identifiers ([a-zA-Z]+)
  • arithmetic expressions with + and * and parenthesis
  • structure access with . (eg foo.bar.buz)
  • tuples (eg (1, foo, bar.buz)) (to remove ambiguity one-tuples are written as (x,))
  • function application (eg foo(1, bar, buz()))
  • functions are first class so they can also be returned from other functions and directly be applied (eg foo()() is legal because foo() might return a function)

So a fairly complex program in this language is

(1+2*3, f(4,5,6)(bar) + qux.quux()().quuux)

the associativity is supposed to be

( (1+(2*3)), ( ((f(4,5,6))(bar)) + ((((qux.quux)())()).quuux) ) )

I'm currently using the very nice uu-parsinglib an applicative parser combinator library.

The first problem was obviously that the intuitive expression grammar (expr -> identifier | number | expr * expr | expr + expr | (expr) is left-recursive. But I could solve that problem using the the pChainl combinator (see parseExpr in the example below).

The remaining problem (hence this question) is function application with functions returned from other functions (f()()). Again, the grammar is left recursive expr -> fun-call | ...; fun-call -> expr ( parameter-list ). Any ideas how I can solve this problem elegantly using uu-parsinglib? (the problem should directly apply to parsec, attoparsec and other parser combinators as well I guess).

See below my current version of the program. It works well but function application is only working on identifiers to remove the left-recursion:

 {-# LANGUAGE FlexibleContexts #-}
 {-# LANGUAGE RankNTypes #-}

 module TestExprGrammar
     (
     ) where

 import Data.Foldable (asum)
 import Data.List (intercalate)
 import Text.ParserCombinators.UU
 import Text.ParserCombinators.UU.Utils
 import Text.ParserCombinators.UU.BasicInstances

 data Node =
     NumberLiteral Integer
     | Identifier String
     | Tuple [Node]
     | MemberAccess Node Node
     | FunctionCall Node [Node]
     | BinaryOperation String Node Node

 parseFunctionCall :: Parser Node
 parseFunctionCall =
     FunctionCall <$>
         parseIdentifier {- `parseExpr' would be correct but left-recursive -}
         <*> parseParenthesisedNodeList 0

 operators :: [[(Char, Node -> Node -> Node)]]
 operators = [ [('+', BinaryOperation "+")]
             , [('*' , BinaryOperation "*")]
             , [('.', MemberAccess)]
             ]

 samePrio :: [(Char, Node -> Node -> Node)] -> Parser (Node -> Node -> Node)
 samePrio ops = asum [op <$ pSym c <* pSpaces | (c, op) <- ops]

 parseExpr :: Parser Node
 parseExpr =
     foldr pChainl
           (parseIdentifier
           <|> parseNumber
           <|> parseTuple
           <|> parseFunctionCall
           <|> pParens parseExpr
           )
           (map samePrio operators)

 parseNodeList :: Int -> Parser [Node]
 parseNodeList n =
     case n of
       _ | n < 0 -> parseNodeList 0
       0 -> pListSep (pSymbol ",") parseExpr
       n -> (:) <$>
           parseExpr
           <* pSymbol ","
           <*> parseNodeList (n-1)

 parseParenthesisedNodeList :: Int -> Parser [Node]
 parseParenthesisedNodeList n = pParens (parseNodeList n)

 parseIdentifier :: Parser Node
 parseIdentifier = Identifier <$> pSome pLetter <* pSpaces

 parseNumber :: Parser Node
 parseNumber = NumberLiteral <$> pNatural

 parseTuple :: Parser Node
 parseTuple =
     Tuple <$> parseParenthesisedNodeList 1
     <|> Tuple [] <$ pSymbol "()"

 instance Show Node where
     show n =
         let showNodeList ns = intercalate ", " (map show ns)
             showParenthesisedNodeList ns = "(" ++ showNodeList ns ++ ")"
         in case n of
              Identifier i -> i
              Tuple ns -> showParenthesisedNodeList ns
              NumberLiteral n -> show n
              FunctionCall f args -> show f ++ showParenthesisedNodeList args
              MemberAccess f g -> show f ++ "." ++ show g
              BinaryOperation op l r -> "(" ++ show l ++ op ++ show r ++ ")"
like image 899
Johannes Weiss Avatar asked Oct 04 '14 23:10

Johannes Weiss


People also ask

What is left recursive and right recursive grammar?

With right recursion, no reduction takes place until the entire list of elements has been read; with left recursion, a reduction takes place as each new list element is encountered. Left recursion can therefore save a lot of stack space.

Can recursive descent parsers used for left recursive grammars?

It should be clear that such a recursive call will never terminate. Hence a recursive descent parser cannot be written for a grammar which contains such directly (or indirectly) left recursive rules; in fact, the grammar cannot be LL(1) in the presence of such rules.

How left recursion affect the parsing process?

Left recursion can be detected structurally, so PEGs with left-recursive rules can be simply rejected by PEG implementations instead of leading to parsers that do not terminate, but the lack of support for left recursion is a restriction on the expressiveness of PEGs.

How do you check if a grammar is left recursive?

A Grammar G (V, T, P, S) is left recursive if it has a production in the form. A → A α |β. Left Recursion can be eliminated by introducing new non-terminal A such that. This type of recursion is also called Immediate Left Recursion.


1 Answers

Looking briefly at the list-like combinators for uu-parsinglib (I'm more familiar with parsec), I think you can solve this by folding over the result of the pSome combinator:

 parseFunctionCall :: Parser Node
 parseFunctionCall =
     foldl' FunctionCall <$>
         parseIdentifier {- `parseExpr' would be correct but left-recursive -}
         <*> pSome (parseParenthesisedNodeList 0)

This is also equivalent to the Alternative some combinator, which should indeed apply to the other parsing libs you mentioned.

like image 200
Ørjan Johansen Avatar answered Sep 27 '22 18:09

Ørjan Johansen