As a simplified subproblem of a parser for a real language, I am trying to implement a parser for expressions of a fictional language which looks similar to standard imperative languages (like Python, JavaScript, and so). Its syntax features the following construct:
[a-zA-Z]+
)+
and *
and parenthesis.
(eg foo.bar.buz
)(1, foo, bar.buz)
) (to remove ambiguity one-tuples are written as (x,)
)foo(1, bar, buz())
)foo()()
is legal because foo()
might return a function)So a fairly complex program in this language is
(1+2*3, f(4,5,6)(bar) + qux.quux()().quuux)
the associativity is supposed to be
( (1+(2*3)), ( ((f(4,5,6))(bar)) + ((((qux.quux)())()).quuux) ) )
I'm currently using the very nice uu-parsinglib
an applicative parser combinator library.
The first problem was obviously that the intuitive expression grammar (expr -> identifier | number | expr * expr | expr + expr | (expr)
is left-recursive. But I could solve that problem using the the pChainl
combinator (see parseExpr
in the example below).
The remaining problem (hence this question) is function application with functions returned from other functions (f()()
). Again, the grammar is left recursive expr -> fun-call | ...; fun-call -> expr ( parameter-list )
. Any ideas how I can solve this problem elegantly using uu-parsinglib
? (the problem should directly apply to parsec
, attoparsec
and other parser combinators as well I guess).
See below my current version of the program. It works well but function application is only working on identifiers to remove the left-recursion:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE RankNTypes #-}
module TestExprGrammar
(
) where
import Data.Foldable (asum)
import Data.List (intercalate)
import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Utils
import Text.ParserCombinators.UU.BasicInstances
data Node =
NumberLiteral Integer
| Identifier String
| Tuple [Node]
| MemberAccess Node Node
| FunctionCall Node [Node]
| BinaryOperation String Node Node
parseFunctionCall :: Parser Node
parseFunctionCall =
FunctionCall <$>
parseIdentifier {- `parseExpr' would be correct but left-recursive -}
<*> parseParenthesisedNodeList 0
operators :: [[(Char, Node -> Node -> Node)]]
operators = [ [('+', BinaryOperation "+")]
, [('*' , BinaryOperation "*")]
, [('.', MemberAccess)]
]
samePrio :: [(Char, Node -> Node -> Node)] -> Parser (Node -> Node -> Node)
samePrio ops = asum [op <$ pSym c <* pSpaces | (c, op) <- ops]
parseExpr :: Parser Node
parseExpr =
foldr pChainl
(parseIdentifier
<|> parseNumber
<|> parseTuple
<|> parseFunctionCall
<|> pParens parseExpr
)
(map samePrio operators)
parseNodeList :: Int -> Parser [Node]
parseNodeList n =
case n of
_ | n < 0 -> parseNodeList 0
0 -> pListSep (pSymbol ",") parseExpr
n -> (:) <$>
parseExpr
<* pSymbol ","
<*> parseNodeList (n-1)
parseParenthesisedNodeList :: Int -> Parser [Node]
parseParenthesisedNodeList n = pParens (parseNodeList n)
parseIdentifier :: Parser Node
parseIdentifier = Identifier <$> pSome pLetter <* pSpaces
parseNumber :: Parser Node
parseNumber = NumberLiteral <$> pNatural
parseTuple :: Parser Node
parseTuple =
Tuple <$> parseParenthesisedNodeList 1
<|> Tuple [] <$ pSymbol "()"
instance Show Node where
show n =
let showNodeList ns = intercalate ", " (map show ns)
showParenthesisedNodeList ns = "(" ++ showNodeList ns ++ ")"
in case n of
Identifier i -> i
Tuple ns -> showParenthesisedNodeList ns
NumberLiteral n -> show n
FunctionCall f args -> show f ++ showParenthesisedNodeList args
MemberAccess f g -> show f ++ "." ++ show g
BinaryOperation op l r -> "(" ++ show l ++ op ++ show r ++ ")"
With right recursion, no reduction takes place until the entire list of elements has been read; with left recursion, a reduction takes place as each new list element is encountered. Left recursion can therefore save a lot of stack space.
It should be clear that such a recursive call will never terminate. Hence a recursive descent parser cannot be written for a grammar which contains such directly (or indirectly) left recursive rules; in fact, the grammar cannot be LL(1) in the presence of such rules.
Left recursion can be detected structurally, so PEGs with left-recursive rules can be simply rejected by PEG implementations instead of leading to parsers that do not terminate, but the lack of support for left recursion is a restriction on the expressiveness of PEGs.
A Grammar G (V, T, P, S) is left recursive if it has a production in the form. A → A α |β. Left Recursion can be eliminated by introducing new non-terminal A such that. This type of recursion is also called Immediate Left Recursion.
Looking briefly at the list-like combinators for uu-parsinglib
(I'm more familiar with parsec
), I think you can solve this by folding over the result of the pSome
combinator:
parseFunctionCall :: Parser Node
parseFunctionCall =
foldl' FunctionCall <$>
parseIdentifier {- `parseExpr' would be correct but left-recursive -}
<*> pSome (parseParenthesisedNodeList 0)
This is also equivalent to the Alternative
some
combinator, which should indeed apply to the other parsing libs you mentioned.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With