Parsing an expression grammar having function application with parser combinators (left-recursion)

Tags: parsing, haskell, recursive-descent, parsec, uu-parsinglib

As a simplified subproblem of a parser for a real language, I am trying to implement a parser for expressions of a fictional language that looks similar to standard imperative languages (like Python, JavaScript, and so on). Its syntax features the following constructs:

integer numbers
identifiers ([a-zA-Z]+)
arithmetic expressions with + and * and parentheses
structure access with . (e.g. foo.bar.buz)
tuples (e.g. (1, foo, bar.buz)); to remove ambiguity, one-tuples are written as (x,)
function application (e.g. foo(1, bar, buz()))
functions are first class, so they can also be returned from other functions and applied directly (e.g. foo()() is legal because foo() might return a function)

So a fairly complex program in this language is

(1+2*3, f(4,5,6)(bar) + qux.quux()().quuux)

The associativity is supposed to be

( (1+(2*3)), ( ((f(4,5,6))(bar)) + ((((qux.quux)())()).quuux) ) )

I'm currently using the very nice uu-parsinglib, an applicative parser combinator library.

The first problem was obviously that the intuitive expression grammar (expr -> identifier | number | expr * expr | expr + expr | (expr)) is left-recursive. But I could solve that problem using the pChainl combinator (see parseExpr in the example below).
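
For readers unfamiliar with chain combinators, here is a minimal sketch of the same idea. It assumes `Text.ParserCombinators.ReadP` from `base` instead of uu-parsinglib, so it runs without extra packages; `Expr`, `lexeme`, `binOp`, and `parseArith` are hypothetical names for this illustration only. Note that ReadP's `chainl1` takes the operand parser first, the reverse of `pChainl`.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Cut-down AST for this sketch only.
data Expr
  = Num Integer
  | Bin Char Expr Expr
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

number :: ReadP Expr
number = lexeme (Num . read <$> munch1 isDigit)

-- A binary operator parser that yields the node constructor.
binOp :: Char -> ReadP (Expr -> Expr -> Expr)
binOp c = Bin c <$ lexeme (char c)

-- Operands: numbers or parenthesised expressions. No rule starts with
-- `expr` itself, so there is no left recursion.
atom :: ReadP Expr
atom = number +++ between (lexeme (char '(')) (lexeme (char ')')) expr

-- One chainl1 layer per precedence level, lowest precedence first.
-- Each layer parses "operand (op operand)*" and folds it to the left.
expr :: ReadP Expr
expr = foldr (\op p -> chainl1 p op) atom (map binOp "+*")

-- Keep only the parses that consume the whole input.
parseArith :: String -> [Expr]
parseArith s = [e | (e, "") <- readP_to_S (skipSpaces *> expr <* eof) s]
```

With these definitions, `parseArith "1+2*3"` yields the single parse `Bin '+' (Num 1) (Bin '*' (Num 2) (Num 3))`, and `parseArith "1+2+3"` nests to the left.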

The remaining problem (hence this question) is function application with functions returned from other functions (f()()). Again, the grammar is left-recursive: expr -> fun-call | ...; fun-call -> expr ( parameter-list ). Any ideas how I can solve this problem elegantly using uu-parsinglib? (The problem should apply equally to parsec, attoparsec, and other parser combinator libraries, I guess.)

My current version of the program is below. It works well, but function application only works on identifiers, which is how I avoided the left recursion:

 {-# LANGUAGE FlexibleContexts #-}
 {-# LANGUAGE RankNTypes #-}

 module TestExprGrammar
     (
     ) where

 import Data.Foldable (asum)
 import Data.List (intercalate)
 import Text.ParserCombinators.UU
 import Text.ParserCombinators.UU.Utils
 import Text.ParserCombinators.UU.BasicInstances

 data Node =
     NumberLiteral Integer
     | Identifier String
     | Tuple [Node]
     | MemberAccess Node Node
     | FunctionCall Node [Node]
     | BinaryOperation String Node Node

 parseFunctionCall :: Parser Node
 parseFunctionCall =
     FunctionCall <$>
         parseIdentifier {- `parseExpr' would be correct but left-recursive -}
         <*> parseParenthesisedNodeList 0

 operators :: [[(Char, Node -> Node -> Node)]]
 operators = [ [('+', BinaryOperation "+")]
             , [('*' , BinaryOperation "*")]
             , [('.', MemberAccess)]
             ]

 samePrio :: [(Char, Node -> Node -> Node)] -> Parser (Node -> Node -> Node)
 samePrio ops = asum [op <$ pSym c <* pSpaces | (c, op) <- ops]

 parseExpr :: Parser Node
 parseExpr =
     foldr pChainl
           (parseIdentifier
           <|> parseNumber
           <|> parseTuple
           <|> parseFunctionCall
           <|> pParens parseExpr
           )
           (map samePrio operators)

 parseNodeList :: Int -> Parser [Node]
 parseNodeList n =
     case n of
       _ | n < 0 -> parseNodeList 0
       0 -> pListSep (pSymbol ",") parseExpr
       n -> (:) <$>
           parseExpr
           <* pSymbol ","
           <*> parseNodeList (n-1)

 parseParenthesisedNodeList :: Int -> Parser [Node]
 parseParenthesisedNodeList n = pParens (parseNodeList n)

 parseIdentifier :: Parser Node
 parseIdentifier = Identifier <$> pSome pLetter <* pSpaces

 parseNumber :: Parser Node
 parseNumber = NumberLiteral <$> pNatural

 parseTuple :: Parser Node
 parseTuple =
     Tuple <$> parseParenthesisedNodeList 1
     <|> Tuple [] <$ pSymbol "()"

 instance Show Node where
     show n =
         let showNodeList ns = intercalate ", " (map show ns)
             showParenthesisedNodeList ns = "(" ++ showNodeList ns ++ ")"
         in case n of
              Identifier i -> i
              Tuple ns -> showParenthesisedNodeList ns
              NumberLiteral n -> show n
              FunctionCall f args -> show f ++ showParenthesisedNodeList args
              MemberAccess f g -> show f ++ "." ++ show g
              BinaryOperation op l r -> "(" ++ show l ++ op ++ show r ++ ")"


asked Oct 04 '14 23:10

Johannes Weiss

1 Answer

Looking briefly at the list-like combinators for uu-parsinglib (I'm more familiar with parsec), I think you can solve this by folding over the result of the pSome combinator:

 -- needs: import Data.List (foldl')
 parseFunctionCall :: Parser Node
 parseFunctionCall =
     foldl' FunctionCall <$>
         parseIdentifier {- `parseExpr' would be correct but left-recursive -}
         <*> pSome (parseParenthesisedNodeList 0)

pSome is also equivalent to the some combinator from Alternative, so the same approach should indeed apply to the other parsing libraries you mentioned.
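
To make the fold concrete, here is a minimal, self-contained sketch of the same technique. It assumes `Text.ParserCombinators.ReadP` from `base` in place of uu-parsinglib, uses a cut-down `Node` type, and introduces hypothetical helpers (`lexeme`, `argList`, `callChain`, `parseCall`) that are not part of the question's code. As in the snippet above, the head of the call chain is restricted to an identifier.

```haskell
import Data.Char (isAlpha)
import Data.List (foldl')
import Text.ParserCombinators.ReadP

-- Cut-down AST for this sketch (the question's Node has more constructors).
data Node
  = Identifier String
  | FunctionCall Node [Node]
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

identifier :: ReadP Node
identifier = lexeme (Identifier <$> munch1 isAlpha)

-- A parenthesised, comma-separated argument list: "(a, b())" or "()".
argList :: ReadP [Node]
argList =
  between (lexeme (char '(')) (lexeme (char ')'))
          (callChain `sepBy` lexeme (char ','))

-- An identifier followed by zero or more argument lists. The fold turns
-- f, [args1, args2] into FunctionCall (FunctionCall f args1) args2.
-- Using `many` instead of `many1` also accepts a bare identifier.
callChain :: ReadP Node
callChain = foldl' FunctionCall <$> identifier <*> many argList

-- Keep only the parses that consume the whole input.
parseCall :: String -> [Node]
parseCall s = [n | (n, "") <- readP_to_S (skipSpaces *> callChain <* eof) s]
```

For example, `parseCall "f(a)(b)"` yields the single parse `FunctionCall (FunctionCall (Identifier "f") [Identifier "a"]) [Identifier "b"]`, which is the left-nested shape the question asks for.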


answered Sep 27 '22 18:09

Ørjan Johansen
