Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

write recursive Parser with pyparsing

I'm trying to understand the Forward() element from pyparsing. Suppose I have this simple BNF:

identifier  =
  "a..z,$,_"  <  "a..z,$,_,0..9"  >

package_name = 
  identifier 
  /  ( package_name  "." identifier )

and I try to parse a simple package like java.lang.String I get either just java as result or never return from recursion. I tried it like this:

from pyparsing import alphas,alphanums, Word, Forward, ZeroOrMore, Group, Literal

identifier=Word(alphas+"$_",alphanums+"$_")
dot=Literal(".")

package_name = Forward()
definition = package_name+dot+identifier
package_name << Group(identifier+ZeroOrMore(definition))

package_name.parseString("java.lang.String")

will print [['java']]

from pyparsing import alphas,alphanums, Word, Forward, ZeroOrMore, Group, Literal

identifier=Word(alphas+"$_",alphanums+"$_")
dot=Literal(".")

package_name = Forward()
definition = identifier^package_name+dot+identifier
package_name << definition

package_name.parseString("java.lang.String")

will reach recursion limit

how does this Forward placeholder work?

like image 983
Rafael T Avatar asked Jan 19 '13 23:01

Rafael T


1 Answers

The problem is not with Forward but with your grammar, which is inherently either limited too early, or recursive in a way that is undecidable with a naive recursive descent parser like Pyparsing.

You have this:

package_name = identifier | (package_name "." identifier )

If you match left to right, this will always match a single identifier and then stop, without attempting to match a following period. If you switch the order to match the identifier last:

package_name = (package_name "." identifier) | identifier

. . . then it will infinitely recurse, because in order to decide if package_name matches, the first thing it has to do is decide whether package_name matches. This is a left-recursive grammar, which a simple recursive-descent parser like Pyparsing can't handle. Pyparsing does not look ahead to see how a match will affect subsequent matches. It just tries the matches left to right.

You can get a simple example of how Forward works by changing the way your grammar recurses:

identifier  = pyp.Word(pyp.alphas+"$_", pyp.alphanums+"$_")
package_name = pyp.Forward()
package_name << ((identifier + '.' + package_name) | identifier)

>>> package_name.parseString("java.lang.String")
[u'java', u'.', u'lang', u'.', u'String'], {})

Here, the recursion happens on the right, not on the left, so Pyparsing can match it incremenetally.

(Your use of ZeroOrMore is a red herring. If you're going to have a recursive grammar like this, you don't want to use ZeroOrMore, because the recursive definition already allows your sub-expression to match multiple times. As I suggested in my comment, though, it is much simpler to define this sort of grammar without recursion anyway.)

like image 52
BrenBarn Avatar answered Nov 13 '22 17:11

BrenBarn