I'm trying to understand the Forward()
element from pyparsing. Suppose I have this simple BNF:
identifier =
"a..z,$,_" < "a..z,$,_,0..9" >
package_name =
identifier
/ ( package_name "." identifier )
and I try to parse a simple package like java.lang.String
I get either just java
as result or never return from recursion.
I tried it like this:
from pyparsing import alphas,alphanums, Word, Forward, ZeroOrMore, Group, Literal
identifier=Word(alphas+"$_",alphanums+"$_")
dot=Literal(".")
package_name = Forward()
definition = package_name+dot+identifier
package_name << Group(identifier+ZeroOrMore(definition))
package_name.parseString("java.lang.String")
will print [['java']]
from pyparsing import alphas,alphanums, Word, Forward, ZeroOrMore, Group, Literal
identifier=Word(alphas+"$_",alphanums+"$_")
dot=Literal(".")
package_name = Forward()
definition = identifier^package_name+dot+identifier
package_name << definition
package_name.parseString("java.lang.String")
will reach recursion limit
how does this Forward
placeholder work?
The problem is not with Forward
but with your grammar, which is inherently either limited too early, or recursive in a way that is undecidable with a naive recursive descent parser like Pyparsing.
You have this:
package_name = identifier | (package_name "." identifier )
If you match left to right, this will always match a single identifier and then stop, without attempting to match a following period. If you switch the order to match the identifier
last:
package_name = (package_name "." identifier) | identifier
. . . then it will infinitely recurse, because in order to decide if package_name
matches, the first thing it has to do is decide whether package_name
matches. This is a left-recursive grammar, which a simple recursive-descent parser like Pyparsing can't handle. Pyparsing does not look ahead to see how a match will affect subsequent matches. It just tries the matches left to right.
You can get a simple example of how Forward
works by changing the way your grammar recurses:
identifier = pyp.Word(pyp.alphas+"$_", pyp.alphanums+"$_")
package_name = pyp.Forward()
package_name << ((identifier + '.' + package_name) | identifier)
>>> package_name.parseString("java.lang.String")
[u'java', u'.', u'lang', u'.', u'String'], {})
Here, the recursion happens on the right, not on the left, so Pyparsing can match it incremenetally.
(Your use of ZeroOrMore is a red herring. If you're going to have a recursive grammar like this, you don't want to use ZeroOrMore, because the recursive definition already allows your sub-expression to match multiple times. As I suggested in my comment, though, it is much simpler to define this sort of grammar without recursion anyway.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With