I am using a parser grammar and a lexer grammar for antlr4 from GitHub to parse PHP in Python3.
When I use these grammars directly my PoC code works:
antlr-test.py
from antlr4 import *
# from PHPParentLexer import PHPParentLexer
# from PHPParentParser import PHPParentParser
# from PHPParentParser import PHPParentListener
from PHPLexer import PHPLexer as PHPParentLexer
from PHPParser import PHPParser as PHPParentParser
from PHPParser import PHPParserListener as PHPParentListener
class PhpGrammarListener(PHPParentListener):
def enterFunctionInvocation(self, ctx):
print("enterFunctionInvocation " + ctx.getText())
if __name__ == "__main__":
scanner_input = FileStream('test.php')
lexer = PHPParentLexer(scanner_input)
stream = CommonTokenStream(lexer)
parser = PHPParentParser(stream)
tree = parser.htmlDocument()
walker = ParseTreeWalker()
printer = PhpGrammarListener()
walker.walk(printer, tree)
which gives the output
/opt/local/bin/python3.4 /Users/d/PycharmProjects/name/antlr-test.py
enterFunctionInvocation echo("hi")
enterFunctionInvocation another_method("String")
enterFunctionInvocation print("print statement")
Process finished with exit code 0
When I use the following PHPParent.g4 grammar, I get a lot of errors:
grammar PHPParent;
options { tokenVocab=PHPLexer; }
import PHPParser;
After swapping comments on pythons imports, I get this error
/opt/local/bin/python3.4 /Users/d/PycharmProjects/name/antlr-test.py
line 1:1 token recognition error at: '?'
line 1:2 token recognition error at: 'p'
line 1:3 token recognition error at: 'h'
line 1:4 token recognition error at: 'p'
line 1:5 token recognition error at: '\n'
...
line 2:8 no viable alternative at input '<('
line 2:14 mismatched input ';' expecting {<EOF>, '<', '{', '}', ')', '?>', 'list', 'global', 'continue', 'return', 'class', 'do', 'switch', 'function', 'break', 'if', 'for', 'foreach', 'while', 'new', 'clone', '&', '!', '-', '~', '@', '$', <INVALID>, 'Interface', 'abstract', 'static', Array, RequireOperator, DecimalNumber, HexNumber, OctalNumber, Float, Boolean, SingleQuotedString, DoubleQuotedString_Start, Identifier, IncrementOperator}
line 3:28 mismatched input ';' expecting {<EOF>, '<', '{', '}', ')', '?>', 'list', 'global', 'continue', 'return', 'class', 'do', 'switch', 'function', 'break', 'if', 'for', 'foreach', 'while', 'new', 'clone', '&', '!', '-', '~', '@', '$', <INVALID>, 'Interface', 'abstract', 'static', Array, RequireOperator, DecimalNumber, HexNumber, OctalNumber, Float, Boolean, SingleQuotedString, DoubleQuotedString_Start, Identifier, IncrementOperator}
line 4:28 mismatched input ';' expecting {<EOF>, '<', '{', '}', ')', '?>', 'list', 'global', 'continue', 'return', 'class', 'do', 'switch', 'function', 'break', 'if', 'for', 'foreach', 'while', 'new', 'clone', '&', '!', '-', '~', '@', '$', <INVALID>, 'Interface', 'abstract', 'static', Array, RequireOperator, DecimalNumber, HexNumber, OctalNumber, Float, Boolean, SingleQuotedString, DoubleQuotedString_Start, Identifier, IncrementOperator}
However I get no errors when running the antlr4 tool over the grammars. I'm stumped here - what could be causing this issue?
$ a4p PHPLexer.g4
warning(146): PHPLexer.g4:363:0: non-fragment lexer rule DoubleQuotedStringBody can match the empty string
$ a4p PHPParser.g4
warning(154): PHPParser.g4:523:0: rule doubleQuotedString contains an optional block with at least one alternative that can match an empty string
$ a4p PHPParent.g4
warning(154): PHPParent.g4:523:0: rule doubleQuotedString contains an optional block with at least one alternative that can match an empty string
Import is ANTLR4 is kind of messy.
First, tokenVocab
can not generate the lexer you need. It just means that this grammar is using the tokens of PHPLexer
. If you delete PHPLexer.tokens
, it won't even compile!
Take a look at PHPParser.g4
where we also use options { tokenVocab=PHPLexer; }
. Yet in the python script we still need to use lexer from PHPLexer
to make it work. Well, this PHPParentLexer
is not useable at all. That's why you got all the error.
To generate a new lexer out of combined grammar, you need to import it like this:
grammar PHPParent;
import PHPLexer;
However, mode
is not supported when importing. PHPLexer
itself uses mode
a lot. So it's also not an option.
Can we simply replace PHPParentLexer
with PHPLexer
? Sadly, no. Because PHPParentParser
is generated with PHPParentLexer
, they are tightly coupled and can not be used seperatly. If you use PHPLexer
, PHPParentParser
also won't work. As for this grammar, thanks to the error recovery, it actually works, but gives some error.
There seems to be no better way but to rewrite some of the grammar. There are definitely some design issues in this import
part of ANTLR4.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With