I am trying to convert an ANTLR3 grammar to an ANTLR4 grammar in order to use it with the antlr4-python2-runtime. The grammar is a C/C++ fuzzy parser.
After converting it (basically removing tree operators and semantic/syntactic predicates), I generated the Python2 files using:
java -jar antlr4.5-complete.jar -Dlanguage=Python2 CPPGrammar.g4
And the code is generated without any error, so I import it in my python project (I'm using PyCharm) to make some tests:
import sys, time
from antlr4 import *
from parser.CPPGrammarLexer import CPPGrammarLexer
from parser.CPPGrammarParser import CPPGrammarParser

# Current wall-clock time in milliseconds.
currenttimemillis = lambda: int(round(time.time() * 1000))

def is_string(obj):
    return isinstance(obj, str)

def parsecommandstringline(argv):
    # Expect exactly one argument: the path of the file to parse.
    if 2 != len(argv):
        raise IndexError("Invalid args size.")
    if is_string(argv[1]):
        return True
    else:
        raise TypeError("Argument must be str type.")

def doparsing(argv):
    if parsecommandstringline(argv):
        print("Arguments: OK - {0}".format(argv[1]))
        input_stream = FileStream(argv[1])
        lexer = CPPGrammarLexer(input_stream)
        stream = CommonTokenStream(lexer)
        parser = CPPGrammarParser(stream)
        print("*** Parser: START ***")
        start = currenttimemillis()
        tree = parser.code()  # 'code' is the grammar's entry rule
        print("*** Parser: END *** - {0} ms.".format(currenttimemillis()-start))
        return tree

def main(argv):
    tree = doparsing(argv)

if __name__ == '__main__':
    main(sys.argv)
The problem is that the parsing is very slow. With a file containing ~200 lines it takes more than 5 minutes to complete, while the parsing of the same file in antlrworks only takes 1-2 seconds.
Analyzing the antlrworks tree, I noticed that the expr rule and all of its descendants are called very often, and I think that I need to simplify/change these rules to make the parser operate faster.
Is my assumption correct or did I make some mistake while converting the grammar? What can be done to make parsing as fast as on antlrworks?
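One way to test the assumption about where the time goes is to profile the parse call before changing the grammar. The following is a minimal sketch using only the standard library; it assumes Python 3, and `profile_call` and `busy_work` are hypothetical names introduced for illustration. `busy_work` stands in for the real `parser.code()` call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(15)
    return result, buffer.getvalue()


def busy_work(n):
    # Hypothetical stand-in for parser.code(); swap in the real entry rule call.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result, report = profile_call(busy_work, 100000)
    print(report)
```

With the real parser, the call would look like `tree, report = profile_call(parser.code)`. If most of the cumulative time sits in runtime functions rather than in the generated rule methods, that would suggest the runtime, not the grammar, is the bottleneck.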
UPDATE:
I exported the same grammar to Java and parsing took only 795 ms. The problem seems to be related more to the Python implementation than to the grammar itself. Is there anything that can be done to speed up Python parsing?
I've read here that Python can be 20-30 times slower than Java, but in my case Python is ~400 times slower!
Posting here since it may be useful to people that find this thread.
Since this was posted, there have been several performance improvements to Antlr's Python target. That said, the Python interpreter will be intrinsically slower than Java or other compiled languages.
I've put together a Python accelerator code generator for Antlr's Python3 target. It uses Antlr's C++ target as a Python extension. Lexing and parsing are done exclusively in C++, and then an auto-generated visitor is used to rebuild the resulting parse tree in Python. Initial tests show a 5x-25x speedup depending on the grammar and input, and I have a few ideas on how to improve it further.
Here is the code-generator tool: https://github.com/amykyta3/speedy-antlr-tool
And here is a fully-functional example: https://github.com/amykyta3/speedy-antlr-example
Hope this is useful to those who prefer using Antlr in Python!
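When a parser depends on a compiled extension like this, one common pattern is to use the extension when it can be imported and fall back to the pure-Python parser otherwise. The sketch below shows that pattern only in general terms; it is not the tool's actual API. The module name `my_grammar_cpp_ext`, its `parse` function, and the helper names are all hypothetical.

```python
import importlib


def load_parse_backend(accelerated_name, fallback):
    """Return (parse_function, backend_label).

    accelerated_name: module name of a compiled extension assumed to expose
    a parse(text) function (hypothetical). fallback: a pure-Python callable.
    """
    try:
        module = importlib.import_module(accelerated_name)
    except ImportError:
        # Extension not built or not installed: use the slower Python path.
        return fallback, "python"
    parse = getattr(module, "parse", None)
    if parse is None:
        return fallback, "python"
    return parse, "accelerated"


def pure_python_parse(text):
    # Hypothetical stand-in for a parse done with the pure-Python runtime.
    return text.split()


parse, backend = load_parse_backend("my_grammar_cpp_ext", pure_python_parse)
```

Callers only ever use `parse`, so the same code runs whether or not the extension is available. Only the speed differs.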
I confirm that the Python 2 and Python 3 runtimes have performance issues. With a few patches, I got a 10x speedup on the python3 runtime (~5 seconds down to ~400 ms). https://github.com/antlr/antlr4/pull/1010