 

LLVM JIT parser writing with Bison / Antlr / Packrat / Elkhound

In the LLVM tutorials there are instructions on how to write a simple JIT compiler. Unfortunately, the lexer and parser in that tutorial are written by hand. I think such a solution is fine for learning purposes, but it is not well suited to writing complex, production-ready compilers. It seems that GCC and a few other "big" compilers are hand-written, though. Still, I believe parser generators give a big boost when writing your own compiler (especially when you're doing it alone, without a team of people).

Is it possible to use any existing parser generator like Bison / Antlr / Packrat / Elkhound etc. together with LLVM to create a JIT compiler? I want to be able to "feed" the parser continuously (not just once at the beginning) with expressions and compile them at runtime.
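To make the question concrete, this is roughly the driver loop I have in mind (a sketch only; parse_expression and jit_compile_and_run are placeholder names I made up, not real APIs — the real thing would build an llvm::Module per expression and hand it to the JIT):

    // Sketch of the intended workflow: expressions arrive one at a time
    // and each one is parsed, compiled and executed immediately.
    #include <iostream>
    #include <memory>
    #include <string>

    struct Expr { /* AST fields omitted in this sketch */ };

    // Hypothetical entry point of a generated (Bison/Antlr/...) parser.
    std::unique_ptr<Expr> parse_expression(const std::string &src);

    // Hypothetical: lower the AST to LLVM IR, JIT-compile it, run it.
    double jit_compile_and_run(const Expr &ast);

    int main() {
      std::string line;
      while (std::getline(std::cin, line)) {               // input keeps arriving at runtime
        if (auto ast = parse_expression(line))             // re-enter the parser for each input
          std::cout << jit_compile_and_run(*ast) << "\n";  // compile and run right away
        else
          std::cerr << "could not parse: " << line << "\n";
      }
    }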

Additionally, I've found a lot of questions about the "best, modern" parser generator (like this one: https://stackoverflow.com/questions/428892/what-parser-generator-do-you-recommend). If it is possible to use these tools to create an LLVM JIT compiler, I would be grateful for any hints and recommendations on which tool would be best in terms of performance and flexibility in this particular case.

asked Feb 06 '13 by Wojciech Danilo


1 Answer

There are a lot of advantages to using a parser generator like bison or antlr, particularly while you're developing a language. You'll undoubtedly end up making changes to the grammar as you go, and you'll want to end up with documentation of the final grammar. Tools which can produce that documentation automatically from the grammar are really useful. They can also help give you confidence that the grammar of the language is (a) what you think it is and (b) not ambiguous.

If your language (unlike C++) is actually LALR(1), or even better, LL(1), and you're using LLVM tools to build the AST and IR, then it's unlikely that you will need to do much more than write down the grammar and provide a few simple actions to build the AST. That will keep you going for a while.
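As a concrete illustration, a toy expression grammar in Bison needs little more than the rules plus one action each. This is only a sketch, and ast.h, make_number, make_binop and jit_expression are hypothetical helpers, not anything from the LLVM tutorial:

    /* calc.y -- minimal sketch; all helper names here are made up. */
    %{
    #include "ast.h"                       /* hypothetical AST declarations */
    int yylex(void);
    void yyerror(const char *msg);
    void jit_expression(struct Expr *e);   /* hands each finished AST to the LLVM side */
    %}

    %union { double num; struct Expr *expr; }
    %token <num> NUMBER
    %type  <expr> expr
    %left '+' '-'
    %left '*' '/'

    %%
    input : /* empty */
          | input expr '\n'   { jit_expression($2); }       /* one expression at a time */
          ;
    expr  : NUMBER            { $$ = make_number($1); }
          | expr '+' expr     { $$ = make_binop('+', $1, $3); }
          | expr '-' expr     { $$ = make_binop('-', $1, $3); }
          | expr '*' expr     { $$ = make_binop('*', $1, $3); }
          | expr '/' expr     { $$ = make_binop('/', $1, $3); }
          | '(' expr ')'      { $$ = $2; }
          ;
    %%

The %left declarations take care of operator precedence, so the grammar file stays close to what you would publish as the language's documentation.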

The usual reason that people eventually choose to build their own parsers, other than the "real programmers don't use parser generators" prejudice, is that it's not easy to provide good diagnostics for syntax errors, particularly with LR(1) parsing. If that's one of your goals, you should try to make your grammar LL(k) parseable (it's still not easy to provide good diagnostics with LL(k), but it seems to be a little easier) and use an LL(k) framework like Antlr.

There is another strategy, which is to first parse the program text in the simplest possible way using an LALR(1) parser, which is more flexible than LL(1), without even trying to provide diagnostics. If the parse fails, you can then parse it again using a slower, possibly even backtracking parser, which doesn't know how to generate ASTs, but does keep track of source location and try to recover from syntax errors. Recovering from syntax errors without invalidating the AST is even more difficult than just continuing to parse, so there's a lot to be said for not trying. Also, keeping track of source location is really slow, and it's not very useful if you don't have to produce diagnostics (unless you need it for adding debugging annotations), so you can speed the parse up quite a bit by not bothering with location tracking.
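In outline, a driver for that two-pass strategy might look like this (all of the names are hypothetical, not an existing API):

    #include <memory>
    #include <string>
    #include <vector>

    struct Ast { /* fields omitted in this sketch */ };
    struct Diagnostic { std::string message; int line; int column; };

    // Pass 1: generated LALR(1) parser; builds the AST, tracks no locations.
    std::unique_ptr<Ast> fast_parse(const std::string &source);

    // Pass 2: slower (possibly backtracking) parser; builds no AST, only
    // recovers from errors and reports them with source locations.
    std::vector<Diagnostic> diagnostic_parse(const std::string &source);

    std::unique_ptr<Ast> parse(const std::string &source,
                               std::vector<Diagnostic> &diags) {
      if (auto ast = fast_parse(source))      // common case: one cheap pass
        return ast;
      diags = diagnostic_parse(source);       // rare case: pay for diagnostics
      return nullptr;                         // no AST for an erroneous program
    }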

Personally, I'm biased against packrat parsing, because it's not clear what the actual language parsed by a PEG is. Other people don't mind this so much, and YMMV.

answered Sep 27 '22 by rici