
Migrating ANTLR v2 grammar to ANTLR v4

We have a grammar written for ANTLR v2 that I would like to migrate to ANTLR v4. Is there a migration guide? I would also like to know how to modify the existing v2 grammar so that it makes good use of v4 features.

Loren Cahlander asked Jul 19 '18 13:07




2 Answers

I solved this by writing a new ANTLR 4 grammar file. There is no good automatic transformation from ANTLR 2 to ANTLR 4.

Loren Cahlander answered Sep 28 '22 18:09


Nice to meet you again!

We recently migrated a set of large grammars to ANTLR 4 and wrote up the lessons we learned here: https://tomassetti.me/migrating-from-antlr2-to-antlr4/

Let me summarize the main points here.

Why migrate?

  • ANTLR 4 has features that make grammars more concise and maintainable

  • ANTLR2 supports only a few target platforms (Java, C#, and C++), while ANTLR4 supports many more

ANTLR4 Features and Differences

  • ANTLR4 accepts left-recursive grammars: this is a big one, as it leads to far simpler and "less deep" grammars

  • ANTLR4 parsers employ the adaptive LL(*) algorithm: no need for you to determine "k", which was never trivial to do

  • ANTLR4 no longer builds an abstract syntax tree (AST). This one will impact your migration the most

The process

  1. Rewrite the grammar, one rule at a time, removing all AST construction logic if present.
  2. Generate the parser and a visitor.
  3. If the consuming code needs an AST, write a visitor that builds an AST from a parse tree.
  4. If the ANTLR2 grammar included semantic actions, write a visitor that runs the actions, either from the parse tree or from the abstract syntax tree.
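
As a rough illustration of step 3, here is a minimal Python sketch of a visitor that turns a parse tree into an AST. It runs without the ANTLR runtime: the `*Context` classes are hand-written stand-ins that only imitate the shape of the context classes ANTLR 4 would generate for a hypothetical `expr` rule, and all class and method names are assumptions made for the example. With a real generated parser you would subclass the generated visitor instead.

```python
from dataclasses import dataclass
from typing import Union

# --- Stand-ins for generated parse-tree classes (hypothetical names) ---
# Imagine a left-recursive ANTLR 4 rule along the lines of:
#   expr : expr ('*'|'/') expr | expr ('+'|'-') expr | INT | '(' expr ')' ;
# Each class imitates a generated context: it dispatches to a visitXxx method.

@dataclass
class IntContext:
    text: str
    def accept(self, visitor):
        return visitor.visitInt(self)

@dataclass
class ParensContext:
    inner: "ParseNode"
    def accept(self, visitor):
        return visitor.visitParens(self)

@dataclass
class BinaryContext:
    left: "ParseNode"
    op: str
    right: "ParseNode"
    def accept(self, visitor):
        return visitor.visitBinary(self)

ParseNode = Union[IntContext, ParensContext, BinaryContext]

# --- The AST we actually want to hand to the consuming code ---

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Ast"
    right: "Ast"

Ast = Union[Num, BinOp]

class AstBuilder:
    """Visitor that simplifies a parse tree into an AST."""

    def visit(self, tree):
        return tree.accept(self)

    def visitInt(self, ctx):
        # Convert token text into a typed value.
        return Num(int(ctx.text))

    def visitParens(self, ctx):
        # Parentheses only guide parsing, so the AST drops them.
        return self.visit(ctx.inner)

    def visitBinary(self, ctx):
        return BinOp(ctx.op, self.visit(ctx.left), self.visit(ctx.right))

# Parse tree for "(1 + 2) * 3", built by hand here instead of by a parser.
tree = BinaryContext(
    ParensContext(BinaryContext(IntContext("1"), "+", IntContext("2"))),
    "*",
    IntContext("3"),
)
ast = AstBuilder().visit(tree)
```

The point of the sketch is that the simplification logic, which in ANTLR 2 would have lived in tree-construction annotations inside the grammar, now sits in ordinary code that can be tested on its own.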

In the article we go into detail about translating individual grammar options and the actions on tokens.

The core part is how to handle tree-rewriting rules, which are not present in ANTLR 4 anymore.

In practice you will need a library to define the AST, which you obtain by simplifying the parse tree produced by ANTLR v4. In ANTLR v2 you did that in the grammar itself, whereas with ANTLR v4 you do it as a follow-up step. This is good, because you end up with two simpler phases instead of a single convoluted grammar, which helps maintainability and testability. However, it requires you to write a small library to represent the AST.
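
To give an idea of what such a small AST library might contain, here is a minimal Python sketch. It is not the API of any existing library: the `Node` base class, the node types, and the `referenced_variables` helper are all hypothetical names chosen for the example. The idea is a common base class that offers generic traversal, so that follow-up passes can be written without knowing every node type.

```python
from dataclasses import dataclass, fields
from typing import Iterator, List, Set

class Node:
    """Base class for AST nodes in this hypothetical mini-library."""

    def children(self) -> Iterator["Node"]:
        # Any dataclass field holding a Node, or a list of Nodes, is a child.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, Node):
                yield value
            elif isinstance(value, list):
                yield from (v for v in value if isinstance(v, Node))

    def walk(self) -> Iterator["Node"]:
        # Depth-first, pre-order traversal of the subtree rooted here.
        yield self
        for child in self.children():
            yield from child.walk()

@dataclass
class IntLiteral(Node):
    value: int

@dataclass
class VarRef(Node):
    name: str

@dataclass
class Sum(Node):
    terms: List[Node]

@dataclass
class Assignment(Node):
    target: str
    value: Node

def referenced_variables(root: Node) -> Set[str]:
    """Example follow-up pass: collect the names of all variables read."""
    return {n.name for n in root.walk() if isinstance(n, VarRef)}

# AST for a statement like "total = price + tax + 5".
stmt = Assignment("total", Sum([VarRef("price"), VarRef("tax"), IntLiteral(5)]))
names = referenced_variables(stmt)
```

Because traversal lives in the base class, each later phase (validation, symbol resolution, interpretation) stays a short function over the AST rather than logic embedded in the grammar.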

If you use the Java target, you may be interested in this open-source library for representing the AST: https://github.com/Strumenta/kolasu

Federico Tomassetti answered Sep 28 '22 18:09