
Coco/R vs. ANTLR

I'm evaluating using Coco/R vs. ANTLR for use in a C# project as part of what's essentially a scriptable mail-merge functionality. To parse the (simple) scripts, I'll need a parser.

I've focussed on Coco/R and ANTLR because both seem fairly mature and well-maintained and capable of generating decent C# parsers.

Neither seems trivial to use, however, and simplicity is something I'd appreciate - particularly maintainability by others.

Does anyone have any recommendations to make? What are the pros/cons of either for parsing a small language - or am I looking into the wrong things entirely? How well do these integrate into a typical continuous integration setup? What are the pitfalls?

Related: Well, many questions, such as 1, 2, 3, 4, 5.

Eamon Nerbonne asked Apr 27 '10 15:04


2 Answers

We have used Coco for 2 years, having replaced the ANTLR parser we were formerly using. For a typical big-data query (our application), our experience has been as follows. Caveat: we depend on full UTF-8 handling, and the parser is implemented in C++. These numbers are for a language that has some 200 EBNF productions.

  • ANTLR: 260 µs/query and a 108 MB memory footprint for the generated parser/lexer
  • Coco: 220 µs/query and a 70 KB memory footprint for the parser/scanner

Initially, Coco had a 1.2 ms startup time and generated several 60 KB tables for mapping UTF-8. We have made many local enhancements to Coco: we eliminated the big tables, eliminated the 1.2 ms startup time, and greatly improved the internal documentation (as well as the documentation in the generated code).

Our version of (open source) Coco has a tiny footprint compared to ANTLR, is measurably faster, has no startup delay and just... works. It does not have ANTLR's nice UI, but that never struck us as an issue once we started using Coco.

Thomas Visel answered Sep 21 '22 18:09


ANTLR is LL(*), which is as powerful as PEG, though usually much more efficient and flexible. LL(*) degenerates to LL(k) for k ≥ 1 when arbitrary lookahead is not necessary.
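
To make the LL(k) versus arbitrary-lookahead distinction concrete, here is a minimal hand-written sketch in Python. It is not how ANTLR implements prediction; the token lists, the `MODIFIERS` set, and both function names are hypothetical and chosen only for illustration. The first decision can always be made after looking at two tokens, so it is LL(2). The second has to skip an unbounded run of modifiers before it can choose, so no fixed k is enough.

```python
# Illustrative only: tokens are plain strings, and MODIFIERS is an assumed set.
MODIFIERS = {"public", "static", "final"}


def predict_statement(tokens, pos=0):
    """Fixed lookahead (k = 2): 'ID = ...' is an assignment, 'ID ( ...' is a call."""
    if pos + 1 >= len(tokens):
        raise SyntaxError("unexpected end of input")
    second = tokens[pos + 1]
    if second == "=":
        return "assignment"
    if second == "(":
        return "call"
    raise SyntaxError(f"unexpected token {second!r}")


def predict_declaration(tokens, pos=0):
    """Arbitrary lookahead: skip any number of modifiers, then decide on
    the keyword that follows them."""
    i = pos
    while i < len(tokens) and tokens[i] in MODIFIERS:
        i += 1
    if i < len(tokens) and tokens[i] in ("class", "interface"):
        return tokens[i]
    raise SyntaxError("expected 'class' or 'interface'")
```

For a small scripting language of the kind described in the question, most decisions are likely to look like `predict_statement`, which is the case where the answer above says LL(*) degenerates to LL(k).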

Terence Parr answered Sep 24 '22 18:09