Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is the ANTLR parser generator best for a C++ app with constrained memory?

Tags:

c++

c

parsing

antlr

I'm looking for a good parser generator that I can use to read a custom text-file format in our large commercial app. Currently this particular file format is read with a handmade recursive parser but the format has grown and complexified to the point where that approach has become unmanageable.

It seems like the ultimate solution would be to build a proper grammar for this format and then use a real parser generator like yacc to read it, but I'm having trouble deciding which such generator to use or even if they're worth the trouble at all. I've looked at ANTLR and Spirit, but our project has specific constraints beyond earlier answers that make me wonder if they're as appropriate for us. In particular, I need:

  • A parser that generates C or C++ code with MSVC. ANTLR 3 doesn't support C++; it claims to generate straight C but the docs on getting it to actually work are sort of confusing.
  • Severely constrained memory usage. Memory is at a huge premium in our app and even tiny leaks are fatal. I need to be able to override the parser's memory allocator to use our custom malloc(), or at the very least I need to give it a contiguous pool from which it draws all its memory (and which I can deallocate en bloc afterwards). I can spare about 200kb for the parser executable itself, but whatever dynamic heap it allocates in parsing has to get freed afterwards.
  • Good performance. This is less critical but we ought to be able to parse 100kb of text in no more than a second on a 3ghz processor.
  • Must be GPL-free. We can't use GNU code.

I like ANTLRworks' IDE and debugging tools, but it looks like getting its C target to actually work with our app will be a huge undertaking. Before I embark on that palaver, is ANTLR the right tool for this job?

The text format in question looks something like:

attribute "FluxCapacitance"  real constant

asset DeLorean
{
    //comment foo bar baz
    model "delorean.mdl"
    animation "gullwing.anm"
    references "Marty"
    loadonce
}

template TimeMachine
{
    attribute FluxCapacitance 10      
    asset DeLorean
}
like image 594
Crashworks Avatar asked Apr 21 '09 00:04

Crashworks


People also ask

What is ANTLR used for?

What is ANTLR? ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. Terence Parr is a tech lead at Google and until 2022 was a professor of data science / computer science at Univ.

What is the best parser generator?

Java Compiler Compiler (JavaCC) is the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.

Should you use a parser generator?

A parser generator is a good tool that you should make part of your toolbox. A parser generator takes a grammar as input and automatically generates source code that can parse streams of characters using the grammar.

Is ANTLR used in industry?

ANTLR is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files. It's widely used in academia and industry to build all sorts of languages, tools, and frameworks.


1 Answers

ANTLR 3 doesn't support C++; it claims to generate straight C but the docs on getting it to actually work are sort of confusing.

It does generate C, and furthermore, it works with Visual Studio and C++. I know this because I've done it before and submitted a patch to get it to work with stdcall.

Memory is at a huge premium in our app and even tiny leaks are fatal. I need to be able to override the parser's memory allocator to use our custom malloc(), or at the very least I need to give it a contiguous pool from which it draws all its memory (and which I can deallocate en bloc afterwards). I can spare about 200kb for the parser executable itself, but whatever dynamic heap it allocates in parsing has to get freed afterwards.

The antlr3c runtime, last time I checked does not have a memory leak, and uses the Memory pool paradigm which you describe. However, it does have one shortcoming in the API which the author refuses to change, which is that if you request the string of a node, it will create a new copy each time until you free the entire parser.

I have no comment on the ease of using a custom malloc, but it does have a macro to define what malloc function to use in the entire project.

As for the executable size, my compilation was about 100 kb in size including a small interpreter.

My suggestion to you is to keep learning ANTLR, because it still fits your requirements and you probably need to sacrifice a little more time before it will start working for you.

like image 119
Unknown Avatar answered Sep 22 '22 03:09

Unknown