I'm looking for a good parser generator that I can use to read a custom text-file format in our large commercial app. Currently this particular file format is read with a handmade recursive parser, but the format has grown in size and complexity to the point where that approach has become unmanageable.
It seems like the ultimate solution would be to build a proper grammar for this format and then use a real parser generator like yacc to read it, but I'm having trouble deciding which generator to use, or even whether they're worth the trouble at all. I've looked at ANTLR and Spirit, but our project has specific constraints beyond those covered in earlier answers, which make me wonder whether they're as appropriate for us. In particular, I need:
I like ANTLRworks' IDE and debugging tools, but it looks like getting its C target to actually work with our app will be a huge undertaking. Before I embark on that palaver, is ANTLR the right tool for this job?
The text format in question looks something like:
attribute "FluxCapacitance" real constant
asset DeLorean
{
    //comment foo bar baz
    model "delorean.mdl"
    animation "gullwing.anm"
    references "Marty"
    loadonce
}
template TimeMachine
{
    attribute FluxCapacitance 10
    asset DeLorean
}
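As a rough illustration of the lexical side of a format like this one, here is a minimal tokenizer sketch in C++. It is not the existing handmade parser and not ANTLR output. The names TokKind, Token, and tokenize are made up for this sketch, and it assumes that comments begin at a token boundary, that strings have no escape sequences, and that numbers are plain unsigned integers (real-valued attributes would need more).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token classes for the sample format above.
enum class TokKind { Word, String, Number, LBrace, RBrace };

struct Token {
    TokKind kind;
    std::string text;  // string tokens are stored without their quotes
};

// Splits the input into tokens. "//" comments run to the end of the line
// and are dropped. Unterminated strings simply end at end of input.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    const std::size_t n = src.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            while (i < n && src[i] != '\n') ++i;  // skip the comment
            continue;
        }
        if (c == '{') { out.push_back({TokKind::LBrace, "{"}); ++i; continue; }
        if (c == '}') { out.push_back({TokKind::RBrace, "}"}); ++i; continue; }
        if (c == '"') {
            const std::size_t start = ++i;        // first char after the quote
            while (i < n && src[i] != '"') ++i;
            out.push_back({TokKind::String, src.substr(start, i - start)});
            if (i < n) ++i;                       // skip the closing quote
            continue;
        }
        // Bare run: everything up to whitespace, a brace, or a quote.
        const std::size_t start = i;
        bool all_digits = true;
        while (i < n) {
            const unsigned char d = static_cast<unsigned char>(src[i]);
            if (std::isspace(d) || d == '{' || d == '}' || d == '"') break;
            if (!std::isdigit(d)) all_digits = false;
            ++i;
        }
        assert(i > start);  // the first character always belongs to the run
        out.push_back({all_digits ? TokKind::Number : TokKind::Word,
                       src.substr(start, i - start)});
    }
    return out;
}
```

A grammar for a parser generator would then be written over these token classes, for example treating asset and template blocks as a keyword, a name, and a brace-delimited list of entries.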
ANTLR 3 doesn't support C++; it claims to generate straight C but the docs on getting it to actually work are sort of confusing.
It does generate C, and furthermore, it works with Visual Studio and C++. I know this because I've done it before and submitted a patch to get it to work with stdcall.
Memory is at a huge premium in our app and even tiny leaks are fatal. I need to be able to override the parser's memory allocator to use our custom malloc(), or at the very least I need to give it a contiguous pool from which it draws all its memory (and which I can deallocate en bloc afterwards). I can spare about 200 KB for the parser executable itself, but whatever dynamic heap it allocates in parsing has to get freed afterwards.
The antlr3c runtime, last time I checked, does not leak memory, and it uses the memory pool paradigm you describe. However, it does have one shortcoming in the API which the author refuses to change: if you request the string of a node, it creates a new copy each time, and those copies stay allocated until you free the entire parser.
I have no comment on the ease of using a custom malloc, but the runtime does have a macro that defines which malloc function is used throughout the project.
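To make the pool idea from the question concrete, here is a minimal sketch of a contiguous arena that can be dropped in one call, plus allocator hook macros pointing at it. Everything named here (Arena, arena_init, arena_alloc, arena_release, g_parser_arena, PARSER_MALLOC, PARSER_FREE) is hypothetical and is not the antlr3c runtime's actual API; the real macro names have to be looked up in the runtime's headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical fixed-size arena: every allocation is carved out of one
// contiguous block, and the whole block is released at once.
struct Arena {
    unsigned char* base = nullptr;
    std::size_t capacity = 0;
    std::size_t used = 0;
};

// Reserves the backing block. Returns false if the reservation fails.
inline bool arena_init(Arena& a, std::size_t capacity) {
    a.base = static_cast<unsigned char*>(std::malloc(capacity));
    a.capacity = a.base ? capacity : 0;
    a.used = 0;
    return a.base != nullptr;
}

// Bump allocation: rounds the current offset up to the strictest
// fundamental alignment, then hands out the next `size` bytes.
// Returns nullptr when the arena is exhausted.
inline void* arena_alloc(Arena& a, std::size_t size) {
    assert(a.base != nullptr);  // arena_init must have succeeded
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t start = (a.used + align - 1) / align * align;
    if (start > a.capacity || size > a.capacity - start) return nullptr;
    a.used = start + size;
    return a.base + start;
}

// Frees everything the parser allocated in a single call.
inline void arena_release(Arena& a) {
    std::free(a.base);
    a.base = nullptr;
    a.capacity = 0;
    a.used = 0;
}

// Hypothetical hooks standing in for whatever allocator macros the
// parser runtime exposes. Individual frees are no-ops because the
// arena is released as a whole after parsing.
static Arena g_parser_arena;
#define PARSER_MALLOC(n) arena_alloc(g_parser_arena, (n))
#define PARSER_FREE(p)   ((void)(p))
```

The intended usage would be to call arena_init before parsing, let the runtime allocate through the hooks, and call arena_release once the parse results have been copied out. A runtime that also calls calloc or realloc would need matching hooks, which this sketch leaves out.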
As for the executable size, my compiled build was about 100 KB, including a small interpreter.
My suggestion to you is to keep learning ANTLR, because it still fits your requirements and you probably need to sacrifice a little more time before it will start working for you.