Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Build parser from grammar at runtime

Many (most) regular expression libraries for C++ allow for creating the expression from a string during runtime. Is anyone aware of any C++ parser generators that allow for feeding a grammar (preferably BNF) represented as a string into a generator at runtime? All the implementations I've found either require an explicit code generator to be run or require the grammar to be expressed via clever template meta-programming.

like image 327
tgoodhart Avatar asked Sep 12 '11 18:09

tgoodhart


People also ask

What is the best parser generator?

Java Compiler Compiler (JavaCC) is the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.

What is a Lexer vs parser?

A lexer and a parser work in sequence: the lexer scans the input and produces the matching tokens, the parser then scans the tokens and produces the parsing result.


2 Answers

It should be pretty easy to build a recursive descent, backtracking parser that accepts a grammar as input. You can reduce all your rules to the following form (or act as if you have):

 A = B C D ;

Parsing such a rule by recursive descent is easy: call a routine that corresponds to finding a B, then one that finds a C, then one that finds a D. Given you are doing a general parser, you can always call a "parse_next_sentential_form(x)" function, and pass the name of the desired form (terminal or nonterminal token) as x (e.g., "B", "C", "D").

In processing such a rule, the parser wants to produce an A, by finding a B, then C, then D. To find B (or C or D), you'd like to have an indexed set of rules in which all the left-hand sides are the same, so one can easily enumerate the B-producing rules, and recurse to process their content. If your parser gets a failure, it simply backtracks.

This won't be a lightning fast parser, but shouldn't be terrible if well implemented.

One could also use an Earley parser, which parses by creating states of partially-processed rules.

If you wanted it to be fast, I suppose you could simply take the guts of Bison and make it into a library. Then if you have grammar text or grammar rules (different entry points into Bison), you could start it and have it produce its tables in memory (which it must do in some form). Don't spit them out; simply build an LR parsing engine that uses them. Voila, on-the-fly efficient parser generation. You have to worry about ambiguities and the LALR(1)ness of your grammar if you do this; the previous two solutions work with any context free grammar.

like image 194
Ira Baxter Avatar answered Sep 23 '22 13:09

Ira Baxter


I am not aware of an existing library for this. However if performance and robustness are not critical, then you can spin off bison or any other tool that generates C code (via popen(3) or similar), spin off gcc on the generated code, link it into shared library and load the library via dlopen(3)/dlsym(3). On Windows -- DLL and LoadLibrary() instead.

like image 24
Anton Belov Avatar answered Sep 21 '22 13:09

Anton Belov