 

Elegant way for C++ code generation

I am currently working on a database-related project in which I generate a lot of C++ code. This code is then compiled and loaded as a dynamic library. I use this technique to build efficient code for the database schema and queries.

Currently, I am using simple file writes to generate the code (which was okay for the proof-of-concept implementation). Now I am looking for a more elegant but comparably flexible solution for generating C++ code.

I searched quite a lot but all the solutions I found so far are rather complex/extensive, not efficient enough, or not flexible enough.

What libraries are you using in your C++ projects to generate code?

Best, Moritz

moo asked Nov 10 '22 20:11



1 Answer

You can use a program transformation system (PTS) to define and compose code templates in a reliable way.

Most PTSs let you define a grammar and then parse source code into ASTs using that grammar. More importantly, they accept patterns: source code fragments (usually of a nonterminal or a list of nonterminals) with placeholders that correspond to well-formed sub-fragments (nonterminals representing subtrees). These patterns usually insist that every occurrence of a named placeholder match an identical subtree (see example below). Such patterns can be matched against a parsed AST as a way to find code fragments using the surface syntax.

So, one might use a pattern:

   pattern x_squared(t: term): product
      = " \t * \t ";

to hunt for subexpressions which consist of products of identical subtrees. This will match

   (p + q[17])*(p+q[17])

but not

    2 * (x-3)
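
As a rough illustration outside any PTS, the matching step can be sketched in plain C++ over a hand-built tree. Everything here (`Node`, `leaf`, `bin`, `same_tree`, `matches_x_squared`) is a hypothetical name made up for the sketch, not part of any real tool, and the tree is built by hand rather than parsed from source text:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal AST: a label plus ordered children (none for leaves).
struct Node {
    std::string label;                        // e.g. "*", "+", "p", "17"
    std::vector<std::shared_ptr<Node>> kids;
};
using Tree = std::shared_ptr<Node>;

Tree leaf(std::string s) {
    return std::make_shared<Node>(Node{std::move(s), {}});
}

Tree bin(std::string op, Tree l, Tree r) {
    assert(l && r);  // both operands must exist
    return std::make_shared<Node>(Node{std::move(op), {std::move(l), std::move(r)}});
}

// Structural equality: same label and pairwise-equal children, in order.
bool same_tree(const Tree& a, const Tree& b) {
    if (a->label != b->label || a->kids.size() != b->kids.size()) return false;
    for (std::size_t i = 0; i < a->kids.size(); ++i)
        if (!same_tree(a->kids[i], b->kids[i])) return false;
    return true;
}

// Analogue of matching " \t * \t ": a product whose two operands are identical trees.
bool matches_x_squared(const Tree& e) {
    return e->label == "*" && e->kids.size() == 2 && same_tree(e->kids[0], e->kids[1]);
}
```

Because the comparison runs on trees rather than text, spacing differences such as `p + q[17]` versus `p+q[17]` do not matter, which is the point of matching against the AST.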

But just as interestingly, such patterns can be used as code generators, by instantiating the pattern with bound values (trees) for the variables. So, "instantiate x_squared(2^x)" produces

     (2^x)*(2^x)

By itself, this is just a fancy sort of macro scheme. But it is a lot better than plain macros, in that it can tell you "at compile time" (for the patterns) whether what you are composing makes sense or not. So you get type checking of the composition of the code fragments. For instance, you might accidentally code "instantiate x_squared(int q)", but a good PTS will object that "int q" is not a "term"; you find the bug when you build the code generator.
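
A loose C++ analogy for this kind of checking, assuming one made-up wrapper type per grammar nonterminal (`Term`, `Product`, `Declaration` and `instantiate_x_squared` are all hypothetical names for this sketch). It is much weaker than what a PTS does: the wrappers hold plain text and never parse it, so they only catch mix-ups between fragment kinds, not malformed text inside a fragment.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for grammar nonterminals; each wraps generated text.
struct Term        { std::string text; };
struct Product     { std::string text; };
struct Declaration { std::string text; };

// Analogue of instantiating x_squared(t: term): product.
// Accepts only a Term, so the generator fails to compile if it is
// handed some other kind of fragment.
Product instantiate_x_squared(const Term& t) {
    assert(!t.text.empty());  // an empty fragment is never a valid term
    return Product{"(" + t.text + ")*(" + t.text + ")"};
}
```

With this sketch, `instantiate_x_squared(Term{"2^x"}).text` is `(2^x)*(2^x)`, while `instantiate_x_squared(Declaration{"int q"})` is rejected by the C++ compiler, so the mistake surfaces when the generator is built rather than when the generated code is compiled.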

Where this gets really interesting is where one can build many different code fragments, from many different patterns, and compose those fragments with yet more patterns. This allows one to build very complex code. All of this is done in a (syntax-type) safe way; the resulting trees are valid syntax. (You can still bollix semantics; nothing is perfect.) As the complexity of the code you generate goes up, it is good to have this additional checking to help you avoid generating bad code.

A PTS has an additional advantage: after composing code fragments, it can apply source-to-source transformations to optimize the resulting code. Thus you can produce optimized code, limited only by your ability to write matching transformations, and you can harness knowledge you have during code generation. Imagine you generate code for a matrix multiply:

 ... P * Q ...

and your code generator somehow or other knows that Q is an identity matrix. Then the following optimization can remove an expensive matrix multiply:

  rule optimize_matrix_times_unit(m: term, q: term): product -> product
       " \m * \q "
   ->  " \m "
    if is_identity_matrix(q)

This transformation takes advantage of pattern matching (to find a matrix product) in the generated code, pattern instantiation (to generate a replacement for the matched product), and additional knowledge or analysis (is_identity_matrix) that the code generation can do.
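
The same three ingredients can be sketched in plain C++ as a bottom-up tree rewrite. This is only an illustration under stated assumptions: the `Node` type, the `leaf` and `bin` helpers, and the callback signature are hypothetical, and the `is_identity_matrix` predicate is supplied by the caller because only the code generator knows which operands are identity matrices.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal AST: a label plus ordered children (none for leaves).
struct Node {
    std::string label;
    std::vector<std::shared_ptr<Node>> kids;
};
using Tree = std::shared_ptr<Node>;

Tree leaf(std::string s) {
    return std::make_shared<Node>(Node{std::move(s), {}});
}

Tree bin(std::string op, Tree l, Tree r) {
    assert(l && r);  // both operands must exist
    return std::make_shared<Node>(Node{std::move(op), {std::move(l), std::move(r)}});
}

// Rewrites children first, then the node itself.
// The caller-supplied predicate carries the generator's extra knowledge.
Tree optimize_matrix_times_unit(
        const Tree& e,
        const std::function<bool(const Tree&)>& is_identity_matrix) {
    assert(e);
    if (e->kids.empty()) return e;  // leaves are returned unchanged
    std::vector<Tree> kids;
    for (const Tree& k : e->kids)
        kids.push_back(optimize_matrix_times_unit(k, is_identity_matrix));
    // Match " \m * \q " with q an identity matrix; replace it by " \m ".
    if (e->label == "*" && kids.size() == 2 && is_identity_matrix(kids[1]))
        return kids[0];
    return std::make_shared<Node>(Node{e->label, std::move(kids)});
}
```

For a tree representing `P * Q + R` and a predicate that reports `Q` as an identity matrix, the result represents `P + R`; products whose right operand is not known to be an identity matrix are rebuilt unchanged.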

You need a PTS capable of handling C++ parsing; those are a bit hard to find. The one I designed (DMS Software Reengineering Toolkit) happens to do this. The examples in this answer are DMS-style.

Here's a technical paper that describes a large-scale reengineering task done by DMS on C++ code. A number of examples in the paper are actually quite complex patterns used to instantiate code; the reengineering task had to generate a new set of APIs for an existing chunk of code.

Ira Baxter answered Nov 15 '22 06:11
