How to modify C++ code from user-input

I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.

I want to replace anything of the form

A->Draw(B1, B2)

with

MyFunc(A, B1, B2).

My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression.
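As a rough illustration of the problem, here is a sketch of the naive regular-expression approach. The function name naive_rewrite and the pattern itself are made up for this example; they only show how quickly such a pattern goes wrong once A, B1, or B2 are more than simple identifiers.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite "A->Draw(B1, B2)" with a single regex.
// It assumes A is one identifier and that neither argument contains
// a comma or a closing parenthesis, which is exactly what cannot be assumed.
std::string naive_rewrite(const std::string& command) {
    static const std::regex draw_call(
        R"re((\w+)->Draw\(([^,]+),\s*([^)]+)\))re");
    return std::regex_replace(command, draw_call, "MyFunc($1, $2, $3)");
}

// naive_rewrite("h->Draw(x, y)")          gives  MyFunc(h, x, y)        (fine)
// naive_rewrite("a.b->Draw(x, y)")        gives  a.MyFunc(b, x, y)      (wrong object)
// naive_rewrite("GetHist(1)->Draw(x, y)") is left unchanged             (missed)
// naive_rewrite("h->Draw(\"a,b\", \"c\")") gives MyFunc(h, "a, b", "c") (string literal altered)
```

The last case is the nastiest: the comma inside the string literal is taken as the argument separator, so the rewritten command silently carries a different string.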

My next thought was to call clang as a subprocess, use "-ast-dump" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.

The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the AST in any way that I was able to find.

Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?

asked Aug 04 '15 16:08

Eldritch Cheese

1 Answer

What you want is a Program Transformation System. These are tools that generally let you express changes to source code, written in source level patterns that essentially say:

 if you see *this*, replace it by *that*

but operating on Abstract Syntax Trees so the matching and replacement process is far more trustworthy than what you get with string hacking.

Such tools have to have parsers for the source language of interest. The source language being C++ makes this fairly difficult.

Clang sort of qualifies; after all, it can parse C++. OP objects that it cannot do so without all the environment context. To the extent that OP is typing (well-formed) program fragments (statements, etc.) into the interpreter, Clang may [I don't have much experience with it myself] have trouble getting focused on what the fragment is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface syntax rewrite rules are convenient, but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.

GCC with MELT sort of qualifies in the same way that Clang does. I'm under the impression that MELT makes GCC at best a bit less intolerable for this kind of work. YMMV.

Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations on large scale C++ code bases.

DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have to decide how to resolve; see "Why can't C++ be parsed with a LR(1) parser?" for more discussion.] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist that the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif is not allowed), but that's unlikely to be a real problem for interactively entered code fragments.

DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text to feed to the interpreter.

Where it shines in this case is OP can likely write most of his modifications directly as source-to-source syntax rules. For his example, he can provide DMS with a rewrite rule (untested but pretty close to right):

rule replace_Draw(A:primary,B1:expression,B2:expression):
        primary->primary
    "\A->Draw(\B1, \B2)"     -- pattern
rewrites to
    "MyFunc(\A, \B1, \B2)";  -- replacement

and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.
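To illustrate what such a subtree replacement amounts to, here is a toy sketch in plain C++. It is not DMS's API and not a real C++ AST: the Node type and the helpers name, arrow_call, rewrite_draw and to_source are hypothetical, invented only to show why working on a tree handles nested forms that string matching trips over.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression tree; a hypothetical stand-in for a real C++ AST.
struct Node {
    enum Kind { Name, Call, ArrowCall } kind = Name;
    std::string text;                         // identifier, callee, or method name
    std::vector<std::unique_ptr<Node>> kids;  // ArrowCall: kids[0] is the object
};
using NodePtr = std::unique_ptr<Node>;

NodePtr name(std::string s) {
    auto n = std::make_unique<Node>();
    n->text = std::move(s);
    return n;
}

// Builds obj->method(a1, a2).
NodePtr arrow_call(NodePtr obj, std::string method, NodePtr a1, NodePtr a2) {
    auto n = std::make_unique<Node>();
    n->kind = Node::ArrowCall;
    n->text = std::move(method);
    n->kids.push_back(std::move(obj));
    n->kids.push_back(std::move(a1));
    n->kids.push_back(std::move(a2));
    return n;
}

// Bottom-up rewrite: every A->Draw(B1, B2) node becomes MyFunc(A, B1, B2).
// The children [A, B1, B2] are already in argument order, so only the
// node's kind and name need to change.
void rewrite_draw(Node& n) {
    for (auto& k : n.kids) rewrite_draw(*k);
    if (n.kind == Node::ArrowCall && n.text == "Draw" && n.kids.size() == 3) {
        n.kind = Node::Call;
        n.text = "MyFunc";
    }
}

// Regenerates source text from the (possibly rewritten) tree.
std::string to_source(const Node& n) {
    if (n.kind == Node::Name) return n.text;
    std::size_t first = 0;
    std::string out;
    if (n.kind == Node::ArrowCall) {
        out = to_source(*n.kids[0]) + "->";
        first = 1;
    }
    out += n.text + "(";
    for (std::size_t i = first; i < n.kids.size(); ++i) {
        if (i > first) out += ", ";
        out += to_source(*n.kids[i]);
    }
    return out + ")";
}
```

Building the tree for a->Draw(b->Draw(x, y), z), calling rewrite_draw on it, and printing it with to_source yields MyFunc(a, MyFunc(b, x, y), z); both the outer and the nested call are rewritten because the match is on tree shape, not on text. The hard part, which a real tool has to supply, is the parser that produces the tree in the first place.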

If OP provides a set of such rules, DMS can be asked to apply the entire set.

So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a 3rd party; DMS and its C++ front end are hardly "small" programs. But then modern machines have lots of resources, so I think it's a question of how badly OP needs to do this.

answered Oct 06 '22 20:10

Ira Baxter


Tags:

c++

clang

root-framework
