Parsing C# code (as string) and inserting additional methods

Question

I have a C# app I'm working on that loads it's code remotely, and then runs it (for the sake of argument, you can assume the app is secure).

The code is C#, but it is sent as an XML document, parse out as a string, and then compiled and executed.

Now, what I'd like to do - and am having a bit more difficulty than I expected - is be able to parse the entire document, and before compiling, insert additional commands after every line execution.

For example, consider the code:

using System;
using System.Collections.Generic;
using System.Linq;

namespace MyCode
{
    static class MyProg
    {
        static void Run()
        {
            int i = 0;
            i++;

            Log(i);
        }
    }
}

What I'd like, after parsing is something more like:

using System;
using System.Collections.Generic;
using System.Linq;

namespace MyCode
{
    static class MyProg
    {
        static void Run()
        {
            int i = 0;
            MyAdditionalMethod();
            i++;
            MyAdditionalMethod();

            Log(i);
            MyAdditionalMethod();
        }
    }
}

Keep in mind the obvious pitfalls - I can't just have it after every semi-colon, because this would not work in a getter/setter, i.e.:

Converting:

public string MyString { get; set; }

To:

public string MyString { get; MyAdditionalMethod(); set; MyAdditionalMethod(); }

would fail. As would class-level declarations, using statements, etc. Also, there are a number of cases where I could also add in MyAdditionalMethod() after curly braces - like in delegates, immediately after if statements, or method declarations, etc.

So, what I've been looking into CodeDOM, and this looks like it could be a solution but it's tough to figure out where to start. I'm otherwise trying to parse the entire thing and create a tree which I can parse through - though that's a little tough, considering the number of cases I need to consider.

Does anyone know any other solutions that are out there?

Kris · Accepted Answer

There are a few C# parsers out there I'd recommend using something from Mono or SharpDevelop as they should be up to date. I had a go using NRefactory from SharpDevelop, if you download the source for SharpDevelop there is a demo and some UnitTests that are a good intro to its usage.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ICSharpCode.NRefactory;
using System.IO;
using ICSharpCode.NRefactory.Ast;
using ICSharpCode.NRefactory.Visitors;
using ICSharpCode.NRefactory.PrettyPrinter;

namespace Parse
{
    class Program
    {
        static void Main(string[] args)
        {
            string code = @"using System;
            using System.Collections.Generic;
            using System.Linq;

            namespace MyCode
            {
                static class MyProg
                {
                    static void Run()
                    {
                        int i = 0;
                        i++;

                        Log(i);
                    }
                }
            }
            ";

            IParser p = ParserFactory.CreateParser(SupportedLanguage.CSharp, new StringReader(code));
            p.Parse();

            //Output Original
            CSharpOutputVisitor output = new CSharpOutputVisitor();
            output.VisitCompilationUnit(p.CompilationUnit, null);
            Console.Write(output.Text);

            //Add custom method calls
            AddMethodVisitor v = new AddMethodVisitor();
            v.VisitCompilationUnit(p.CompilationUnit, null);
            v.AddMethodCalls();
            output = new CSharpOutputVisitor();
            output.VisitCompilationUnit(p.CompilationUnit, null);

            //Output result
            Console.Write(output.Text);
            Console.ReadLine();
        }


    }

    //The vistor adds method calls after visiting by storing the nodes in a dictionary. 
    public class AddMethodVisitor : ConvertVisitorBase
    {
        private IdentifierExpression member = new IdentifierExpression("MyAdditionalMethod");

        private Dictionary<INode, INode> expressions = new Dictionary<INode, INode>();

        private void AddNode(INode original)
        {
            expressions.Add(original, new ExpressionStatement(new InvocationExpression(member)));
        }

        public override object VisitExpressionStatement(ExpressionStatement expressionStatement, object data)
        {
            AddNode(expressionStatement);
            return base.VisitExpressionStatement(expressionStatement, data);
        }

        public override object VisitLocalVariableDeclaration(LocalVariableDeclaration localVariableDeclaration, object data)
        {
            AddNode(localVariableDeclaration);
            return base.VisitLocalVariableDeclaration(localVariableDeclaration, data);
        }

        public void AddMethodCalls()
        {
            foreach (var e in expressions)
            {
                InsertAfterSibling(e.Key, e.Value);
            }
        }

    }
}

You will need to improve the visitor to handle more cases but it's a good start.

Alternatively you could compile the original and do some IL manipulation using Cecil or try some AOP library like PostSharp. Finally you could look into the .NET Profiling API.

Ira Baxter · Answer

You could use a source-to-source program transformation system. Such a tool parses the code, builds and ASTs, lets you apply transformations, and then regenerates text from the AST. What makes a source-to-source system nice, it that you can write transformations in terms of the source language syntax rather than the fractal detail of the AST, which makes them far easier to write and understand later.

What you want to do would be modelled by a pretty simple program transformation using our DMS Software Reengineering Toolkit:

rule insert_post_statement_call(s: stmt): stmt -> stmt =
   " \s " -> " { \s ; MyAdditionalMethod();   }";

This rule isn't a "text" substitution; rather, it is parsed by the parser that processes the target code, and so in fact it represents two ASTs, a left- and right- hand side (separated by the "->" syntax. The quotes aren't string quotes; they are quotes around the target language syntax to differentiate it from the syntax of the rule language itself. What is inside the quotes is target language (e.g., C#) text with escapes like \s, which represent entire language elements (in this case, a stmt according the the target language (e.g. C#) grammar. The left hand side says, "match any statement s" because s is defined to be a "stmt" in the grammar. The right hand side says, "replace the statement with a block containing the original statement \s, and the new code you want inserted". This is all done in terms of syntax trees using the grammar as a guide; it can't apply the transform to anything that isn't a statement. [The reason for rewriting the statement as a block, is because that way the right side is valid where statements are valid, go check your grammar.]

As a practical matter, you'll need to write rules to handle other special cases but this is mostly writing more rules. You also need to package the parser/transformer/prettyprinter as bundle which requires some procedural glue. This is still far easier than trying to write code to reliably climb up and down the tree, matching the nodes and then smashing those nodes to get what you want. Better, when your grammar (invariably) has to be adjusted, the rewrite rules are reparsed according to the revised grammar and still work; whatever procedural tree climbing you might be doing is almost certainly gauranteed to break.

As you write more and more transformations, this capability becomes more and more valuable. And when you are successful with a small number of transformations, adding more becomes attractive quickly.

See this technical paper for a more thorough discussion of how DMS works, and how it is used to apply instrumentation transformations, like you want to do, in real tools. This paper describes the basic ideas behind the test coverage tools sold by Semantic Designs.

Parsing C# code (as string) and inserting additional methods

Tags:

c#

parsing

reflection

code-generation

codedom

AlishahNovin

2 Answers

Kris

Ira Baxter

Recent Activity

Donate For Us

Parsing C# code (as string) and inserting additional methods

Tags:

c#

parsing

reflection

code-generation

codedom

AlishahNovin

2 Answers

Kris

Ira Baxter

Related questions

Recent Activity

Donate For Us