
Constructing custom expression trees while using operators in C#

This question is about constructing custom expression trees in .NET using the operators found in C# (or any other language). I provide the question along with some background information.


For my managed 2-phase 64-bit assembler I need support for expressions. For example, one might want to assemble:

mystring: DB 'hello, world'
          TIMES 64-$+mystring DB ' '

The expression 64-$+mystring must not be a string but an actual valid expression with the benefits of syntax and type checking and IntelliSense in VS, something along the lines of:

64 - Reference.CurrentOffset + new Reference("mystring");

This expression is not evaluated when it is constructed. Instead, it is evaluated later in my assembler's context (when it determines the symbol offsets and such). The .NET Framework (since .NET 3.5) provides support for expression trees, and they seem ideal for this kind of expression, which is evaluated later or somewhere else.

But I don't know how to ensure that I can use the C# syntax (using +, <<, %, etc.) for constructing the expression tree. I want to avoid things like:

var expression = AssemblerExpression.Subtract(64,
    AssemblerExpression.Add(AssemblerExpression.CurrentOffset(),
        AssemblerExpression.Reference("mystring")));

How would you go about this?


Note: I need an expression tree to be able to convert the expression into an acceptable custom string representation, and at the same time be able to evaluate it at a point in time other than at its definition.


An explanation of my example: 64-$+mystring. The $ is the current offset, so it is a specific number that is unknown in advance (but known at evaluation time). The mystring is a symbol which may or may not be known at evaluation time (for example when it has not yet been defined). Subtracting a constant C from a symbol S is the same as S + -C. Subtracting two symbols S0 and S1 (S1 - S0) gives the integer difference between the two symbols' values.
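To make the arithmetic concrete, here is a minimal sketch of the deferred evaluation only (it uses a plain delegate, so it does not keep a tree). The names PaddingExample and Padding are hypothetical, and the values assume mystring sits at offset 0 and that 'hello, world' occupies 12 bytes, so $ is 12 on the TIMES line.

```csharp
using System;
using System.Collections.Generic;

public static class PaddingExample
{
    // Nothing is computed here; the arithmetic runs only when the delegate is
    // invoked with the current offset ($) and a symbol table.
    public static readonly Func<long, IReadOnlyDictionary<string, long>, long> Padding =
        (currentOffset, symbols) => 64 - currentOffset + symbols["mystring"];

    public static void Main()
    {
        // Assumption: mystring is defined at offset 0.
        var symbols = new Dictionary<string, long> { ["mystring"] = 0 };

        // Assumption: $ is 12 because 'hello, world' is 12 bytes long.
        Console.WriteLine(Padding(12, symbols)); // 64 - 12 + 0 = 52
    }
}
```

Under these assumptions the TIMES line would emit 52 spaces, padding the string to 64 bytes.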

However, this question is not really about how to evaluate assembler expressions, but more about how to evaluate any expression that has custom classes in them (for things like the symbols and $ in the example) and how to still ensure that it can be pretty-printed using some visitor (thus keeping the tree). And since the .NET framework has its expression trees and visitors, it would be nice to use those, if possible.

Daniel A.A. Pelsmaeker asked Aug 23 '11 14:08



2 Answers

I don't know exactly what you are aiming for, but the following is a sketch of an approach that I think would work.

Notes:

  1. I demonstrate only indexed reference expressions (thus ignoring indirect addressing via registers for now; you could add a RegisterIndirectReference analogous to the SymbolicReference class). The same goes for your suggested $ (current offset) feature. It would probably be a register (?).
  2. I don't explicitly show the unary/binary operator- at work either. However, the mechanics are largely the same. I stopped short of adding it because I couldn't work out the semantics of the sample expressions in your question (I'd think that subtracting the address of a known string is not useful, for example).
  3. The approach does not place (semantic) limits: you can offset any ReferenceBase-derived IReference. In practice, you might want to allow only one level of indexing, in which case defining operator+ directly on SymbolicReference would be more appropriate.
  4. I have sacrificed coding style for demo purposes: in general, you won't want to repeatedly Compile() your expression trees, and direct evaluation with .Compile()() looks ugly and confusing. It's left up to the OP to integrate it in a more legible fashion.
  5. The demonstration of the explicit conversion operator is really off-topic. I got carried away slightly (?).
  6. You can observe the code running live on IdeOne.com.

using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Linq;


namespace Assembler
{
    internal class State
    {
        public readonly IDictionary<string, ulong> SymbolTable = new Dictionary<string, ulong>();

        public void Clear() 
        {
            SymbolTable.Clear();
        }
    }

    internal interface IReference
    {
        ulong EvalAddress(State s); // evaluate reference to address
    }

    internal abstract class ReferenceBase : IReference
    {
        public static IndexedReference operator+(long directOffset, ReferenceBase baseRef) { return new IndexedReference(baseRef, directOffset); }
        public static IndexedReference operator+(ReferenceBase baseRef, long directOffset) { return new IndexedReference(baseRef, directOffset); }

        public abstract ulong EvalAddress(State s);
    }

    internal class SymbolicReference : ReferenceBase
    {
        public static explicit operator SymbolicReference(string symbol)    { return new SymbolicReference(symbol); }
        public SymbolicReference(string symbol) { _symbol = symbol; }

        private readonly string _symbol;

        public override ulong EvalAddress(State s) 
        {
            return s.SymbolTable[_symbol];
        }

        public override string ToString() { return string.Format("Sym({0})", _symbol); }
    }

    internal class IndexedReference : ReferenceBase
    {
        public IndexedReference(IReference baseRef, long directOffset) 
        {
            _baseRef = baseRef;
            _directOffset = directOffset;
        }

        private readonly IReference _baseRef;
        private readonly long _directOffset;

        public override ulong EvalAddress(State s) 
        {
            return (_directOffset<0)
                ? _baseRef.EvalAddress(s) - (ulong) Math.Abs(_directOffset)
                : _baseRef.EvalAddress(s) + (ulong) Math.Abs(_directOffset);
        }

        public override string ToString() { return string.Format("{0} + {1}", _directOffset, _baseRef); }
    }
}

namespace Program
{
    using Assembler;

    public static class Program
    {
        public static void Main(string[] args)
        {
            var myBaseRef1 = new SymbolicReference("mystring1");

            Expression<Func<IReference>> anyRefExpr = () => 64 + myBaseRef1;
            Console.WriteLine(anyRefExpr);

            var myBaseRef2 = (SymbolicReference) "mystring2"; // uses explicit conversion operator

            Expression<Func<IndexedReference>> indexedRefExpr = () => 64 + myBaseRef2;
            Console.WriteLine(indexedRefExpr);

            Console.WriteLine(Console.Out.NewLine + "=== show compiletime types of returned values:");
            Console.WriteLine("myBaseRef1     -> {0}", myBaseRef1);
            Console.WriteLine("myBaseRef2     -> {0}", myBaseRef2);
            Console.WriteLine("anyRefExpr     -> {0}", anyRefExpr.Compile().Method.ReturnType);
            Console.WriteLine("indexedRefExpr -> {0}", indexedRefExpr.Compile().Method.ReturnType);

            Console.WriteLine(Console.Out.NewLine + "=== show runtime types of returned values:");
            Console.WriteLine("myBaseRef1     -> {0}", myBaseRef1);
            Console.WriteLine("myBaseRef2     -> {0}", myBaseRef2);
            Console.WriteLine("anyRefExpr     -> {0}", anyRefExpr.Compile()());     // compile() returns Func<...>
            Console.WriteLine("indexedRefExpr -> {0}", indexedRefExpr.Compile()());

            Console.WriteLine(Console.Out.NewLine + "=== observe how you could add an evaluation model using some kind of symbol table:");
            var compilerState = new State();
            compilerState.SymbolTable.Add("mystring1", 0xdeadbeef); // raw addresses
            compilerState.SymbolTable.Add("mystring2", 0xfeedface);

            Console.WriteLine("myBaseRef1 evaluates to     0x{0:x8}", myBaseRef1.EvalAddress(compilerState));
            Console.WriteLine("myBaseRef2 evaluates to     0x{0:x8}", myBaseRef2.EvalAddress(compilerState));
            Console.WriteLine("anyRefExpr displays as      {0:x8}",   anyRefExpr.Compile()());
            Console.WriteLine("indexedRefExpr displays as  {0:x8}",   indexedRefExpr.Compile()());
            Console.WriteLine("anyRefExpr evaluates to     0x{0:x8}", anyRefExpr.Compile()().EvalAddress(compilerState));
            Console.WriteLine("indexedRefExpr evaluates to 0x{0:x8}", indexedRefExpr.Compile()().EvalAddress(compilerState));
        }
    }
}
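The subtraction and current-offset cases left out above could be handled with the same operator-overloading idea. What follows is a minimal sketch, not part of the code above: it uses hand-written node classes instead of System.Linq.Expressions, and every name in it (Expr, ConstExpr, SymbolExpr, CurrentOffsetExpr, BinaryExpr) is hypothetical. An implicit conversion from long lets plain constants take part in the operators, each operator returns a new node so the tree is kept, and the tree can be printed or evaluated later.

```csharp
using System;
using System.Collections.Generic;

public abstract class Expr
{
    public static readonly Expr CurrentOffset = new CurrentOffsetExpr();
    public static Expr Symbol(string name) => new SymbolExpr(name);

    // Lets integer literals appear directly in expressions such as 64 - $.
    public static implicit operator Expr(long value) => new ConstExpr(value);

    // The operators build tree nodes; they do not compute anything.
    public static Expr operator +(Expr left, Expr right) => new BinaryExpr('+', left, right);
    public static Expr operator -(Expr left, Expr right) => new BinaryExpr('-', left, right);

    public abstract long Eval(long currentOffset, IReadOnlyDictionary<string, long> symbols);
}

public sealed class ConstExpr : Expr
{
    private readonly long _value;
    public ConstExpr(long value) { _value = value; }
    public override long Eval(long currentOffset, IReadOnlyDictionary<string, long> symbols) => _value;
    public override string ToString() => _value.ToString();
}

public sealed class CurrentOffsetExpr : Expr
{
    public override long Eval(long currentOffset, IReadOnlyDictionary<string, long> symbols) => currentOffset;
    public override string ToString() => "$";
}

public sealed class SymbolExpr : Expr
{
    private readonly string _name;
    public SymbolExpr(string name) { _name = name; }

    // Throws KeyNotFoundException when the symbol has not been defined yet.
    public override long Eval(long currentOffset, IReadOnlyDictionary<string, long> symbols) => symbols[_name];
    public override string ToString() => _name;
}

public sealed class BinaryExpr : Expr
{
    private readonly char _op;
    private readonly Expr _left;
    private readonly Expr _right;

    public BinaryExpr(char op, Expr left, Expr right) { _op = op; _left = left; _right = right; }

    public override long Eval(long currentOffset, IReadOnlyDictionary<string, long> symbols)
    {
        long l = _left.Eval(currentOffset, symbols);
        long r = _right.Eval(currentOffset, symbols);
        return _op == '+' ? l + r : l - r;
    }

    public override string ToString() => $"({_left} {_op} {_right})";
}

public static class Demo
{
    public static void Main()
    {
        Expr padding = 64 - Expr.CurrentOffset + Expr.Symbol("mystring");
        Console.WriteLine(padding); // ((64 - $) + mystring)

        // Assumed values: $ is 12 and mystring is at offset 0.
        var symbols = new Dictionary<string, long> { ["mystring"] = 0 };
        Console.WriteLine(padding.Eval(12, symbols)); // 52
    }
}
```

The trade-off of this variant is that the node types and the printer are your own, so the framework's ExpressionVisitor is not involved; in exchange, no lambda or Compile() call is needed to build or evaluate the expression.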
sehe answered Sep 25 '22 01:09


C# supports assigning a lambda expression to an Expression<TDelegate>, which will cause the compiler to emit code to create an expression tree representing the lambda expression, which you can then manipulate. E.g.:

Expression<Func<int, int, int>> times = (a, b) => a * b;

You could then potentially take the generated expression tree and convert it into your assembler's syntax tree, but this doesn't seem to be quite what you're looking for, and I don't think you're going to be able to leverage the C# compiler to do this for arbitrary input.
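As a rough illustration of taking the generated tree and walking it, the sketch below subclasses the framework's ExpressionVisitor to print a lambda body in infix form, then compiles the same tree to evaluate it. InfixPrinter and the parameter names offset and mystring are hypothetical, and modelling $ and the symbol as lambda parameters is an assumption made for this example only.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

public sealed class InfixPrinter : ExpressionVisitor
{
    private readonly StringBuilder _sb = new StringBuilder();

    public static string Print(Expression expression)
    {
        var printer = new InfixPrinter();
        printer.Visit(expression);
        return printer._sb.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        _sb.Append('(');
        Visit(node.Left);
        _sb.Append(' ').Append(Symbol(node.NodeType)).Append(' ');
        Visit(node.Right);
        _sb.Append(')');
        return node;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        _sb.Append(node.Name);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        _sb.Append(node.Value);
        return node;
    }

    // Only a few operators are mapped; anything else falls back to the node type name.
    private static string Symbol(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.Add:
            case ExpressionType.AddChecked: return "+";
            case ExpressionType.Subtract:
            case ExpressionType.SubtractChecked: return "-";
            case ExpressionType.Multiply:
            case ExpressionType.MultiplyChecked: return "*";
            case ExpressionType.Modulo: return "%";
            case ExpressionType.LeftShift: return "<<";
            default: return type.ToString();
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // The compiler builds the tree; the body is not executed here.
        Expression<Func<long, long, long>> padding =
            (offset, mystring) => 64 - offset + mystring;

        Console.WriteLine(InfixPrinter.Print(padding.Body)); // ((64 - offset) + mystring)

        // Evaluate later; 12 and 0 are assumed values for $ and mystring.
        Console.WriteLine(padding.Compile()(12, 0)); // 52
    }
}
```

In real code the compiled delegate would normally be cached rather than rebuilt on every evaluation.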

You're probably going to end up having to build your own parser for your assembly language, as I don't think the C# compiler is going to do what you want in this case.

Iridium answered Sep 26 '22 01:09