Caching Compiled Expression tree

Tags:

expression-trees

How to efficiently cache methods compiled from an expression tree ?

public void SomeToStringCalls() {     ToString(i => (i + 1).ToString(), 1);     ToString(i => (i + 1).ToString(), 2);     ToString(i => (i + 2).ToString(), 3);     ToString(i => (i + 2).ToString(), 4); }  private string ToString<T>(Expression<Func<T, string>> expression, T input) {     var method = expression.Compile();     return method.Invoke(input); }

Above, every call is going to recompile each expression, even if some are identical. I can't have a Dictionary<Expression<Func<T, string>>, Func<T, string>>() caching the compiled method from the expression because the equals will fails.

295

asked Oct 30 '13 09:10

Toto

1 Answers

The problems with caching expression trees in a centralized cache are:

You will need comprehensive equality comparison and hashing algorithms.
You will need to implement these algorithms yourself, as the standard expression types do not provide them out of the box.

A comprehensive equality comparison will be expensive to perform, but the cost can be alleviated somewhat with a cheap hash function. Also, since expression trees are immutable, you can cache the hash code after you've calculated it for the first time. That may shave some time off lookups, but each time you check the cache using a newly created expression as the key (which, I would imagine, would be most of the time) you will at least need to hash the new expression.

Option 1: Local/Static Caching

An ideal solution would avoid all of this overhead. If it's feasible (i.e., if these expressions are not composed dynamically), your best bet is to simply cache the expression trees near their declaration sites. You should be able to store most (if not all) of these in static fields:

private static readonly Expression<Func<int, string>> _addOne =     i => (i + 1).ToString(); private static readonly Expression<Func<int, string>> _addTwo =     i => (i + 2).ToString();  public void SomeToStringCalls() {     ToString(_addOne, 1);     ToString(_addOne, 2);     ToString(_addTwo, 3);     ToString(_addTwo, 4); }

The downside is that you may end up with duplicate expressions across various types, but if you are not generating a very large number of expressions, this may not be an issue.

Option 2: Centralized Caching

If that's not an option for you, you'll have to implement a centralized cache and the hashing and equality algorithms required to do so. The hashing algorithm will help you narrow down your candidate set, so it's important that it work reasonably well, i.e., produce very few collisions in practice. However, given the complex nature of expression trees, you want to keep the costs down. Therefore, I would advise you not aim for a perfect hashing function, but one that is reasonably cheap, yet effective. Perhaps something like this:

internal static class ExpressionHasher {     private const int NullHashCode = 0x61E04917;      [ThreadStatic]     private static HashVisitor _visitor;      private static HashVisitor Visitor     {         get         {             if (_visitor == null)                 _visitor = new HashVisitor();             return _visitor;         }     }      public static int GetHashCode(Expression e)     {         if (e == null)             return NullHashCode;          var visitor = Visitor;          visitor.Reset();         visitor.Visit(e);          return visitor.Hash;     }      private sealed class HashVisitor : ExpressionVisitor     {         private int _hash;          internal int Hash         {             get { return _hash; }         }          internal void Reset()         {             _hash = 0;         }          private void UpdateHash(int value)         {             _hash = (_hash * 397) ^ value;         }          private void UpdateHash(object component)         {             int componentHash;              if (component == null)             {                 componentHash = NullHashCode;             }             else             {                 var member = component as MemberInfo;                 if (member != null)                 {                     componentHash = member.Name.GetHashCode();                      var declaringType = member.DeclaringType;                     if (declaringType != null && declaringType.AssemblyQualifiedName != null)                         componentHash = (componentHash * 397) ^ declaringType.AssemblyQualifiedName.GetHashCode();                 }                 else                 {                     componentHash = component.GetHashCode();                 }             }              _hash = (_hash * 397) ^ componentHash;         }          public override Expression Visit(Expression node)         {             UpdateHash((int)node.NodeType);             return base.Visit(node);         }          protected override Expression VisitConstant(ConstantExpression node)         {             UpdateHash(node.Value);             return base.VisitConstant(node);         }          protected override Expression VisitMember(MemberExpression node)         {             UpdateHash(node.Member);             return base.VisitMember(node);         }          protected override MemberAssignment VisitMemberAssignment(MemberAssignment node)         {             UpdateHash(node.Member);             return base.VisitMemberAssignment(node);         }          protected override MemberBinding VisitMemberBinding(MemberBinding node)         {             UpdateHash((int)node.BindingType);             UpdateHash(node.Member);             return base.VisitMemberBinding(node);         }          protected override MemberListBinding VisitMemberListBinding(MemberListBinding node)         {             UpdateHash((int)node.BindingType);             UpdateHash(node.Member);             return base.VisitMemberListBinding(node);         }          protected override MemberMemberBinding VisitMemberMemberBinding(MemberMemberBinding node)         {             UpdateHash((int)node.BindingType);             UpdateHash(node.Member);             return base.VisitMemberMemberBinding(node);         }          protected override Expression VisitMethodCall(MethodCallExpression node)         {             UpdateHash(node.Method);             return base.VisitMethodCall(node);         }          protected override Expression VisitNew(NewExpression node)         {             UpdateHash(node.Constructor);             return base.VisitNew(node);         }          protected override Expression VisitNewArray(NewArrayExpression node)         {             UpdateHash(node.Type);             return base.VisitNewArray(node);         }          protected override Expression VisitParameter(ParameterExpression node)         {             UpdateHash(node.Type);             return base.VisitParameter(node);         }          protected override Expression VisitTypeBinary(TypeBinaryExpression node)         {             UpdateHash(node.Type);             return base.VisitTypeBinary(node);         }     } }

It's not perfect, but it should get you pretty good results:

It drills down and includes every expression in the tree.
At minimum, the NodeType of every sub-expression is included in the hash. One obvious (but potentially costly) improvement would be to also include the Type; try it if you find you are getting too many collisions.
Members and types referenced in the expression are included.
Constant values appearing in the tree are included.
It doesn't require a heap allocation to run, at the expense of being non-reentrant (since you're only analyzing a top-level expression, this is fine). You can run it concurrently on multiple threads.

Since you're not actually overriding GetHashCode() for any of the expression types, the imperfections of the hashing function will not leak out and affect external code. This gives us a degree of freedom in bending the rules of hash functions.

Your equality comparison will need to be more comprehensive. Whereas the hashing function is effectively an 'estimate' used to minimize your candidate set, the equality comparison performs the actual matching, and it needs to be perfect. You could certainly use my proposed hashing solution as a template for how to approach the problem, but bear in mind you must perform an exhaustive comparison of all an expression's attributes. One thing to bear in mind is that you probably don't want to compare the names of the ParameterExpression nodes in the tree, but you'll want to maintain a mapping of the parameters/variables in the two trees you are comparing to make sure they represent the "same" value in the context of their parent expression tree.

With the exception of ignoring parameter/variable names, do not bother attempting to resolve "semantic equivalence", i.e., expressions which produce the same results and side effects but are not structurally identical. It cannot be done efficiently, and it's not worth it to try.

Lastly, you might be able to speed things up by implementing a two-level lookup: first, choose the correct cache based on the parameter type(s), then look for a match within that cache. This approach would be most effective if it is guaranteed that each lambda expression will have exactly one argument. The benefits would degrade with more arguments.

answered Sep 19 '22 00:09

Mike Strobel

Related questions
                            
                                Persistent Work Queue in C#
                            
                                Who should own the private key used to sign a .NET assembly when its project is open-source? [closed]
                            
                                SQLDependency_OnChange-Event fires only one single Time
                            
                                Get OS-Version in WinRT Metro App C#
                            
                                C#: Any() vs Count() for an empty list
                            
                                How to enable configSource attribute for Custom Configuration Section in .NET?
                            
                                Changing a Visual Studio C# project from x86 to Any CPU
                            
                                Creating a file shortcut (.lnk)
                            
                                How to disable all whitespace autoformatting in Visual Studio 2015?
                            
                                Nuget Package ... does not exist in project ... Package ... Already exists in folder
                            
                                Where to store application settings/state in a MVVM application
                            
                                How to allow multiple authentication methods in ASP.NET?
                            
                                RESTful web service auto-generate WADL
                            
                                IIS / SQL Server connection error
                            
                                In MsBuild, what's the difference between PropertyGroup and ItemGroup
                            
                                MSBuild Unhandled Exception: The FileName property should not be a directory unless UseShellExecute is set
                            
                                Asynchronous multi-direction server-client communication over the same open socket?
                            
                                Where can I find C# 3.0 grammar?
                            
                                Why does IsAssignableFrom() not work for int and double?
                            
                                Using higher-order Haskell types in C#

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With