Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extension method for IQueryable left outer join using LINQ

Tags:

I am trying to implement Left outer join extension method with return type IQueryable.

The function that I have written is as follows

public static IQueryable<TResult> LeftOuterJoin2<TOuter, TInner, TKey, TResult>(         this IQueryable<TOuter> outer,         IQueryable<TInner> inner,         Func<TOuter, TKey> outerKeySelector,         Func<TInner, TKey> innerKeySelector,         Func<TOuter, TInner, TResult> resultSelector) {         return           from outerItem in outer           join innerItem in inner on outerKeySelector(outerItem)              equals innerKeySelector(innerItem) into joinedData           from r in joinedData.DefaultIfEmpty()           select resultSelector(outerItem, r); } 

It can't generate the query. The reason might be: I have used Func<> instead of Expression<>. I tried with Expression<> as well. It gives me an error on outerKeySelector(outerItem) line, which is outerKeySelector is a variable which is being used as a method

I found some discussions on SO (such as here) and CodeProjects, but those work for IEnumerable types not for IQueryable.

like image 440
N Rocking Avatar asked Feb 06 '14 22:02

N Rocking


2 Answers

Intro

This question is very interesting. The problem is Funcs are delegates and Expressions are trees, they are completely different structures. When you use your current extension implementation it uses loops and executes your selectors on each step for each element and it works well. But when we talk about entity framework and LINQ we need tree traversal for translation it to SQL query. So it's a "little" harder than Funcs (but I like Expressions anyway) and there are some problems described below.

When you want to do left outer join you can use something like this (taken from here: How to implement left join in JOIN Extension method)

var leftJoin = p.Person.Where(n => n.FirstName.Contains("a"))                    .GroupJoin(p.PersonInfo,                                n => n.PersonId,                               m => m.PersonId,                               (n, ms) => new { n, ms = ms.DefaultIfEmpty() })                    .SelectMany(z => z.ms.Select(m => new { n = z.n, m )); 

It is good, but it is not extension method we need. I guess you need something like this:

using (var db = new Database1Entities("...")) {      var my = db.A.LeftOuterJoin2(db.B, a => a.Id, b => b.IdA,           (a, b) => new { a, b, hello = "Hello World!" });      // other actions ... } 

There are many hard parts in creating such extensions:

  • Creating complex trees manually, compiler will not help us here
  • Reflection is needed for methods like Where, Select, etc
  • Anonymous types (!! we need codegen here?? I hope no)

Steps

Consider 2 simple tables: A (columns: Id, Text) and B (Columns Id, IdA, Text).

Outer join could be implemented in 3 steps:

// group join as usual + use DefaultIfEmpty var q1 = Queryable.GroupJoin(db.A, db.B, a => a.Id, b => b.IdA,                                (a, b) => new { a, groupB = b.DefaultIfEmpty() });  // regroup data to associated list a -> b, it is usable already, but it's  // impossible to use resultSelector on this stage,  // beacuse of type difference (quite deep problem: some anonymous type != TOuter) var q2 = Queryable.SelectMany(q1, x => x.groupB, (a, b) => new { a.a, b });  // second regroup to get the right types var q3 = Queryable.SelectMany(db.A,                                 a => q2.Where(x => x.a == a).Select(x => x.b),                                 (a, b) => new {a, b}); 

Code

Ok, I'm not such a good teller, here is he code I have (Sorry I was unable to format it better, but it works!):

public static IQueryable<TResult> LeftOuterJoin2<TOuter, TInner, TKey, TResult>(         this IQueryable<TOuter> outer,         IQueryable<TInner> inner,         Expression<Func<TOuter, TKey>> outerKeySelector,         Expression<Func<TInner, TKey>> innerKeySelector,         Expression<Func<TOuter, TInner, TResult>> resultSelector)     {          // generic methods         var selectManies = typeof(Queryable).GetMethods()             .Where(x => x.Name == "SelectMany" && x.GetParameters().Length == 3)             .OrderBy(x=>x.ToString().Length)             .ToList();         var selectMany = selectManies.First();         var select = typeof(Queryable).GetMethods().First(x => x.Name == "Select" && x.GetParameters().Length == 2);         var where = typeof(Queryable).GetMethods().First(x => x.Name == "Where" && x.GetParameters().Length == 2);         var groupJoin = typeof(Queryable).GetMethods().First(x => x.Name == "GroupJoin" && x.GetParameters().Length == 5);         var defaultIfEmpty = typeof(Queryable).GetMethods().First(x => x.Name == "DefaultIfEmpty" && x.GetParameters().Length == 1);          // need anonymous type here or let's use Tuple         // prepares for:         // var q2 = Queryable.GroupJoin(db.A, db.B, a => a.Id, b => b.IdA, (a, b) => new { a, groupB = b.DefaultIfEmpty() });         var tuple = typeof(Tuple<,>).MakeGenericType(             typeof(TOuter),             typeof(IQueryable<>).MakeGenericType(                 typeof(TInner)                 )             );         var paramOuter = Expression.Parameter(typeof(TOuter));         var paramInner = Expression.Parameter(typeof(IEnumerable<TInner>));         var groupJoinExpression = Expression.Call(             null,             groupJoin.MakeGenericMethod(typeof (TOuter), typeof (TInner), typeof (TKey), tuple),             new Expression[]                 {                     Expression.Constant(outer),                     Expression.Constant(inner),                     outerKeySelector,                     innerKeySelector,                     Expression.Lambda(                         Expression.New(                             tuple.GetConstructor(tuple.GetGenericArguments()),                             new Expression[]                                 {                                     paramOuter,                                     Expression.Call(                                         null,                                         defaultIfEmpty.MakeGenericMethod(typeof (TInner)),                                         new Expression[]                                             {                                                 Expression.Convert(paramInner, typeof (IQueryable<TInner>))                                             }                                 )                                 },                             tuple.GetProperties()                             ),                         new[] {paramOuter, paramInner}                 )                 }             );          // prepares for:         // var q3 = Queryable.SelectMany(q2, x => x.groupB, (a, b) => new { a.a, b });         var tuple2 = typeof (Tuple<,>).MakeGenericType(typeof (TOuter), typeof (TInner));         var paramTuple2 = Expression.Parameter(tuple);         var paramInner2 = Expression.Parameter(typeof(TInner));         var paramGroup = Expression.Parameter(tuple);         var selectMany1Result = Expression.Call(             null,             selectMany.MakeGenericMethod(tuple, typeof (TInner), tuple2),             new Expression[]                 {                     groupJoinExpression,                     Expression.Lambda(                         Expression.Convert(Expression.MakeMemberAccess(paramGroup, tuple.GetProperty("Item2")),                                            typeof (IEnumerable<TInner>)),                         paramGroup                 ),                     Expression.Lambda(                         Expression.New(                             tuple2.GetConstructor(tuple2.GetGenericArguments()),                             new Expression[]                                 {                                     Expression.MakeMemberAccess(paramTuple2, paramTuple2.Type.GetProperty("Item1")),                                     paramInner2                                 },                             tuple2.GetProperties()                             ),                         new[]                             {                                 paramTuple2,                                 paramInner2                             }                 )                 }             );          // prepares for final step, combine all expressinos together and invoke:         // var q4 = Queryable.SelectMany(db.A, a => q3.Where(x => x.a == a).Select(x => x.b), (a, b) => new { a, b });         var paramTuple3 = Expression.Parameter(tuple2);         var paramTuple4 = Expression.Parameter(tuple2);         var paramOuter3 = Expression.Parameter(typeof (TOuter));         var selectManyResult2 = selectMany             .MakeGenericMethod(                 typeof(TOuter),                 typeof(TInner),                 typeof(TResult)             )             .Invoke(                 null,                 new object[]                     {                         outer,                         Expression.Lambda(                             Expression.Convert(                                 Expression.Call(                                     null,                                     select.MakeGenericMethod(tuple2, typeof(TInner)),                                     new Expression[]                                         {                                             Expression.Call(                                                 null,                                                 where.MakeGenericMethod(tuple2),                                                 new Expression[]                                                     {                                                         selectMany1Result,                                                         Expression.Lambda(                                                              Expression.Equal(                                                                 paramOuter3,                                                                 Expression.MakeMemberAccess(paramTuple4, paramTuple4.Type.GetProperty("Item1"))                                                             ),                                                             paramTuple4                                                         )                                                     }                                             ),                                             Expression.Lambda(                                                 Expression.MakeMemberAccess(paramTuple3, paramTuple3.Type.GetProperty("Item2")),                                                 paramTuple3                                             )                                         }                                 ),                                  typeof(IEnumerable<TInner>)                             ),                             paramOuter3                         ),                         resultSelector                     }             );          return (IQueryable<TResult>)selectManyResult2;     } 

Usage

And the usage again:

db.A.LeftOuterJoin2(db.B, a => a.Id, b => b.IdA,         (a, b) => new { a, b, hello = "Hello World!" }); 

Looking at this you can think what is the sql query for all this? It might be huge. Guess what? It's quite small:

SELECT  1 AS [C1],  [Extent1].[Id] AS [Id],  [Extent1].[Text] AS [Text],  [Join1].[Id1] AS [Id1],  [Join1].[IdA] AS [IdA],  [Join1].[Text2] AS [Text2],  N'Hello World!' AS [C2] FROM  [A] AS [Extent1] INNER JOIN  (SELECT [Extent2].[Id] AS [Id2], [Extent2].[Text] AS [Text], [Extent3].[Id]    AS [Id1], [Extent3].[IdA] AS [IdA], [Extent3].[Text2] AS [Text2]     FROM  [A] AS [Extent2]     LEFT OUTER JOIN [B] AS [Extent3] ON [Extent2].[Id] = [Extent3].[IdA] ) AS [Join1] ON [Extent1].[Id] = [Join1].[Id2] 

Hope it helps.

like image 108
Tony Avatar answered Oct 21 '22 15:10

Tony


The accepted answer is a great start to explain the complexities behind a left outer join.

I found three rather serious issues with it, especially when taking this extension method and using it in more complex queries (chaining multiple left outer joins with normal joins then summarizing/max/count/...) Before you copy the selected answer into your production environment, please do read on.

Consider the original example from the linked SO post, which represents just about any left outer join done in LINQ:

var leftJoin = p.Person.Where(n => n.FirstName.Contains("a"))                    .GroupJoin(p.PersonInfo,                                n => n.PersonId,                               m => m.PersonId,                               (n, ms) => new { n, ms = ms })                    .SelectMany(z => z.ms.DefaultIfEmpty(), (n, m) => new { n = n, m )); 
  • The usage of a Tuple works, but when this is used as part of more complex queries, EF fails (cannot use constructors). To get around this, you either need to generate a new anonymous class dynamically (search stack overflow) or use a constructor-less type. I created this

    internal class KeyValuePairHolder<T1, T2> {     public T1 Item1 { get; set; }     public T2 Item2 { get; set; } } 
  • The usage of the "Queryable.DefaultIfEmpty" method. In the original and in the GroupJoin methods, correct methods that are chosen by the compiler are the "Enumerable.DefaultIfEmpty" methods. This has no influence in a simple query, but notice how the accepted answer has a bunch of Converts (between IQueryable and IEnumerable). Those cast also cause issues in more complex queries. It's ok to use the "Enumerable.DefaultIfEmpty" method in an Expression, EF knows not to execute it but to translate it into a join instead.

  • Finally, this is the bigger issue: there are two selects done whereas the original only does one select. You can read the cause in the code comments (beacuse of type difference (quite deep problem: some anonymous type != TOuter)) and see it in the SQL (Select from A inner join (a left outer join b)) The issue here is that the Original SelectMany method takes an object created in the Join method of type: KeyValuePairHolder of TOuter and IEnumerable of Tinner as it's first parameter, but the resultSelector expression passed takes a simple TOUter as it's first parameter. You can use an ExpressionVisitor to rewrite the expression that is passed into the correct form.

    internal class ResultSelectorRewriter<TOuter, TInner, TResult> : ExpressionVisitor {     private Expression<Func<TOuter, TInner, TResult>> resultSelector;     public Expression<Func<KeyValuePairHolder<TOuter, IEnumerable<TInner>>, TInner, TResult>> CombinedExpression { get; private set; }      private ParameterExpression OldTOuterParamExpression;     private ParameterExpression OldTInnerParamExpression;     private ParameterExpression NewTOuterParamExpression;     private ParameterExpression NewTInnerParamExpression;       public ResultSelectorRewriter(Expression<Func<TOuter, TInner, TResult>> resultSelector)     {         this.resultSelector = resultSelector;         this.OldTOuterParamExpression = resultSelector.Parameters[0];         this.OldTInnerParamExpression = resultSelector.Parameters[1];          this.NewTOuterParamExpression = Expression.Parameter(typeof(KeyValuePairHolder<TOuter, IEnumerable<TInner>>));         this.NewTInnerParamExpression = Expression.Parameter(typeof(TInner));          var newBody = this.Visit(this.resultSelector.Body);         var combinedExpression = Expression.Lambda(newBody, new ParameterExpression[] { this.NewTOuterParamExpression, this.NewTInnerParamExpression });         this.CombinedExpression = (Expression<Func<KeyValuePairHolder<TOuter, IEnumerable<TInner>>, TInner, TResult>>)combinedExpression;     }       protected override Expression VisitParameter(ParameterExpression node)     {         if (node == this.OldTInnerParamExpression)             return this.NewTInnerParamExpression;         else if (node == this.OldTOuterParamExpression)             return Expression.PropertyOrField(this.NewTOuterParamExpression, "Item1");         else             throw new InvalidOperationException("What is this sorcery?", new InvalidOperationException("Did not expect a parameter: " + node));      }  } 

Using the expression visitor and KeyValuePairHolder to avoid usage of Tuples, my updated version of the selected answer below fixes the three issues, is shorter, and produces shorter SQL:

 internal class QueryReflectionMethods     {         internal static System.Reflection.MethodInfo Enumerable_Select = typeof(Enumerable).GetMethods().First(x => x.Name == "Select" && x.GetParameters().Length == 2);         internal static System.Reflection.MethodInfo Enumerable_DefaultIfEmpty = typeof(Enumerable).GetMethods().First(x => x.Name == "DefaultIfEmpty" && x.GetParameters().Length == 1);          internal static System.Reflection.MethodInfo Queryable_SelectMany = typeof(Queryable).GetMethods().Where(x => x.Name == "SelectMany" && x.GetParameters().Length == 3).OrderBy(x => x.ToString().Length).First();         internal static System.Reflection.MethodInfo Queryable_Where = typeof(Queryable).GetMethods().First(x => x.Name == "Where" && x.GetParameters().Length == 2);         internal static System.Reflection.MethodInfo Queryable_GroupJoin = typeof(Queryable).GetMethods().First(x => x.Name == "GroupJoin" && x.GetParameters().Length == 5);         internal static System.Reflection.MethodInfo Queryable_Join = typeof(Queryable).GetMethods(System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public).First(c => c.Name == "Join");         internal static System.Reflection.MethodInfo Queryable_Select = typeof(Queryable).GetMethods().First(x => x.Name == "Select" && x.GetParameters().Length == 2);            public static IQueryable<TResult> CreateLeftOuterJoin<TOuter, TInner, TKey, TResult>(                    IQueryable<TOuter> outer,                    IQueryable<TInner> inner,                    Expression<Func<TOuter, TKey>> outerKeySelector,                    Expression<Func<TInner, TKey>> innerKeySelector,                    Expression<Func<TOuter, TInner, TResult>> resultSelector)         {               var keyValuePairHolderWithGroup = typeof(KeyValuePairHolder<,>).MakeGenericType(                 typeof(TOuter),                 typeof(IEnumerable<>).MakeGenericType(                     typeof(TInner)                     )                 );             var paramOuter = Expression.Parameter(typeof(TOuter));             var paramInner = Expression.Parameter(typeof(IEnumerable<TInner>));             var groupJoin =                 Queryable_GroupJoin.MakeGenericMethod(typeof(TOuter), typeof(TInner), typeof(TKey), keyValuePairHolderWithGroup)                 .Invoke(                     "ThisArgumentIsIgnoredForStaticMethods",                     new object[]{                     outer,                     inner,                     outerKeySelector,                     innerKeySelector,                     Expression.Lambda(                         Expression.MemberInit(                             Expression.New(keyValuePairHolderWithGroup),                              Expression.Bind(                                 keyValuePairHolderWithGroup.GetMember("Item1").Single(),                                   paramOuter                                 ),                              Expression.Bind(                                 keyValuePairHolderWithGroup.GetMember("Item2").Single(),                                  paramInner                                 )                             ),                         paramOuter,                          paramInner                         )                     }                 );               var paramGroup = Expression.Parameter(keyValuePairHolderWithGroup);             Expression collectionSelector = Expression.Lambda(                                                 Expression.Call(                                     null,                                     Enumerable_DefaultIfEmpty.MakeGenericMethod(typeof(TInner)),                                     Expression.MakeMemberAccess(paramGroup, keyValuePairHolderWithGroup.GetProperty("Item2")))                              ,                             paramGroup                         );              Expression newResultSelector = new ResultSelectorRewriter<TOuter, TInner, TResult>(resultSelector).CombinedExpression;               var selectMany1Result =                 Queryable_SelectMany.MakeGenericMethod(keyValuePairHolderWithGroup, typeof(TInner), typeof(TResult))                 .Invoke(                     "ThisArgumentIsIgnoredForStaticMethods", new object[]{                         groupJoin,                         collectionSelector,                         newResultSelector                     }                 );             return (IQueryable<TResult>)selectMany1Result;         }     } 
like image 32
Jan Van der Haegen Avatar answered Oct 21 '22 17:10

Jan Van der Haegen