Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Linq-to-sql incorrectly removing fields in my union?

I am trying to aggregate several queries together to provide a latest updates display for my users using a common data structure for union support. In my first function I have the following select clause:

.Select(x => new PlayerUpdateInfo
{
    FirstName = x.PodOptimizedSearch.FirstName,
    LastName = x.PodOptimizedSearch.LastName,
    RecruitId = x.RecruitId,
    Date = x.Date,
    UpdateMessage = x.IsAddedAction
      ? "Player was added to this recruiting board by " + x.Person.FirstName
        + " " + x.Person.LastName
      : "Player was removed from this recruiting board by " + 
        x.Person.FirstName + " " + x.Person.LastName,

    // Defaults for union support
    Team = string.Empty,
    UpdateComments = x.Comments,
    TeamId = 0,
    State = string.Empty
});

When called this correctly generates a query that returns 9 fields. The second method's select is:

select new PlayerUpdateInfo
{
    FirstName = recruit.FirstName,
    LastName = recruit.LastName,
    RecruitId = recruit.RecruitId,
    Date = asset.CreateDate,
    UpdateMessage = "New Full Game Added",

    // Defaults for union support
    Team = null,
    UpdateComments = null,
    TeamId = 0,
    State = null
    };

When this query is run by itself it correctly returns 9 values. However, when I try to do

var query = GetFirstQuery();
query = query.union(GetSecondQuery()); 

I get a sql exception that the query failed because all the queries do not have the same number of fields. Investigating the generated SQL shows that the first query is correct, but Linq is generating the 2nd (unioned) query to only have 7 fields. After testing it appears that Linq seems to be "optimizing" out the nulls so it's only returning one null column instead of 3, thus causing a mis-match.

Why is Linq-to-sql incorrectly generating the union and how can I work around this?


Edit:

Ok the issue seems to deal with union chaining in Linq-to-Sql for .Net 3.5. This does not happen in 4 apparently. With the following code:

    protected IQueryable<PlayerUpdateInfo> Test1()
    {
        return PodDataContext.Assets
                             .Select(x => new PlayerUpdateInfo
                             {
                                 Date = DateTime.Now,
                                 FirstName = x.Title,
                                 LastName = string.Empty,
                                 RecruitId = 0,
                                 State = string.Empty,
                                 Team = string.Empty,
                                 TeamId = 0,
                                 UpdateComments = string.Empty,
                                 UpdateMessage = string.Empty
                             });
    }

    protected IQueryable<PlayerUpdateInfo> Test2()
    {
        return PodDataContext.SportPositions
                             .Select(x => new PlayerUpdateInfo
                             {
                                 Date = DateTime.Now,
                                 FirstName = string.Empty,
                                 LastName = x.Abbreviation,
                                 RecruitId = 0,
                                 State = string.Empty,
                                 Team = string.Empty,
                                 TeamId = 0,
                                 UpdateComments = string.Empty,
                                 UpdateMessage = string.Empty
                             });
    }

I then union chain via: var q2 = Test1().Union(Test2()).Union(Test1());

In .Net 3.5 I get the following sql, which has a mis-match and fails

SELECT [t4].[value] AS [RecruitId], [t4].[Title] AS [FirstName], [t4].[value2] AS [LastName], [t4].[value22] AS [Team], [t4].[value3] AS [Date]
FROM (
    SELECT [t2].[value], [t2].[Title], [t2].[value2], [t2].[value2] AS [value22], [t2].[value3]
    FROM (
        SELECT @p0 AS [value], [t0].[Title], @p1 AS [value2], @p2 AS [value3]
        FROM [dbo].[Assets] AS [t0]
        UNION
        SELECT @p3 AS [value], @p4 AS [value2], [t1].[Abbreviation], @p5 AS [value3]
        FROM [dbo].[SportPositions] AS [t1]
        ) AS [t2]
    UNION
    SELECT @p6 AS [value], [t3].[Title], @p7 AS [value2], @p8 AS [value3]
    FROM [dbo].[Assets] AS [t3]
    ) AS [t4]

In .net 4 the following code is generated:

SELECT [t4].[value] AS [RecruitId], [t4].[Title] AS [FirstName], [t4].[value2] AS [LastName], [t4].[value3] AS [Team], [t4].[value4] AS [TeamId], [t4].[value5] AS [State], [t4].[value6] AS [UpdateMessage], [t4].[value7] AS [UpdateComments], [t4].[value8] AS [Date]
FROM (
    SELECT [t2].[value], [t2].[Title], [t2].[value2], [t2].[value3], [t2].[value4], [t2].[value5], [t2].[value6], [t2].[value7], [t2].[value8]
    FROM (
        SELECT @p0 AS [value], [t0].[Title], @p1 AS [value2], @p2 AS [value3], @p3 AS [value4], @p4 AS [value5], @p5 AS [value6], @p6 AS [value7], @p7 AS [value8]
        FROM [dbo].[Assets] AS [t0]
        UNION
        SELECT @p8 AS [value], @p9 AS [value2], [t1].[Abbreviation], @p10 AS [value3], @p11 AS [value4], @p12 AS [value5], @p13 AS [value6], @p14 AS [value7], @p15 AS [value8]
        FROM [dbo].[SportPositions] AS [t1]
        ) AS [t2]
    UNION
    SELECT @p16 AS [value], [t3].[Title], @p17 AS [value2], @p18 AS [value3], @p19 AS [value4], @p20 AS [value5], @p21 AS [value6], @p22 AS [value7], @p23 AS [value8]
    FROM [dbo].[Assets] AS [t3]
    ) AS [t4]

That is valid sql and works. As we are unable to bring our production systems up to .net 4 for various reasons, does anyone know a way I can work around this in .net 3.5?

like image 233
KallDrexx Avatar asked Jun 25 '12 22:06

KallDrexx


1 Answers

This is an updated answer - it would seem that in 3.5 the optimization cannot be switched off. I've tried numerous things in LinqPad, and as soon as there's a C# expression that's not derived from a column, the query compiler takes it out of the SQL.

So we need something that'll be derived from a column expression but which we can force always to be null.

Here's one solution I've been able to get to work - write a conditional expression that returns null when a column value is equal to itself; thus always returning null. It's not going to be pretty, because Linq to Sql in 3.5 appears to be very aggressive in optimizing a query (we have to use unique comparisons for each null we want, e.g. use == in one, then != on the next and so on); and it'll get uglier when we modify it to deal with nulls if you have to do this hack on a nullable column:

I'm just picking out column names that I can see from your code - but you might want to use different ones:

select new PlayerUpdateInfo  
{  
  FirstName = recruit.FirstName,  
  LastName = recruit.LastName,  
  RecruitId = recruit.RecruitId,  
  Date = asset.CreateDate,  
  UpdateMessage = "New Full Game Added",  

  // Defaults for union support  
  Team = recruit.FirstName = recruit.FirstName ? null : "",  
  UpdateComments = recruit.FirstName != recruit.FirstName ? "" : null,  
  TeamId = recruit.LastName = recruit.LastName ? null : "",  
  State = recruit.LastName != recruit.LastName ? "" : null
};

In LinqPad I get a column expression in the generated SQL for each of the statements where I use a ? :. The "" on the other side of the expression is because SQL moans about CASE statements having more than one null.

Clearly - this solution, apart from being ugly as sin, is bounded by how many columns are available for you to do these fake comparisons on, and indeed how many unique comparisons are mapped by L2S.

like image 143
Andras Zoltan Avatar answered Oct 17 '22 16:10

Andras Zoltan