
Have an EF LINQ Select statement select a constant or a function

I have a Select statement that is currently formatted like

dbEntity
    .GroupBy(x => x.date)
    .Select(groupedDate => new {
        Calculation1 = doCalculation1 ? groupedDate.Sum(x => x.Column1) : 0,
        Calculation2 = doCalculation2 ? groupedDate.Count() : 0
    });

In the query, doCalculation1 and doCalculation2 are bools that are set earlier. This produces a CASE expression in the generated SQL, like

DECLARE @p1 int = 1
DECLARE @p2 int = 0
DECLARE @p3 int = 1
DECLARE @p4 int = 0
SELECT (Case When @p1 = 1 THEN Sum(dbEntity.Column1)
     Else @p2
     End) as Calculation1,
     (Case When @p3 = 1 THEN Count(*)
     Else @p4
     End) as Calculation2

What I want is for the generated SQL to look like this when doCalculation1 is true

SELECT SUM(Column1) as Calculation1, Count(*)  as Calculation2

and like this when doCalculation1 is false

SELECT 0 as Calculation1, Count(*) as Calculation2

Is there any way to force a query through EF to act like this?

Edit:

bool doCalculation = true;
bool doCalculation2 = false;
dbEntity
    .Where(x => x.FundType == "E")
    .GroupBy(x => x.ReportDate)
    .Select(dateGroup => new
    {
        ReportDate = dateGroup.Key,
        CountInFlows = doCalculation2 ? dateGroup.Count(x => x.Flow > 0) : 0,
        NetAssetEnd = doCalculation ? dateGroup.Sum(x => x.AssetsEnd) : 0
    })
    .ToList();

generates this SQL

-- Region Parameters
DECLARE @p0 VarChar(1000) = 'E'
DECLARE @p1 Int = 0
DECLARE @p2 Decimal(5,4) = 0
DECLARE @p3 Int = 0
DECLARE @p4 Int = 1
DECLARE @p5 Decimal(1,0) = 0
-- EndRegion
SELECT [t1].[ReportDate], 
    (CASE 
        WHEN @p1 = 1 THEN (
            SELECT COUNT(*)
            FROM [dbEntity] AS [t2]
            WHERE ([t2].[Flow] > @p2) AND ([t1].[ReportDate] = [t2].[ReportDate]) AND ([t2].[FundType] = @p0)
        )
        ELSE @p3
     END) AS [CountInFlows], 
    (CASE 
        WHEN @p4 = 1 THEN CONVERT(Decimal(33,4),[t1].[value])
        ELSE CONVERT(Decimal(33,4),@p5)
     END) AS [NetAssetEnd]
FROM (
    SELECT SUM([t0].[AssetsEnd]) AS [value], [t0].[ReportDate]
    FROM [dbEntity] AS [t0]
    WHERE [t0].[FundType] = @p0
    GROUP BY [t0].[ReportDate]
    ) AS [t1]

The execution plan for this has many index scans, a spool, and a join. It takes about 20 seconds on average to run on the test set, and the production set will be much larger.

I want it to run at the same speed as SQL like

select reportdate, 1, sum(AssetsEnd)
from vwDailyFundFlowDetail
where fundtype = 'E'
group by reportdate

which runs in about 12 seconds on average and spends most of its time in a single index seek in the execution plan. What the actual SQL output looks like doesn't matter, but the performance appears to be much worse with the CASE statements.

As for why I am doing this: I need to generate dynamic select statements, as I asked in Dynamically generate Linq Select. A user may select one or more of a set of calculations to perform, and I will not know what is selected until the request comes in. The calculations are expensive, so we do not want to run them unless they are necessary. I am setting the doCalculation bools based on the user request.

This query is supposed to replace some code that inserts or deletes characters in a hardcoded SQL query stored as a string, which is then executed. That runs fairly fast but is a nightmare to maintain.

Alexander Burke asked May 16 '26 15:05


1 Answer

It would technically be possible to pass the Expression in your Select query through an expression tree visitor that checks whether the condition of each ternary operator is a constant value and, if so, replaces the ternary expression with the appropriate sub-expression.

For example:

using System.Linq.Expressions;
using System.Reflection;

public class Simplifier : ExpressionVisitor
{
    public static Expression<T> Simplify<T>(Expression<T> expr)
    {
        return (Expression<T>) new Simplifier().Visit(expr);
    }

    protected override Expression VisitConditional(ConditionalExpression node)
    {
        var test = Visit(node.Test);
        var ifTrue = Visit(node.IfTrue);
        var ifFalse = Visit(node.IfFalse);

        // If the condition has collapsed to a constant, keep only the branch it selects.
        var testConst = test as ConstantExpression;
        if (testConst != null)
        {
            var value = (bool) testConst.Value;
            return value ? ifTrue : ifFalse;
        }

        return Expression.Condition(test, ifTrue, ifFalse);
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Closed-over variables are represented as field accesses to fields on a constant object.
        var field = node.Member as FieldInfo;
        var closure = node.Expression as ConstantExpression;
        // Property accesses and static members fall through to the default handling.
        if (field != null && closure != null)
        {
            // Read the captured value now and inline it as a constant of the same type.
            var value = field.GetValue(closure.Value);
            return Expression.Constant(value, node.Type);
        }
        return base.VisitMember(node);
    }
}

Usage example:

void Main()
{
    var b = true;
    Expression<Func<int, object>> expr = i => b ? i.ToString() : "N/A";
    Console.WriteLine(expr.ToString()); // i => IIF(value(UserQuery+<>c__DisplayClass0).b, i.ToString(), "N/A")
    Console.WriteLine(Simplifier.Simplify(expr).ToString()); // i => i.ToString()
    b = false;
    Console.WriteLine(Simplifier.Simplify(expr).ToString()); // i => "N/A"
}

So, you could use this in your code something like this:

Expression<Func<IGrouping<DateTime, MyEntity>, ClassYouWantToReturn>> select =
    groupedDate => new ClassYouWantToReturn {
        Calculation1 = doCalculation1 ? groupedDate.Sum(x => x.Column1) : 0,
        Calculation2 = doCalculation2 ? groupedDate.Count() : 0
    };
var q = dbEntity
    .GroupBy(x => x.date)
    .Select(Simplifier.Simplify(select));
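
If you want to check whether any ternaries are still present in a selector before handing it to the query provider, one option is to walk the tree and count the ConditionalExpression nodes. The sketch below is only an illustration: ConditionalCounter is a hypothetical helper, and the selector runs over an in-memory grouping of ints instead of your entity.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper (not part of any library): counts ternary nodes in an expression tree.
public class ConditionalCounter : ExpressionVisitor
{
    public int Count { get; private set; }

    public static int CountIn(Expression expr)
    {
        var counter = new ConditionalCounter();
        counter.Visit(expr);
        return counter.Count;
    }

    protected override Expression VisitConditional(ConditionalExpression node)
    {
        Count++;
        // Keep walking so nested ternaries are counted as well.
        return base.VisitConditional(node);
    }
}

public static class Program
{
    public static void Main()
    {
        bool doCalculation1 = true, doCalculation2 = false;

        // Same shape as the selector in the question, but over a grouping of ints.
        Expression<Func<IGrouping<int, int>, int[]>> select = g => new[]
        {
            doCalculation1 ? g.Sum() : 0,
            doCalculation2 ? g.Count() : 0
        };

        Console.WriteLine(ConditionalCounter.CountIn(select)); // 2
    }
}
```

Passing the result of Simplifier.Simplify(select) to CountIn instead should report 0 when every condition is a captured bool, which is the case where the provider never sees a ternary.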

However, the visitor approach is probably more trouble than it's worth. SQL Server will almost certainly optimize the "1 == 1" case away, and allowing Entity Framework to produce the less-pretty query shouldn't prove to be a performance problem.

Update

Looking at the updated question, this appears to be one of the few instances where producing the right query really does matter, performance-wise.

Besides my suggested solution, there are a few other choices: you could use raw SQL and map the results to your return type, or you could use LinqKit to choose a different expression based on what you want, and then "Invoke" that expression inside your Select query.
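
A variation on the "choose a different expression" idea that needs only System.Linq.Expressions (this is not LinqKit) is to write each calculation as its own lambda and bind only the requested ones into the projection. This is a minimal sketch under stated assumptions: Row, Summary, SelectorBuilder and ParameterSwap are hypothetical names, the demo runs against in-memory data, and whether your provider translates the resulting tree into the SQL you want is something to confirm by inspecting the generated query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins for the entity and the result type.
public class Row
{
    public DateTime ReportDate { get; set; }
    public decimal AssetsEnd { get; set; }
    public decimal Flow { get; set; }
}

public class Summary
{
    public DateTime ReportDate { get; set; }
    public int CountInFlows { get; set; }
    public decimal NetAssetEnd { get; set; }
}

public static class SelectorBuilder
{
    public static Expression<Func<IGrouping<DateTime, Row>, Summary>> Build(bool doCount, bool doSum)
    {
        // Each calculation is an ordinary, strongly typed lambda.
        Expression<Func<IGrouping<DateTime, Row>, DateTime>> key = g => g.Key;
        Expression<Func<IGrouping<DateTime, Row>, int>> count = g => g.Count(x => x.Flow > 0);
        Expression<Func<IGrouping<DateTime, Row>, decimal>> sum = g => g.Sum(x => x.AssetsEnd);

        var group = Expression.Parameter(typeof(IGrouping<DateTime, Row>), "g");
        var bindings = new List<MemberBinding> { Bind(nameof(Summary.ReportDate), key, group) };

        // Only requested calculations enter the tree; skipped properties keep their defaults.
        if (doCount) bindings.Add(Bind(nameof(Summary.CountInFlows), count, group));
        if (doSum) bindings.Add(Bind(nameof(Summary.NetAssetEnd), sum, group));

        var body = Expression.MemberInit(Expression.New(typeof(Summary)), bindings);
        return Expression.Lambda<Func<IGrouping<DateTime, Row>, Summary>>(body, group);
    }

    private static MemberBinding Bind(string property, LambdaExpression source, ParameterExpression group)
    {
        // Rewrite the lambda body so it refers to the shared grouping parameter.
        var value = new ParameterSwap(source.Parameters[0], group).Visit(source.Body);
        return Expression.Bind(typeof(Summary).GetProperty(property), value);
    }

    private sealed class ParameterSwap : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public ParameterSwap(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _from ? _to : node;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var rows = new[]
        {
            new Row { ReportDate = new DateTime(2024, 1, 1), AssetsEnd = 10m, Flow = 1m },
            new Row { ReportDate = new DateTime(2024, 1, 1), AssetsEnd = 5m, Flow = -1m }
        };

        var selector = SelectorBuilder.Build(doCount: false, doSum: true);
        var result = rows.AsQueryable().GroupBy(r => r.ReportDate).Select(selector).Single();

        Console.WriteLine(result.NetAssetEnd);  // 15
        Console.WriteLine(result.CountInFlows); // 0
    }
}
```

Unlike the ternary version, a skipped calculation is absent from the expression tree altogether, so there is no constant or CASE for the database to evaluate. The trade-off is that the projection has to target a named class, since an anonymous type cannot be built this way.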

StriplingWarrior answered May 18 '26 03:05


