Speed up a LINQ GroupBy query

Tags:

c#

linq

I have a table like this:

UserID   Year   EffectiveDate   Type    SpecialExpiryDate
     1   2015   7/1/2014        A   
     1   2016   7/1/2015        B       10/1/2015

There is no ExpiryDate column in the table because each row is only valid for one year, so the expiry date can be calculated by adding one year to the effective date.

The result I want looks like this (the current year's effective date and the next year's expiry date):

UserID   EffectiveDate   ExpiryDate
     1    7/1/2014        7/1/2016

If the user's type is B, there is a special expiry date, so for this user the result should be:

UserID   EffectiveDate   ExpiryDate
     1    7/1/2014        10/1/2015
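The rule described above can be sketched in memory as a small C# function. This is a minimal sketch, not the asker's code: it assumes the effective date comes from the user's earliest Year row and the expiry date from the latest Year row (its SpecialExpiryDate when present, otherwise EffectiveDate plus one year). The Summarize helper and the tuple layout are hypothetical, and the Type column is left out because only type B rows carry a SpecialExpiryDate.

```csharp
using System;
using System.Linq;

// Hypothetical helper. Rows are (UserID, Year, EffectiveDate, SpecialExpiryDate);
// Type is omitted because only type "B" rows carry a SpecialExpiryDate.
static (int UserID, DateTime EffectiveDate, DateTime ExpiryDate) Summarize(
    (int UserID, int Year, DateTime EffectiveDate, DateTime? SpecialExpiryDate)[] rows)
{
    // Effective date comes from the earliest year on record (assumption).
    var earliest = rows.OrderBy(r => r.Year).First();
    // Expiry comes from the latest year: its special date if set,
    // otherwise one year after that row became effective (assumption).
    var latest = rows.OrderByDescending(r => r.Year).First();
    var expiry = latest.SpecialExpiryDate ?? latest.EffectiveDate.AddYears(1);
    return (earliest.UserID, earliest.EffectiveDate, expiry);
}

// Type A case: no special expiry date on the 2016 row.
var typeA = Summarize(new (int, int, DateTime, DateTime?)[]
{
    (1, 2015, new DateTime(2014, 7, 1), null),
    (1, 2016, new DateTime(2015, 7, 1), null),
});

// Type B case: the 2016 row carries SpecialExpiryDate = 10/1/2015.
var typeB = Summarize(new (int, int, DateTime, DateTime?)[]
{
    (1, 2015, new DateTime(2014, 7, 1), null),
    (1, 2016, new DateTime(2015, 7, 1), new DateTime(2015, 10, 1)),
});

Console.WriteLine($"{typeA.UserID} {typeA.EffectiveDate:yyyy-MM-dd} {typeA.ExpiryDate:yyyy-MM-dd}");
Console.WriteLine($"{typeB.UserID} {typeB.EffectiveDate:yyyy-MM-dd} {typeB.ExpiryDate:yyyy-MM-dd}");
```

Both cases keep 7/1/2014 as the effective date; only the source of the expiry date changes.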

Here is the code I wrote:

var result = db.Table1
            .Where(x => x.Year >= 2015 && (x.Type == "A" || x.Type == "B"))
            .GroupBy(y => y.UserID)
            .OrderByDescending(x => x.FirstOrDefault().Year)
            .Select(t => new
                         {
                             ID = t.Key,
                             Type = t.FirstOrDefault().Type,
                             EffectiveDate = t.FirstOrDefault().EffectiveDate,
                             // NextExpiryDate / CurrentExpiryDate stand for the next and current year's expiry dates (EffectiveDate + 1 year)
                             ExpiryDate = t.FirstOrDefault().SpecialExpiryDate != null ? t.FirstOrDefault().SpecialExpiryDate : (t.Count() >= 2 ? NextExpiryDate : CurrentExpiryDate)
                         }
                    );

The code returns the result I need, but the result set has about 10,000 records and the query takes about 5 to 6 seconds. The project is a web search API, so I want to speed it up. Is there a better way to write this query?

Edit

Sorry, I made a mistake: in the select clause it should be

EffectiveDate = t.LastOrDefault().EffectiveDate

but LINQ does not support translating LastOrDefault to SQL, which causes a new problem. What is the easiest way to get the second item of the group?
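A commonly used workaround, sketched here with LINQ to Objects on hypothetical rows: order each group explicitly and take the first element instead of calling LastOrDefault(), and for the second item call Skip(1) before FirstOrDefault(). OrderByDescending, Skip and FirstOrDefault are standard LINQ operators that LINQ to SQL and Entity Framework generally translate, but this sketch only runs in memory, so check the SQL your provider actually generates.

```csharp
using System;
using System.Linq;

// Hypothetical rows: (UserID, Year, EffectiveDate).
var rows = new (int UserID, int Year, DateTime EffectiveDate)[]
{
    (1, 2015, new DateTime(2014, 7, 1)),
    (1, 2016, new DateTime(2015, 7, 1)),
    (2, 2016, new DateTime(2015, 9, 1)),
};

var perUser = rows
    .GroupBy(r => r.UserID)
    .Select(g => new
    {
        UserID = g.Key,
        // "Last" row by year: sort descending and take the first,
        // instead of calling LastOrDefault().
        Latest = g.OrderByDescending(r => r.Year).FirstOrDefault(),
        // "Second" row by year: sort ascending, skip one, take the first.
        // In LINQ to Objects a one-row group yields the default tuple here.
        Second = g.OrderBy(r => r.Year).Skip(1).FirstOrDefault(),
    })
    .ToList();

foreach (var u in perUser)
    Console.WriteLine($"{u.UserID}: latest year {u.Latest.Year}, second year {u.Second.Year}");
```

The explicit ordering matters: without it, "first", "last" and "second" have no defined meaning once the query runs in SQL.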

asked by pita, Nov 17 '25 19:11


2 Answers

You could generate the calculated data on the fly, using a View in your database.

Something like this (pseudocode, using SQL Server's DATEADD for the date arithmetic):

Create View vwUsers AS 
    Select 
        UserID, 
        Year, 
        EffectiveDate, 
        DATEADD(year, 1, EffectiveDate) as ExpiryDate,   -- computed expiry date
        Type, 
        SpecialExpiryDate
    From 
        tblUsers

Then point your LINQ query at the view instead of the table.
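Assuming the view above exists, the query no longer has to work out the expiry date itself; it only chooses between SpecialExpiryDate and the view's ExpiryDate. The sketch below fakes tblUsers and vwUsers with in-memory tuples so it runs without a database. The query shape, including taking EffectiveDate from the earliest year and ExpiryDate from the latest, is an assumption based on the question, not part of this answer.

```csharp
using System;
using System.Linq;

// In-memory stand-in for tblUsers (hypothetical data from the question).
var tblUsers = new (int UserID, int Year, DateTime EffectiveDate, string Type, DateTime? SpecialExpiryDate)[]
{
    (1, 2015, new DateTime(2014, 7, 1), "A", null),
    (1, 2016, new DateTime(2015, 7, 1), "B", new DateTime(2015, 10, 1)),
};

// Stand-in for vwUsers: same columns plus the computed ExpiryDate (EffectiveDate + 1 year).
var vwUsers = tblUsers.Select(r => (r.UserID, r.Year, r.EffectiveDate,
                                    ExpiryDate: r.EffectiveDate.AddYears(1),
                                    r.Type, r.SpecialExpiryDate));

var result = vwUsers
    .Where(x => x.Year >= 2015 && (x.Type == "A" || x.Type == "B"))
    .GroupBy(x => x.UserID)
    .Select(g => new
    {
        ID = g.Key,
        // Effective date of the earliest year in the group.
        EffectiveDate = g.OrderBy(x => x.Year).Select(x => x.EffectiveDate).FirstOrDefault(),
        // Latest year's special expiry date if present, else the view's ExpiryDate.
        ExpiryDate = g.OrderByDescending(x => x.Year)
                      .Select(x => x.SpecialExpiryDate ?? x.ExpiryDate)
                      .FirstOrDefault(),
    })
    .ToList();

foreach (var r in result)
    Console.WriteLine($"{r.ID} {r.EffectiveDate:yyyy-MM-dd} {r.ExpiryDate:yyyy-MM-dd}");
```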

answered by oɔɯǝɹ, Nov 19 '25 09:11


Try this:

var result =
    db
        .Table1
        .Where(x => x.Year >= 2015 && (x.Type == "A" || x.Type == "B"))
        .GroupBy(y => y.UserID)
        .SelectMany(y => y.Take(1), (y, z) => new
        {
            ID = y.Key,
            z.Type,
            z.EffectiveDate,
            ExpiryDate = z.SpecialExpiryDate != null
                ? z.SpecialExpiryDate
                : (y.Count() >= 2 ? NextExpiryDate : CurrentExpiryDate), // placeholders from the question
            z.Year,
        })
        .OrderByDescending(x => x.Year);

The .SelectMany(y => y.Take(1), ...) call effectively does the .FirstOrDefault() part of your code. Doing this once per group, rather than once per property, may improve the speed immensely.
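The same pattern can be tried in memory. This is a sketch on a hypothetical stand-in for db.Table1. It adds an OrderByDescending inside each group so that Take(1) picks a definite row (without an ordering, SQL gives no guarantee which row comes first), and it replaces the question's undefined NextExpiryDate/CurrentExpiryDate with an assumed rule: SpecialExpiryDate if present, otherwise EffectiveDate plus one year.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for db.Table1.
var table1 = new (int UserID, int Year, DateTime EffectiveDate, string Type, DateTime? SpecialExpiryDate)[]
{
    (1, 2015, new DateTime(2014, 7, 1), "A", null),
    (1, 2016, new DateTime(2015, 7, 1), "A", null),
    (2, 2016, new DateTime(2015, 9, 1), "B", new DateTime(2015, 12, 1)),
};

var result = table1
    .Where(x => x.Year >= 2015 && (x.Type == "A" || x.Type == "B"))
    .GroupBy(x => x.UserID)
    // Pick one row per group, once; ordering makes it the latest year
    // rather than whichever row the source happens to return first.
    .SelectMany(g => g.OrderByDescending(x => x.Year).Take(1), (g, z) => new
    {
        ID = g.Key,
        z.Type,
        z.Year,
        // Every column comes from the same row z; this expiry rule is an
        // assumption standing in for NextExpiryDate / CurrentExpiryDate.
        ExpiryDate = z.SpecialExpiryDate ?? z.EffectiveDate.AddYears(1),
    })
    .OrderByDescending(x => x.Year)
    .ToList();

foreach (var r in result)
    Console.WriteLine($"{r.ID} {r.Type} {r.Year} {r.ExpiryDate:yyyy-MM-dd}");
```

Enumerable.OrderByDescending is a stable sort, so users with the same Year keep their group order in this in-memory run.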

In a test with a similarly structured query, your approach ran these sub-queries:

SELECT t0.increment_id
FROM sales_flat_order AS t0
GROUP BY t0.increment_id

SELECT t0.hidden_tax_amount
FROM sales_flat_order AS t0
WHERE ((t0.increment_id IS NULL AND @n0 IS NULL) OR (t0.increment_id = @n0))
LIMIT 0, 1
-- n0 = [100000001]

SELECT t0.customer_email
FROM sales_flat_order AS t0
WHERE ((t0.increment_id IS NULL AND @n0 IS NULL) OR (t0.increment_id = @n0))
LIMIT 0, 1
-- n0 = [100000001]

SELECT t0.hidden_tax_amount
FROM sales_flat_order AS t0
WHERE ((t0.increment_id IS NULL AND @n0 IS NULL) OR (t0.increment_id = @n0))
LIMIT 0, 1
-- n0 = [100000002]

SELECT t0.customer_email
FROM sales_flat_order AS t0
WHERE ((t0.increment_id IS NULL AND @n0 IS NULL) OR (t0.increment_id = @n0))
LIMIT 0, 1
-- n0 = [100000002]

(This continued with two sub-queries per record.)

With my approach, I got this single query:

SELECT t0.increment_id, t1.hidden_tax_amount, t1.customer_email
FROM (
  SELECT t2.increment_id
  FROM sales_flat_order AS t2
  GROUP BY t2.increment_id
  ) AS t0
CROSS APPLY (
  SELECT t3.customer_email, t3.hidden_tax_amount
  FROM sales_flat_order AS t3
  WHERE ((t3.increment_id IS NULL AND t0.increment_id IS NULL) OR (t3.increment_id = t0.increment_id))
  LIMIT 0, 1
  ) AS t1

Because it replaces the per-group, per-property sub-queries with a single query, my approach should be much faster.

answered by Enigmativity, Nov 19 '25 09:11


