
How to get COUNT DISTINCT in translated SQL with EF Core

Tags: c#, ef-core-2.1

I want EF Core to translate .Select(x => x.property).Distinct().Count() into something like

SELECT COUNT(DISTINCT property)

For example, say I have a table with PersonID (long), VisitStart (datetime2) and VisitEnd (datetime2). If I want the number of distinct days on which each person visited, I could write SQL like

SELECT COUNT(DISTINCT CONVERT(date, VisitStart)) FROM myTable GROUP BY PersonID
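To pin down what that query should return, here is a minimal sketch of the same computation in LINQ to Objects over an in-memory list. The Visit record and the DistinctDays.PerPerson helper are hypothetical names chosen for illustration. Because this runs entirely in memory, it says nothing about how EF Core translates the query; it only shows the expected per-person result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one row of myTable.
public record Visit(long PersonID, DateTime VisitStart, DateTime VisitEnd);

public static class DistinctDays
{
    // LINQ-to-Objects equivalent of:
    //   SELECT PersonID, COUNT(DISTINCT CONVERT(date, VisitStart))
    //   FROM myTable GROUP BY PersonID
    public static Dictionary<long, int> PerPerson(IEnumerable<Visit> visits) =>
        visits
            .GroupBy(v => v.PersonID)
            .ToDictionary(
                g => g.Key,
                // .Date drops the time of day, like CONVERT(date, ...) in SQL
                g => g.Select(v => v.VisitStart.Date).Distinct().Count());
}
```

Two visits by the same person on the same calendar day count once, which is exactly what COUNT(DISTINCT ...) does on the SQL side.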

But this EF Core query

MyTable
    .GroupBy(x=>x.PersonID)
    .Select(x=> new 
    {
        Count = x.Select(y=>y.VisitStart.Date).Distinct().Count()
    })

which gives the right results, is translated into this SQL:

SELECT [x].[PersonID], [x].[VisitStart], [x].[VisitEnd]
FROM [myTable] as [x]
ORDER BY [x].[PersonID]

There is no GROUP BY, DISTINCT or COUNT anywhere, so the grouping is done in memory. That is not ideal for a table with millions of rows, all of which may have to be pulled from the database.

Does anyone know how to get EF Core to translate .Select(...).Distinct().Count() into SELECT COUNT(DISTINCT ...)?

asked Jun 28 '19 at 08:06 by smok



1 Answer

I wanted to share an idea I had for solving my own COUNT DISTINCT problem.

Another way to get a distinct count inside a GROUP BY is to nest two GroupBy calls, assuming the rest of your data can be aggregated at the inner level. The inner GroupBy collapses the rows to one per distinct value, so Count() on the outer group counts the distinct values.

Here is an example of what I used; it seems to work.

Apologies for the cryptic abbreviations; I use them to keep my JSON as small as possible.

var myData = _context.ActivityItems
                        // Inner grouping: one row per (day, tn) with per-day sums and row counts
                        .GroupBy(a => new { ndt = EF.Property<DateTime>(a, "dt").Date, ntn = a.tn })
                        .Select(g => new
                        {
                            g.Key.ndt,
                            g.Key.ntn,
                            dpv = g.Sum(o => o.pv),
                            dlv = g.Sum(o => o.lv),
                            cnt = g.Count(),
                        })
                        // Outer grouping: one row per tn; each inner row is one distinct day
                        .GroupBy(a => new { ntn = a.ntn })
                        .Select(g => new
                        {
                            g.Key.ntn,
                            sd = g.Min(o => o.ndt),
                            ld = g.Max(o => o.ndt),
                            pSum = g.Sum(o => o.dpv),
                            pMin = g.Min(o => o.dpv),
                            pMax = g.Max(o => o.dpv),
                            pAvg = g.Average(o => o.dpv),
                            lSum = g.Sum(o => o.dlv),
                            lMin = g.Min(o => o.dlv),
                            lMax = g.Max(o => o.dlv),
                            lAvg = g.Average(o => o.dlv),
                            n10s = g.Sum(o => o.cnt),
                            ndays = g.Count() // number of distinct days for this tn
                        });
answered Sep 28 '22 at 11:09 by Gareth
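As a rough check of the answer's idea against the question's schema, the sketch below computes distinct visit days per person both ways: with the question's Select(...).Distinct().Count(), and with a nested GroupBy in which the inner grouping yields one row per (PersonID, day) and the outer Count() counts those rows. Visit, NestedGroupBy and its two methods are hypothetical names. It runs over an in-memory IQueryable, so it only shows that the two forms return the same numbers; whether EF Core turns either one into a single SQL GROUP BY depends on the EF Core version and database provider, and is not demonstrated here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for one row of the question's myTable.
public record Visit(long PersonID, DateTime VisitStart, DateTime VisitEnd);

public static class NestedGroupBy
{
    // Nested form: the inner GroupBy produces one row per (PersonID, day),
    // so counting those rows per PersonID gives the number of distinct days.
    public static Dictionary<long, int> DistinctDaysNested(IQueryable<Visit> visits) =>
        visits
            .GroupBy(v => new { v.PersonID, Day = v.VisitStart.Date })
            .Select(g => new { g.Key.PersonID, g.Key.Day })
            .GroupBy(x => x.PersonID)
            .Select(g => new { PersonID = g.Key, Days = g.Count() })
            .ToDictionary(x => x.PersonID, x => x.Days);

    // The question's original form, kept for comparison.
    public static Dictionary<long, int> DistinctDaysDirect(IQueryable<Visit> visits) =>
        visits
            .GroupBy(v => v.PersonID)
            .Select(g => new
            {
                PersonID = g.Key,
                Days = g.Select(v => v.VisitStart.Date).Distinct().Count()
            })
            .ToDictionary(x => x.PersonID, x => x.Days);
}
```

Against a real database you would pass the DbSet itself instead of an in-memory query and inspect the generated SQL (for example through EF Core's logging) to see what your version actually emits.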