Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group by with paging (take skip)

I am trying to make some kind of paging. But, I need to do it on a grouped result, because every time I do a page. It is a requirement that all data for a given group is fetched.

Below code:

var erere = dbCtx.StatusViewList
                 .GroupBy(p => p.TurbineNumber)
                 .OrderBy(p => p.FirstOrDefault().TurbineNumber)
                 .Skip(0)
                 .Take(10)
                 .ToList();

I have 200k items and the statement above seems to be so slow the connection times out. My best bet is its the orderby that slows it down. Any suggestions how to do this, or how to speed the statement above up?

like image 213
Thomas Segato Avatar asked Oct 23 '25 04:10

Thomas Segato


2 Answers

At your case, grouping on server side is not needed at all, because anyway you will get all data, but with additional overhead on server side. So try another approach:

var groupPage = dbCtx.StatusViewList.Select(x => TurbineNumber)
                  .Distinct().OrderBy(x => x.TurbineNumber).Skip(40).Take(20).ToList();

var data = dbCtx.StatusViewList.Where(x => groupPage.Contains(x.TurbineNumber))
                  .ToList().GroupBy(x => x.TurbineNumber).ToList();
like image 196
Slava Utesinov Avatar answered Oct 25 '25 17:10

Slava Utesinov


The GroupBy needs to visit all elements to group all StatusViews into groups of StatusViews that have equal TurbineNumber.

After that, you take every group, from every group your take the first element and ask for its TurbineNumber, to sort by Turbine Number.

Apparently you take into account that a group of StatusViews might be empty (FirstOrDefault, instead of First), but then again, you assume that FirstOrDefault never returns null.

One of the things that could speed up your query is using the Key of your groups. The Key is the element on which you grouped, in your case the TurbineNumber: All elements in the a group have the same TurbineNumber.

var result = dbCtx.StatusViewList
    .GroupBy(statusView => statusView.TurbineNumber)
    .OrderBy(group => group.Key)
    ...

I think that will be a first step to improve performance.

However, you return a fixed number of Groups. Some Groups might be huge, 1000s of elements, some groups might be small: only one element. So the result of one page could be 10 groups, each with 1000 elements, having a total of 10000 elements. It could also be 10 groups, each with 1 element, a total of 10 elements. I'm not sure if this would be the result you want by paging.

Wouldn't you prefer a page that always has the same number of elements, preferably with the same TurbineNumber, If there are not many same TurbineNumbers fill the rest of your page with the next TurbineNumber. If there are too many StatusViews with this TurbineNumber divide them into several pages?

Something like:

TurbineNumber StatusView
     4            A
     4            B
     4            F
     5            D
     5            K
     6            C
     6            Z
     6            Q
     6            W
     7            E

To do this, don't GroupBy, use OrderBy and then Skip and Take

IEnumerable<StatusView> GetPage(int pageNr, int pageSize)
{
     return dbCtx.StatusViewList
         .Orderby(statusView => statusView.TurbineNumber)
         .Skip(pageNr * pageSize)
         .Take(pageSize)
}

If you create an extra index for TurbineNumber, this will be very fast:

In your DbContext.OnModelCreating(DbModelBuilder modelBuilder):

// Add an extra index on TurbineNumber:
var indexAttribute = new IndexAttribute("TurbineIndex", 0) {IsUnique = false}
var indexAnnotation =new IndexAnnotation(indexAttribute);
modelBuilder.Entity<Statusview>()
    .Property(statusView => statusView.TurbineNumber)
    .HasColumnAnnotation("MyIndexName", indexAnnotation);
like image 28
Harald Coppoolse Avatar answered Oct 25 '25 19:10

Harald Coppoolse



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!