Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Entity Framework ignore order by when followed with distinct?

I'd like to know why the following join returns values in an unordered fashion. Does distinct imply that the order of the IQueryable passed is not guaranteed, therefore EF does not bother generating an order by clause in the SQL.

 var currentUtc = DateTime.Now;
 var data = (from a in ItemsA
     join c in Customers on a.Fk_CustomerID equals c.ID into customerSubset
     from cs in customerSubset.DefaultIfEmpty()
     where cs.Age > 18)                                    
     orderby a.ID ascending
     select new
     {
       a.ID,
       a.someProperty,
       cs.CustomerUID,
       CustomerName = cs.Name,                               
       UpdateUTC = currentUtc
     }).Distinct().Take(1000).ToList();

The strange thing is that removing the distinct adds the order by clause in the inner query of the generated SQL (you can use Linqpad to see the SQL generated for eg.)

Now if I replace the last line with

.Distinct().OrderBy(x => s.ID).Take(1000).ToList();

I'm getting the order by clause in the SQL, whether or not I have the orderby a.ID ascending in the inner query. This makes sense, but why is chaining distinct after orderby also not yield the same order?

like image 370
arviman Avatar asked Oct 18 '25 12:10

arviman


1 Answers

The DISTINCT operation in SQL doesn't guarantee order. Internally it performs a sort before determining if rows are identical.

Even that order isn't guaranteed though because the query engine can partition the data for parallel processing, then recombine the partitioned data to produce the final result.

To guarantee a specific order, the ORDER BY clause or the OrderBy() call should be the last one before Skip() or Take()

like image 81
Panagiotis Kanavos Avatar answered Oct 21 '25 02:10

Panagiotis Kanavos