Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

LINQ OrderBy. Does it always return the same ordered list?

I was trying out a simple OrderBy statement.

The target data to order is something like below:

[
  {"id":40, "description":"aaa", "rate":1},
  {"id":1,  "description":"bbb", "rate":1},
  {"id":4,  "description":"ccc", "rate":2},
  {"id":19, "description":"aaa", "rate":1} 
]

Then I order items by the rate property.

The odd thing is that if I 'order' them, it 'skips' some items by a given offset and then 'take' only portion of the data.

For example,

var result = items.OrderBy(i => i.rate);
var result = result.Skip(2);
var result = result.Take(2);

The result looks fine for the most of it, but the 'edge case' item is not returned at all.

For example,

if the first result came back as

    [{"id":40, "description":"aaa", "rate":1}, {"id":1, "description":"bbb", "rate":1}]

the second result comes back like

    [{"id":1, "description":"bbb", "rate":1}, {"id":4, "description":"ccc", "rate":2}]

Item "id: 19" has not been returned with the second query call. Instead item "id: 1" has returned twice.

My guess is that the SQL OrderBy statement doesn't produce the same ordered list every single time OrderBy orders by a given property, but the exact order within a group that shares the same property can change.

What is the exact mechanism under the hood?

like image 976
CodeSchool JJ Avatar asked Sep 18 '14 21:09

CodeSchool JJ


1 Answers

Short answer: LINQ to Objects uses a stable sort algorithm, so we can say that it is deterministic, and LINQ to SQL depends on the database implementation of Order By that is usually nondeterministic.

A deterministic sort algorithm is one that have always the same behavior on different runs.

In you example, you have duplicates in your OrderBy clause. For a guaranteed and predicted sort, one of the order clauses or the combination of order clauses must be unique.

In LINQ, you can achieve it by adding another OrderBy clause to refer your unique property, like in
items.OrderBy(i => i.Rate).ThenBy(i => i.ID).

Long answer:

LINQ to Objects uses a stable sort, as documented in this link: MSDN.

In LINQ to SQL, it depends on the sort algorithm of the underlying database and it is usually an unstable sort, like in MS SQL Server (MSDN).

In a stable sort, if the keys of two elements are equal, the order of the elements is preserved. In contrast, an unstable sort does not preserve the order of elements that have the same key.

Wikipedia example

So, for LINQ to SQL, the sorting is usually nondeterministic because the RDMS (Relational Database Management System, like MS SQL Server) may directly use a unstable sort algorithm with a random pivot selection or the randomness can be related with which row the database happens to access first in the file system.

For example, imagine that the size of a page in the file system can hold up to 4 rows.

The page will be full if you insert the following data:

     Page 1
| Name | Value |
|------|-------|
|   A  |   1   |
|   B  |   2   |
|   C  |   3   |
|   D  |   4   |


If you need to insert a new row, the RDMS has two options:

  1. Create a new page to allocate the new row.
  2. Split the current page in two pages. So the first page will hold the Names A and B and the second page will hold C and D.

Suppose that the RDMS chooses option 1 (to reduce index fragmentation). If you insert a new row with Name C and Value 9, you will get:

     Page 1              Page 2
| Name | Value |    | Name | Value |
|------|-------|    |------|-------|
|   A  |   1   |    |   C  |   9   |
|   B  |   2   |    |      |       |
|   C  |   3   |    |      |       |
|   D  |   4   |    |      |       |


Probably, the OrderBy clause in column Name will return the following:

| Name | Value |
|------|-------|
|   A  |   1   |
|   B  |   2   |
|   C  |   3   |
|   C  |   9   | -- Value 9 appears after because it was at another page
|   D  |   4   |


Now, suppose that the RDMS chooses option 2 (to increase the insert performance in a storage system with many spindles). If you insert a new row with Name C and Value 9, you will get:

     Page 1              Page 2
| Name | Value |    | Name | Value |
|------|-------|    |------|-------|
|   A  |   1   |    |   C  |   3   |
|   B  |   2   |    |   D  |   4   |
|   C  |   9   |    |      |       |
|      |       |    |      |       |


Probably, the OrderBy clause in column Name will return the following:

| Name | Value |
|------|-------|
|   A  |   1   |
|   B  |   2   |
|   C  |   9   |  -- Value 9 appears before because it was at the first page
|   C  |   3   | 
|   D  |   4   |


Regarding your example:

I believe that you have mistyped something in your question, because you have used items.OrderBy(i => i.rate).Skip(2).Take(2); and the first result do not show a row with Rate = 2. This is not possible since the Skip will ignore the first two rows and they have Rate = 1, so your output must show the row with Rate = 2.

You've tagged your question with database, so I believe that you are using LINQ to SQL. In this case, results can be nondeterministic and you could get the following:

Result 1:

[{"id":40, "description":"aaa", "rate":1},
 {"id":4, "description":"ccc", "rate":2}]

Result 2:

[{"id":1, "description":"bbb", "rate":1},
 {"id":4, "description":"ccc", "rate":2}]

If you had used items.OrderBy(i => i.rate).ThenBy(i => i.ID).Skip(2).Take(2); then the only possible result would be:

[{"id":40, "description":"aaa", "rate":1},
 {"id":4, "description":"ccc", "rate":2}]
like image 112
Zanon Avatar answered Oct 12 '22 23:10

Zanon