
Why is OrderBy which returns IOrderedEnumerable<T> much faster than Sort?

This is a follow-up to the excellent question "C# Sort and OrderBy comparison". I will use the same example:

List<Person> persons = new List<Person>();
persons.Add(new Person("P005", "Janson"));
persons.Add(new Person("P002", "Aravind"));
persons.Add(new Person("P007", "Kazhal"));

The methods in contention are:

persons.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true));
//and
persons.OrderBy(n => n.Name);

Let me start by saying that I understand there isn't any significant performance difference to worry about. But I would love to know why OrderBy performs so much better than Sort. I'm using the benchmark posted by @phoog in an answer to the original question.

private void button1_Click(object sender, EventArgs e)
{
    IEnumerable<Person> people;

    BenchMark(persons => persons.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true)));

    BenchMark(persons => people = persons.OrderBy(n => n.Name));
}

private static Random randomSeed = new Random();
public static string RandomString(int size, bool lowerCase)
{
    var sb = new StringBuilder(size);
    int start = (lowerCase) ? 97 : 65;
    for (int i = 0; i < size; i++)
    {
        sb.Append((char)(26 * randomSeed.NextDouble() + start));
    }
    return sb.ToString();
}

private static void BenchMark(Action<List<Person>> action)
{
    List<Person> persons = new List<Person>();
    for (int i = 0; i < 10000; i++)
    {
        persons.Add(new Person("P" + i.ToString(), RandomString(5, true)));
    }
    List<Person> unsortedPersons = new List<Person>(persons);

    Stopwatch watch = new Stopwatch();
    for (int i = 0; i < 100; i++)
    {
        watch.Start();

        action(persons);

        watch.Stop();
        persons.Clear();
        persons.AddRange(unsortedPersons);
    }

    MessageBox.Show(watch.Elapsed.TotalMilliseconds.ToString());
}

Result:

Sort() => 3500 ~ 5000 ms
OrderBy() => 0.2 ~ 1.5 ms

Though the differences were profound even with the smaller lists I tested initially, they became more and more glaring as the size of the collection went up. Maybe I'm missing something key to understanding .NET collections, but my thinking is that since Sort acts on an existing List<T>, it should have less overhead (if any) in processing than OrderBy, which acts on the same List<T> (in our case persons) but has to return another collection, IOrderedEnumerable<T>. But still OrderBy performs far, far better. List<T> might have a certain overhead compared to the IEnumerable<T> type, but Sort acts on the existing list anyway! Furthermore, I'm a little amused to see a LINQ method working faster than an existing .NET method.

All the answers to the original question compare Sort against OrderBy(...).ToList(), which I believe adds some overhead, so the two end up performing more or less equally.

What could be the implementation differences?


Edit: OK, I learned something new. Here is how I confirmed the deferred execution:

private void button1_Click(object sender, EventArgs e)
{
    BenchMark(persons =>
    {
        persons.Sort((p1, p2) => string.Compare(p1.Name, p2.Name, true));
        foreach (var item in persons)
        {
            break;
        }
    });

    BenchMark(persons =>
    {
        IEnumerable<Person> people = persons.OrderBy(n => n.Name);
        foreach (var item in people)
        {
            break;
        }
    });
}

Sort ran in 4000 - 5000 ms while OrderBy ran just above 5000 ms. So my conclusion was indeed wrong. Both performed on equal terms once I started to enumerate the collections. I prefer the syntax of OrderBy any day :)

Edit 2: I just found that this is an exact duplicate of this one. But here is a more interesting question about deferred execution in general, though not about ordering specifically.

Asked by nawfal on Nov 01 '12 at 16:11



1 Answer

In this case, OrderBy is far faster because you're not actually executing it.

The query is deferred: until you actually enumerate the results, the IOrderedEnumerable<T> doesn't process the input or do any form of ordering.
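To make the deferred behavior visible, here is a minimal sketch that counts how often the key selector runs. The class name DeferredOrderByDemo, the KeySelectorCalls counter and the BuildQuery helper are assumptions made up for this illustration; they are not part of the question's code. Plain strings stand in for the Person objects to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredOrderByDemo
{
    // Hypothetical counter for this sketch: how many times the key selector has run.
    public static int KeySelectorCalls;

    public static IOrderedEnumerable<string> BuildQuery(List<string> names)
    {
        // OrderBy only records the source and the key selector; nothing is sorted here.
        return names.OrderBy(n =>
        {
            KeySelectorCalls++;
            return n;
        });
    }

    public static void Main()
    {
        var names = new List<string> { "Janson", "Aravind", "Kazhal" };

        IOrderedEnumerable<string> query = BuildQuery(names);
        Console.WriteLine(KeySelectorCalls);          // 0: nothing has been ordered yet

        List<string> sorted = query.ToList();         // enumerating the query runs the sort
        Console.WriteLine(KeySelectorCalls > 0);      // True
        Console.WriteLine(string.Join(", ", sorted)); // Aravind, Janson, Kazhal
    }
}
```

The counter stays at zero after OrderBy returns and only moves once ToList() enumerates the query, which is why a benchmark that never enumerates measures almost nothing.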

Try changing your benchmark to:

BenchMark(persons => persons.OrderBy(n => n.Name).Count());

The Count() call will force the ordering to actually occur (since it needs to enumerate the IOrderedEnumerable<T> to generate a count), which should even out your timings significantly.

Most LINQ extension methods work this way - until you enumerate them (via Count(), calling ToList(), or just using them in a normal foreach loop, etc), they will have negligible impact, as they don't actually do anything directly other than build the enumerable. The reason the other benchmarks compare to OrderBy(...).ToList() is that the addition of ToList() forces the OrderBy to fully execute and actually order the results.
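As a rough console-based sketch of a fairer comparison, the following times Sort, a deferred OrderBy that is never enumerated, and OrderBy followed by ToList(). The names SortBenchmarkSketch, TimeMs and MakeNames are hypothetical helpers, plain strings replace the Person class, and the same case-insensitive comparer is passed to every call so that all three do the same kind of comparison. Timings depend on the machine and runtime, so no expected numbers are given.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class SortBenchmarkSketch
{
    // Runs the action on a fresh copy of the data each iteration and returns total milliseconds.
    public static double TimeMs(List<string> source, int iterations, Action<List<string>> action)
    {
        var watch = new Stopwatch();
        for (int i = 0; i < iterations; i++)
        {
            var copy = new List<string>(source); // copying happens outside the timed region
            watch.Start();
            action(copy);
            watch.Stop();
        }
        return watch.Elapsed.TotalMilliseconds;
    }

    // Builds 'count' random lowercase names of length 5 (seeded so runs are repeatable).
    public static List<string> MakeNames(int count, int seed)
    {
        var random = new Random(seed);
        var names = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            var chars = new char[5];
            for (int j = 0; j < chars.Length; j++)
            {
                chars[j] = (char)('a' + random.Next(26));
            }
            names.Add(new string(chars));
        }
        return names;
    }

    public static void Main()
    {
        List<string> names = MakeNames(10000, 42);
        StringComparer comparer = StringComparer.CurrentCultureIgnoreCase;

        // In-place sort: the work happens inside the call.
        double sort = TimeMs(names, 100, list => list.Sort(comparer));
        // Deferred query: only the query object is built, nothing is ordered.
        double deferred = TimeMs(names, 100, list => list.OrderBy(n => n, comparer));
        // ToList() enumerates the query, so the ordering actually runs.
        double forced = TimeMs(names, 100, list => list.OrderBy(n => n, comparer).ToList());

        Console.WriteLine("Sort:               " + sort + " ms");
        Console.WriteLine("OrderBy (deferred): " + deferred + " ms");
        Console.WriteLine("OrderBy + ToList:   " + forced + " ms");
    }
}
```

Only the first and third measurements compare real sorting work; the second just shows how cheap it is to build a query that is never enumerated.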

Answered by Reed Copsey on Oct 11 '22 at 12:10