Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Quicksort with Linq performance Advantage passing T[] vs. IEnumerable<T>

Tags:

c#

linq

Just for fun I built a quicksort implementation in C# with Linq:

public static IEnumerable<T> quicksort<T>(IEnumerable<T> input) where T : IComparable<T>{
    if (input.Count() <= 1) return input;
    var pivot = input.FirstOrDefault();
    var lesser = quicksort(input.Skip(1).Where(i => i.CompareTo(pivot) <= 0));
    var greater = quicksort(input.Where(i => i.CompareTo(pivot) > 0));
    return lesser.Append(pivot).Concat(greater);
}

It sorts 10000 random integers in about 13 seconds.

Changing it to use int[] instead of List results in about 700 times better performance! It only takes 21ms to sort the same 10000 random integers.

public static T[] quicksortArray<T>(T[] input) where T : IComparable<T>{
    if (input.Count() <= 1) return input;
    var pivot = input.FirstOrDefault();
    var lesser = quicksortArray(input.Skip(1).Where(i => i.CompareTo(pivot) <= 0).ToArray());
    var greater = quicksortArray(input.Where(i => i.CompareTo(pivot) > 0).ToArray());
    return lesser.Append(pivot).Concat(greater).ToArray();
}

Just looking at the code I would have assumed this would have worse performance. I assumed that .ToArray() would create an additional array in memory and copy all integers there. I think passing an array vs. passing a list should take about the same time.

So where does this huge performance difference come from?

like image 777
marv51 Avatar asked Mar 16 '17 21:03

marv51


2 Answers

This is why you should be very careful about iterating an IEnumerable multiple times.

The first time you call quicksort you pass in, say, List. It calls quicksort two more times, in each of those cases, the IEnumerable that you're passing in represents a query that will skip the first item and then perform a comparison on every single item after that. You then take that query and pass it to two more instances of quicksort, but making the query not only skip the first item and compare every item after it, but then skip the first item of the results of that query and then compare every item after that to something. This means that by the time you finally reach a value, you've got a query that represents log(n) skips, and is comparing every item in the sequence to a value log(n) times.

In your second implementation you're not passing in queries to quicksort, you're evaluating those queries to their values and passing in the values to the operation, which can then use those values twice, rather than continually performing an extremely complex query multiple times.

like image 189
Servy Avatar answered Oct 10 '22 15:10

Servy


The thing about linq queries are they are lazy, they won't be evaluated until you call a method like ToArray or ToList. In your first code for example, this query:

input.Skip(1).Where(i => i.CompareTo(pivot))

Will be evaluated at least twice every time you call quicksort, once for Count() and once for FirstOrDefault. Which means the filtering operation will be repeated for each call over and over. When you use ToArray on the other hand, since you have the filtered elements in an array already, Where won't be executed every time, it will be executed once you call the ToArray and that's it. This is the difference between the codes, based on this you can do the math for the other parts.

like image 30
Selman Genç Avatar answered Oct 10 '22 14:10

Selman Genç