Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient algorithm for merging sorted IEnumerable<T>

I have several huge sorted enumerable sequences that I want to merge. Theses lists are manipulated as IEnumerable but are already sorted. Since input lists are sorted, it should be possible to merge them in one trip, without re-sorting anything.

I would like to keep the defered execution behavior.

I tried to write a naive algorithm which do that (see below). However, it looks pretty ugly and I'm sure it can be optimized. It may exist a more academical algorithm...

IEnumerable<T> MergeOrderedLists<T, TOrder>(IEnumerable<IEnumerable<T>> orderedlists,                                              Func<T, TOrder> orderBy) {     var enumerators = orderedlists.ToDictionary(l => l.GetEnumerator(), l => default(T));     IEnumerator<T> tag = null;      var firstRun = true;     while (true)     {         var toRemove = new List<IEnumerator<T>>();         var toAdd = new List<KeyValuePair<IEnumerator<T>, T>>();         foreach (var pair in enumerators.Where(pair => firstRun || tag == pair.Key))         {             if (pair.Key.MoveNext())                 toAdd.Add(pair);             else                 toRemove.Add(pair.Key);         }          foreach (var enumerator in toRemove)             enumerators.Remove(enumerator);          foreach (var pair in toAdd)             enumerators[pair.Key] = pair.Key.Current;          if (enumerators.Count == 0)             yield break;          var min = enumerators.OrderBy(t => orderBy(t.Value)).FirstOrDefault();         tag = min.Key;         yield return min.Value;          firstRun = false;     } } 

The method can be used like that:

// Person lists are already sorted by age MergeOrderedLists(orderedList, p => p.Age); 

assuming the following Person class exists somewhere:

    public class Person     {         public int Age { get; set; }     } 

Duplicates should be conserved, we don't care about their order in the new sequence. Do you see any obvious optimization I could use?

like image 383
franck Avatar asked May 04 '10 16:05

franck


People also ask

What is the best scenario for merge sort?

An example of an array that would be the worst case scenario for Merge Sort is [1, 9, 5, 13, 3, 11, 7, 15].. Best case: In the best case scenario, the array is already sorted, but we still have to split and recombine it back.

What is the efficiency of Merge sort?

Merge sort is one of the most efficient sorting algorithms. It works on the principle of Divide and Conquer based on the idea of breaking down a list into several sub-lists until each sublist consists of a single element and merging those sublists in a manner that results into a sorted list.

What will be the best case time complexity of Merge sort?

Best Case Complexity: The merge sort algorithm has a best-case time complexity of O(n*log n) for the already sorted array.

Is Merge sort the best sorting algorithm?

Merge sort is more efficient and works faster than quick sort in case of larger array size or datasets. Quick sort is more efficient and works faster than merge sort in case of smaller array size or datasets. Sorting method : The quick sort is internal sorting method where the data is sorted in main memory.


2 Answers

Here is my fourth (thanks to @tanascius for pushing this along to something much more LINQ) cut at it:

public static IEnumerable<T> MergePreserveOrder3<T, TOrder>(     this IEnumerable<IEnumerable<T>> aa,     Func<T, TOrder> orderFunc) where TOrder : IComparable<TOrder> {     var items = aa.Select(xx => xx.GetEnumerator()).Where(ee => ee.MoveNext())         .OrderBy(ee => orderFunc(ee.Current)).ToList();      while (items.Count > 0)     {         yield return items[0].Current;          var next = items[0];         items.RemoveAt(0);         if (next.MoveNext())         {             // simple sorted linear insert             var value = orderFunc(next.Current);             var ii = 0;             for ( ; ii < items.Count; ++ii)             {                 if (value.CompareTo(orderFunc(items[ii].Current)) <= 0)                 {                     items.Insert(ii, next);                     break;                 }             }              if (ii == items.Count) items.Add(next);         }         else next.Dispose(); // woops! can't forget IDisposable     } } 

Results:

for (int p = 0; p < people.Count; ++p) {     Console.WriteLine("List {0}:", p + 1);     Console.WriteLine("\t{0}", String.Join(", ", people[p].Select(x => x.Name))); }  Console.WriteLine("Merged:"); foreach (var person in people.MergePreserveOrder(pp => pp.Age)) {     Console.WriteLine("\t{0}", person.Name); }  List 1:         8yo, 22yo, 47yo, 49yo List 2:         35yo, 47yo, 60yo List 3:         28yo, 55yo, 64yo Merged:         8yo         22yo         28yo         35yo         47yo         47yo         49yo         55yo         60yo         64yo 

Improved with .Net 4.0's Tuple support:

public static IEnumerable<T> MergePreserveOrder4<T, TOrder>(     this IEnumerable<IEnumerable<T>> aa,     Func<T, TOrder> orderFunc) where TOrder : IComparable<TOrder> {     var items = aa.Select(xx => xx.GetEnumerator())                   .Where(ee => ee.MoveNext())                   .Select(ee => Tuple.Create(orderFunc(ee.Current), ee))                   .OrderBy(ee => ee.Item1).ToList();      while (items.Count > 0)     {         yield return items[0].Item2.Current;          var next = items[0];         items.RemoveAt(0);         if (next.Item2.MoveNext())         {             var value = orderFunc(next.Item2.Current);             var ii = 0;             for (; ii < items.Count; ++ii)             {                 if (value.CompareTo(items[ii].Item1) <= 0)                 {   // NB: using a tuple to minimize calls to orderFunc                     items.Insert(ii, Tuple.Create(value, next.Item2));                     break;                 }             }              if (ii == items.Count) items.Add(Tuple.Create(value, next.Item2));         }         else next.Item2.Dispose(); // woops! can't forget IDisposable     } } 
like image 51
user7116 Avatar answered Sep 21 '22 17:09

user7116


One guess I would make that might improve clarity and performance is this:

  • Create a priority queue over pairs of T, IEnumerable<T> ordered according to your comparison function on T
  • For each IEnumerable<T> being merged, add the item to the priority queue annotated with a reference to the IEnumerable<T> where it originated
  • While the priority queue is not empty
    • Extract the minimum element from the priority queue
    • Advance the IEnumerable<T> in its annotation to the next element
    • If MoveNext() returned true, add the next element to the priority queue annotated with a reference to the IEnumerable<T> you just advanced
    • If MoveNext() returned false, don't add anything to the priority queue
    • Yield the dequeued element
like image 35
Doug McClean Avatar answered Sep 22 '22 17:09

Doug McClean