I have several huge sorted enumerable sequences that I want to merge. These lists are manipulated as IEnumerable but are already sorted. Since the input lists are sorted, it should be possible to merge them in a single pass, without re-sorting anything.
I would also like to keep the deferred execution behavior.
I tried to write a naive algorithm that does this (see below). However, it looks pretty ugly and I'm sure it can be optimized; there may well be a more academic algorithm for it.
IEnumerable<T> MergeOrderedLists<T, TOrder>(
    IEnumerable<IEnumerable<T>> orderedlists,
    Func<T, TOrder> orderBy)
{
    var enumerators = orderedlists.ToDictionary(l => l.GetEnumerator(), l => default(T));
    IEnumerator<T> tag = null;
    var firstRun = true;
    while (true)
    {
        var toRemove = new List<IEnumerator<T>>();
        var toAdd = new List<KeyValuePair<IEnumerator<T>, T>>();
        foreach (var pair in enumerators.Where(pair => firstRun || tag == pair.Key))
        {
            if (pair.Key.MoveNext())
                toAdd.Add(pair);
            else
                toRemove.Add(pair.Key);
        }

        foreach (var enumerator in toRemove)
            enumerators.Remove(enumerator);

        foreach (var pair in toAdd)
            enumerators[pair.Key] = pair.Key.Current;

        if (enumerators.Count == 0)
            yield break;

        var min = enumerators.OrderBy(t => orderBy(t.Value)).FirstOrDefault();
        tag = min.Key;
        yield return min.Value;
        firstRun = false;
    }
}
The method can be used like this:

// Person lists are already sorted by age
MergeOrderedLists(orderedList, p => p.Age);
assuming the following Person class exists somewhere:

public class Person
{
    public int Age { get; set; }
}
Duplicates should be preserved; we don't care about their order in the new sequence. Do you see any obvious optimization I could use?
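For reference, the one-pass merge idea is easiest to see with just two sequences; the k-way version in the question generalizes it. This is an illustrative sketch (the name MergeTwo is mine, not from any library): at each step compare the current heads and yield the smaller one, so each input is enumerated exactly once and execution stays deferred.

```csharp
using System;
using System.Collections.Generic;

static class TwoWayMerge
{
    // Sketch: one-pass merge of two sequences that are already sorted
    // by the key that orderBy extracts. Nothing is enumerated until
    // the caller iterates the result (iterator method = deferred).
    public static IEnumerable<T> MergeTwo<T, TOrder>(
        IEnumerable<T> first, IEnumerable<T> second, Func<T, TOrder> orderBy)
        where TOrder : IComparable<TOrder>
    {
        using var a = first.GetEnumerator();
        using var b = second.GetEnumerator();
        bool hasA = a.MoveNext(), hasB = b.MoveNext();

        while (hasA && hasB)
        {
            // <= keeps duplicates and emits ties from the first sequence first
            if (orderBy(a.Current).CompareTo(orderBy(b.Current)) <= 0)
            {
                yield return a.Current;
                hasA = a.MoveNext();
            }
            else
            {
                yield return b.Current;
                hasB = b.MoveNext();
            }
        }
        // drain whichever sequence still has elements
        while (hasA) { yield return a.Current; hasA = a.MoveNext(); }
        while (hasB) { yield return b.Current; hasB = b.MoveNext(); }
    }
}
```

Folding this pairwise over k lists works but costs O(n·k) comparisons overall; the answers below avoid that by keeping all k heads in one ordered structure.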
Here is my fourth (thanks to @tanascius for pushing this along to something much more LINQ) cut at it:
public static IEnumerable<T> MergePreserveOrder3<T, TOrder>(
    this IEnumerable<IEnumerable<T>> aa,
    Func<T, TOrder> orderFunc)
    where TOrder : IComparable<TOrder>
{
    var items = aa.Select(xx => xx.GetEnumerator())
                  .Where(ee => ee.MoveNext())
                  .OrderBy(ee => orderFunc(ee.Current))
                  .ToList();

    while (items.Count > 0)
    {
        yield return items[0].Current;

        var next = items[0];
        items.RemoveAt(0);

        if (next.MoveNext())
        {
            // simple sorted linear insert
            var value = orderFunc(next.Current);
            var ii = 0;
            for ( ; ii < items.Count; ++ii)
            {
                if (value.CompareTo(orderFunc(items[ii].Current)) <= 0)
                {
                    items.Insert(ii, next);
                    break;
                }
            }

            if (ii == items.Count)
                items.Add(next);
        }
        else
            next.Dispose(); // woops! can't forget IDisposable
    }
}
Results:

for (int p = 0; p < people.Count; ++p)
{
    Console.WriteLine("List {0}:", p + 1);
    Console.WriteLine("\t{0}", String.Join(", ", people[p].Select(x => x.Name)));
}
Console.WriteLine("Merged:");
foreach (var person in people.MergePreserveOrder(pp => pp.Age))
{
    Console.WriteLine("\t{0}", person.Name);
}

List 1:
    8yo, 22yo, 47yo, 49yo
List 2:
    35yo, 47yo, 60yo
List 3:
    28yo, 55yo, 64yo
Merged:
    8yo
    22yo
    28yo
    35yo
    47yo
    47yo
    49yo
    55yo
    60yo
    64yo
Improved with .NET 4.0's Tuple support:

public static IEnumerable<T> MergePreserveOrder4<T, TOrder>(
    this IEnumerable<IEnumerable<T>> aa,
    Func<T, TOrder> orderFunc)
    where TOrder : IComparable<TOrder>
{
    var items = aa.Select(xx => xx.GetEnumerator())
                  .Where(ee => ee.MoveNext())
                  .Select(ee => Tuple.Create(orderFunc(ee.Current), ee))
                  .OrderBy(ee => ee.Item1)
                  .ToList();

    while (items.Count > 0)
    {
        yield return items[0].Item2.Current;

        var next = items[0];
        items.RemoveAt(0);

        if (next.Item2.MoveNext())
        {
            var value = orderFunc(next.Item2.Current);
            var ii = 0;
            for (; ii < items.Count; ++ii)
            {
                if (value.CompareTo(items[ii].Item1) <= 0)
                {
                    // NB: using a tuple to minimize calls to orderFunc
                    items.Insert(ii, Tuple.Create(value, next.Item2));
                    break;
                }
            }

            if (ii == items.Count)
                items.Add(Tuple.Create(value, next.Item2));
        }
        else
            next.Item2.Dispose(); // woops! can't forget IDisposable
    }
}
One suggestion I would make that might improve clarity and performance is this:

- Create a priority queue of pairs of T and IEnumerable<T>, ordered according to your comparison function on T
- For each IEnumerable<T> being merged, add the first item to the priority queue, annotated with a reference to the IEnumerable<T> where it originated
- Remove the lowest item from the priority queue, yield it, and advance the IEnumerable<T> in its annotation to the next element
- If MoveNext() returned true, add the next element to the priority queue, annotated with a reference to the IEnumerable<T> you just advanced
- If MoveNext() returned false, don't add anything to the priority queue
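The priority-queue approach described above can be sketched with .NET 6's built-in PriorityQueue<TElement, TPriority>. This is an illustrative implementation, not from the original answer (the names MergeOrdered and sources are mine); instead of the T/IEnumerable<T> pair it enqueues each source's enumerator keyed by its current element, which is equivalent. Each of the n total elements costs O(log k) for k input lists.

```csharp
using System;
using System.Collections.Generic;

static class SortedMerge
{
    // Sketch of the k-way merge using a priority queue (.NET 6+).
    // Assumes each input sequence is already sorted by the key
    // that orderFunc extracts.
    public static IEnumerable<T> MergeOrdered<T, TOrder>(
        this IEnumerable<IEnumerable<T>> sources, Func<T, TOrder> orderFunc)
        where TOrder : IComparable<TOrder>
    {
        // Each entry is an enumerator positioned on a live element;
        // its priority is that element's sort key.
        var queue = new PriorityQueue<IEnumerator<T>, TOrder>();
        try
        {
            foreach (var source in sources)
            {
                var e = source.GetEnumerator();
                if (e.MoveNext())
                    queue.Enqueue(e, orderFunc(e.Current));
                else
                    e.Dispose();
            }

            while (queue.TryDequeue(out var e, out _))
            {
                yield return e.Current;
                if (e.MoveNext())
                    queue.Enqueue(e, orderFunc(e.Current)); // re-insert with new key
                else
                    e.Dispose();
            }
        }
        finally
        {
            // dispose remaining enumerators if the caller stops iterating early
            while (queue.TryDequeue(out var e, out _))
                e.Dispose();
        }
    }
}
```

Like the question's code, this is an iterator method, so execution stays deferred, and duplicates are preserved (equal keys simply dequeue in an unspecified relative order).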