Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# fastest union of 2 sets of sorted values

What is the fastest way to union 2 sets of sorted values? Speed (big-O) is important here; not clarity - assume this is being done millions of times.

Assume you do not know the type or range of the values, but have an efficent IComparer<T> and/or IEqualityComparer<T>.

Given the following set of numbers:

var la = new int[] { 1, 2, 4, 5, 9 };
var ra = new int[] { 3, 4, 5, 6, 6, 7, 8 };

I am expecting 1, 2, 3, 4, 5, 6, 7, 8, 9. The following stub may be used to test the code:

static void Main(string[] args)
{
    var la = new int[] { 1, 2, 4, 5, 9 };
    var ra = new int[] { 3, 4, 5, 6, 6, 7, 8 };

    foreach (var item in UnionSorted(la, ra, Int32Comparer.Default))
    {
        Console.Write("{0}, ", item);
    }
    Console.ReadLine();
}

class Int32Comparer : IComparer<Int32>
{
    public static readonly Int32Comparer Default = new Int32Comparer();
    public int Compare(int x, int y)
    {
        if (x < y)
            return -1;
        else if (x > y)
            return 1;
        else
            return 0;
    }
}

static IEnumerable<T> UnionSorted<T>(IEnumerable<T> sortedLeft, IEnumerable<T> sortedRight, IComparer<T> comparer)
{
}
like image 991
Jonathan Dickinson Avatar asked Aug 23 '11 17:08

Jonathan Dickinson


2 Answers

The following method returns the correct results:

static IEnumerable<T> UnionSorted<T>(IEnumerable<T> sortedLeft, IEnumerable<T> sortedRight, IComparer<T> comparer)
{
    var first = true;

    var continueLeft = true;
    var continueRight = true;

    T left = default(T);
    T right = default(T);

    using (var el = sortedLeft.GetEnumerator())
    using (var er = sortedRight.GetEnumerator())
    {
        // Loop until both enumeration are done.
        while (continueLeft | continueRight)
        {
            // Only if both enumerations have values.
            if (continueLeft & continueRight)
            {
                    // Seed the enumeration.
                    if (first)
                    {
                        continueLeft = el.MoveNext();
                        if (continueLeft)
                        {
                            left = el.Current;
                        }
                        else 
                        {
                            // left is empty, just dump the right enumerable
                            while (er.MoveNext())
                                yield return er.Current;
                            yield break;
                        }

                        continueRight = er.MoveNext();
                        if (continueRight)
                        {
                            right = er.Current;
                        }
                        else
                        {
                            // right is empty, just dump the left enumerable
                            if (continueLeft)
                            {
                                // there was a value when it was read earlier, let's return it before continuing
                                do
                                {
                                    yield return el.Current;
                                }
                                while (el.MoveNext());
                            } // if continueLeft is false, then both enumerable are empty here.
                            yield break;
                        }

                        first = false;
                    }

                // Compare them and decide which to return.
                var comp = comparer.Compare(left, right);
                if (comp < 0)
                {
                    yield return left;
                    // We only advance left until they match.
                    continueLeft = el.MoveNext();
                    if (continueLeft)
                        left = el.Current;
                }
                else if (comp > 0)
                {
                    yield return right;
                    continueRight = er.MoveNext();
                    if (continueRight)
                        right = er.Current;
                }
                else
                {
                    // The both match, so advance both.
                    yield return left;
                    continueLeft = el.MoveNext();
                    if (continueLeft)
                        left = el.Current;
                    continueRight = er.MoveNext();
                    if (continueRight)
                        right = er.Current;
                }
            }
            // One of the lists is done, don't advance it.
            else if (continueLeft)
            {
                yield return left;
                continueLeft = el.MoveNext();
                if (continueLeft)
                    left = el.Current;
            }
            else if (continueRight)
            {
                yield return right;
                continueRight = er.MoveNext();
                if (continueRight)
                    right = er.Current;
            }
        }
    }
}

The space is ~O(6) and time ~O(max(n,m)) (where m is the second set).

like image 94
Jonathan Dickinson Avatar answered Sep 27 '22 17:09

Jonathan Dickinson


This will make your UnionSorted function a little less versatile, but you can make a small improvement by making an assumption about types. If you do the comparison inside the loop itself (rather than calling the Int32Comparer) then that'll save on some function call overhead.

So your UnionSorted declaration becomes this...

static IEnumerable<int> UnionSorted(IEnumerable<int> sortedLeft, IEnumerable<int> sortedRight)

And then you do this inside the loop, getting rid of the call to comparer.Compare()...

//var comp = comparer.Compare(left, right); // too slow

int comp = 0;
if (left < right)
    comp = -1;
else if (left > right)
    comp = 1;

In my testing this was about 15% faster.

like image 28
Steve Wortham Avatar answered Sep 27 '22 18:09

Steve Wortham