Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to most efficiently test if two arrays contain equivalent items in C#

I have two arrays and I want to know if they contain the same items. Equals(object obj) doesn't work because an array is a reference type. I have posted my attempt below, but since I'm sure this is a common task I'd like to know if there is a better test.

    public bool ContainsEquivalentSequence<T>(T[] array1, T[] array2)
    {
        bool a1IsNullOrEmpty = ReferenceEquals(array1, null) || array1.Length == 0;
        bool a2IsNullOrEmpty = ReferenceEquals(array2, null) || array2.Length == 0;
        if (a1IsNullOrEmpty) return a2IsNullOrEmpty;
        if (a2IsNullOrEmpty || array1.Length != array2.Length) return false;
        for (int i = 0; i < array1.Length; i++)
            if (!Equals(array1[i], array2[i]))
                return false;
        return true;
    }

Update - System.Linq.Enumerable.SequenceEqual is not better

I reflected the source and it does not compare the length prior to executing the loop. This makes sense since the method is designed generally for an IEnumerable<T>, not for a T[].

    public static bool SequenceEqual<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
    {
        if (comparer == null)
        {
            comparer = EqualityComparer<TSource>.Default;
        }
        if (first == null)
        {
            throw Error.ArgumentNull("first");
        }
        if (second == null)
        {
            throw Error.ArgumentNull("second");
        }
        using (IEnumerator<TSource> enumerator = first.GetEnumerator())
        {
            using (IEnumerator<TSource> enumerator2 = second.GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    if (!enumerator2.MoveNext() || !comparer.Equals(enumerator.Current, enumerator2.Current))
                    {
                        return false;
                    }
                }
                if (enumerator2.MoveNext())
                {
                    return false;
                }
            }
        }
        return true;
    }
like image 965
smartcaveman Avatar asked Aug 16 '11 23:08

smartcaveman


1 Answers

I've done some tests using Any, Contains, All and SequenceEqual then I picked the best 3.

There are different results for different inputs...

Two identical arrays of size 100: SequenceEqual was faster

[     SequenceEqual: 00:00:00.027   ]*
[     ContainsEqSeq: 00:00:00.046   ]
[          Parallel: 00:00:00.281   ]

Two identical arrays of size 1000: SequenceEqual was faster

[     SequenceEqual: 00:00:00.240   ]*
[     ContainsEqSeq: 00:00:00.361   ]
[          Parallel: 00:00:00.491   ]

Two identical arrays of size 10000: Parallel was faster

[     SequenceEqual: 00:00:02.357   ]
[     ContainsEqSeq: 00:00:03.341   ]
[          Parallel: 00:00:01.688   ]*

Two identical arrays of size 50000: Parallel kick ass

[     SequenceEqual: 00:00:11.824   ]
[     ContainsEqSeq: 00:00:17.206   ]
[          Parallel: 00:00:06.811   ]*

Two arrays with one difference at position 200: SequenceEqual was faster

[     SequenceEqual: 00:00:00.050   ]*
[     ContainsEqSeq: 00:00:00.075   ]
[          Parallel: 00:00:00.332   ]

Two arrays with one difference at position 0: ContainsEqSeq and SequenceEqual were faster

[     SequenceEqual: 00:00:00.002   ]*
[     ContainsEqSeq: 00:00:00.001   ]*
[          Parallel: 00:00:00.211   ]

Two arrays with one difference at position 999: SequenceEqual was faster

[     SequenceEqual: 00:00:00.237   ]*
[     ContainsEqSeq: 00:00:00.330   ]
[          Parallel: 00:00:00.691   ]

Two arrays with one difference at position 9999: Parallel kick ass

[     SequenceEqual: 00:00:02.386   ]
[     ContainsEqSeq: 00:00:03.417   ]
[          Parallel: 00:00:01.614   ]*

The code for SequenceEqual is

a1.SequenceEqual(a2)

The code for ContainsEqSeq is your method.

The code for Parallel is

bool a1IsNullOrEmpty = ReferenceEquals(a1, null) || a1.Length == 0;
bool a2IsNullOrEmpty = ReferenceEquals(a2, null) || a2.Length == 0;
if (a1IsNullOrEmpty) return a2IsNullOrEmpty;
if (a2IsNullOrEmpty || a1.Length != a2.Length) return false;

var areEqual = true;
Parallel.ForEach(a1,
    (i, s, x) =>
    {
        if (a1[x] != a2[x])
        {
            areEqual = false;
            s.Stop();
        }
    });

return areEqual;

I would say that the best one depends on what your input will be.

If you will work with huge arrays (like 10000+) I would say Parallel is the best choice, it only loses when there is a difference on the beginning.

For other cases SequenceEqual might be the best one, I only tested with int[], but I believe it can be fast with complex types as well.

But remember, results will vary accordingly to the input.

like image 123
BrunoLM Avatar answered Nov 06 '22 00:11

BrunoLM