How to compare two sorted large lists efficiently in C#?

Question

I have got two generic lists with 20,000 and 30,000 objects in each list.

class Employee
{
    string name;
    double salary;
}

List<Employee> newEmployeeList = List<Employee>() {....} // contains 20,000 objects
List<Employee> oldEmployeeList = List<Employee>() {....} // contains 30,000 objects

Lists can also be sorted by name if it improves the speed.

I want to compare these two lists to find out

employees whose name and salary matching
employees whose name is matching but not salary

What is the fastest way to compare such large data lists with above conditions?

Elalfer · Accepted Answer

I would sort both newEmployeeList and oldEmployeeList lists by name - O(n*log(n)). And then you can use linear algorithm to search for matches. So the total would be O(n+n*log(n)) if both lists are about the same size. This should be faster than O(n^2) "brute force" algorithm.

James Michael Hare · Answer

I'd probably recommend the two lists be stored in a Dictionary<string, Employee> based on the name to begin with, then you can iterate over the keys in one and lookup to see if they exist and the salaries match in the other. This would also save the cost of sorting them later or putting them in a more efficient structure.

This is pretty much O(n) - linear to build both dictionaries, linear to go through the keys and lookup in the other. Since O(n + m + n) reduces to O(n)

But, if you must use List<T> to hold the lists for other reasons, you could also use the Join() LINQ method, and build a new list with a Match field that tells you whether they were a match or mismatch...

        var results = newEmpList.Join(
            oldEmpList,
            n => n.Name,
            o => o.Name,
            (n, o) => new 
                { 
                    Name = n.Name, 
                    Salary = n.Salary, 
                    Match = o.Salary == n.Salary 
                });

You can then filter this with a Where() clause for Match or !Match.

Scott Rippey · Answer

Update: I assume (by the title of your question) that the 2 lists are already sorted. Perhaps they're stored in a database with a clustered index or something. This answer, therefore, relies on that assumption.

Here is an implementation that has O(n) complexity, and is also very fast, AND is pretty simple too.
I believe this is a variant of the Merge Algorithm.

Here's the idea:

Start enumerating both lists
Compare the 2 current items.
If they match, add to your results.
If the 1st item is "smaller", advance the 1st list.
If the 2nd item is "smaller", advance the 2nd list.

Since both lists are known to be sorted, this will work very well. This implementation assumes that name is unique in each list.

var comparer = StringComparer.OrdinalIgnoreCase;
var namesAndSalaries = new List<Tuple<Employee, Employee>>();
var namesOnly = new List<Tuple<Employee, Employee>>();

// Create 2 iterators; one for old, one for new:
using (IEnumerator<Employee> A = oldEmployeeList.GetEnumerator()) {
    using (IEnumerator<Employee> B = newEmployeeList.GetEnumerator()) {
        // Start enumerating both:
        if (A.MoveNext() && B.MoveNext()) {
            while (true) {
                int compared = comparer.Compare(A.Current.name, B.Current.name);
                if (compared == 0) {
                    // Names match
                    if (A.Current.salary == B.Current.salary) {
                        namesAndSalaries.Add(Tuple.Create(A.Current, B.Current));
                    } else {
                        namesOnly.Add(Tuple.Create(A.Current, B.Current));
                    }
                    if (!A.MoveNext() || !B.MoveNext()) break;
                } else if (compared == -1) {
                    // Keep searching A
                    if (!A.MoveNext()) break;
                } else {
                    // Keep searching B
                    if (!B.MoveNext()) break;
                }

            }
        }
    }
}

Tigran · Answer

One of fastest possible solutions on sorted lists is use of BinarySearch in order to find an item in another list.

But as mantioned others, you should measure it against your project requirements, as performance often tends to be a subjective thing.

Joachim Isaksson · Answer

You could create a Dictionary using

var lookupDictionary = list1.ToDictionary(x=>x.name);

That would give you close to O(1) lookup and a close to O(n) behavior if you're looking up values from a loop over the other list.

(I'm assuming here that ToDictionary is O(n) which would make sense with a straight forward implementation, but I have not tested this to be the case)

This would make for a very straight forward algorithm, and I'm thinking going below O(n) with two unsorted lists is pretty hard.

How to compare two sorted large lists efficiently in C#?

Tags:

c#

.net

Santosh

5 Answers

Elalfer

James Michael Hare

Scott Rippey

Tigran

Joachim Isaksson

Recent Activity

Donate For Us

How to compare two sorted large lists efficiently in C#?

Tags:

c#

.net

Santosh

5 Answers

Elalfer

James Michael Hare

Scott Rippey

Tigran

Joachim Isaksson

Related questions

Recent Activity

Donate For Us