Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compare two sorted large lists efficiently in C#?

Tags:

c#

.net

I have got two generic lists with 20,000 and 30,000 objects in each list.

class Employee
{
    string name;
    double salary;
}

List<Employee> newEmployeeList = List<Employee>() {....} // contains 20,000 objects
List<Employee> oldEmployeeList = List<Employee>() {....} // contains 30,000 objects

Lists can also be sorted by name if it improves the speed.

I want to compare these two lists to find out

  1. employees whose name and salary matching
  2. employees whose name is matching but not salary

What is the fastest way to compare such large data lists with above conditions?

like image 756
Santosh Avatar asked Jan 09 '12 22:01

Santosh


5 Answers

I would sort both newEmployeeList and oldEmployeeList lists by name - O(n*log(n)). And then you can use linear algorithm to search for matches. So the total would be O(n+n*log(n)) if both lists are about the same size. This should be faster than O(n^2) "brute force" algorithm.

like image 188
Elalfer Avatar answered Oct 20 '22 17:10

Elalfer


I'd probably recommend the two lists be stored in a Dictionary<string, Employee> based on the name to begin with, then you can iterate over the keys in one and lookup to see if they exist and the salaries match in the other. This would also save the cost of sorting them later or putting them in a more efficient structure.

This is pretty much O(n) - linear to build both dictionaries, linear to go through the keys and lookup in the other. Since O(n + m + n) reduces to O(n)

But, if you must use List<T> to hold the lists for other reasons, you could also use the Join() LINQ method, and build a new list with a Match field that tells you whether they were a match or mismatch...

        var results = newEmpList.Join(
            oldEmpList,
            n => n.Name,
            o => o.Name,
            (n, o) => new 
                { 
                    Name = n.Name, 
                    Salary = n.Salary, 
                    Match = o.Salary == n.Salary 
                });

You can then filter this with a Where() clause for Match or !Match.

like image 23
James Michael Hare Avatar answered Oct 20 '22 18:10

James Michael Hare


Update: I assume (by the title of your question) that the 2 lists are already sorted. Perhaps they're stored in a database with a clustered index or something. This answer, therefore, relies on that assumption.

Here is an implementation that has O(n) complexity, and is also very fast, AND is pretty simple too.
I believe this is a variant of the Merge Algorithm.

Here's the idea:

  1. Start enumerating both lists
  2. Compare the 2 current items.
  3. If they match, add to your results.
    If the 1st item is "smaller", advance the 1st list.
    If the 2nd item is "smaller", advance the 2nd list.

Since both lists are known to be sorted, this will work very well. This implementation assumes that name is unique in each list.

var comparer = StringComparer.OrdinalIgnoreCase;
var namesAndSalaries = new List<Tuple<Employee, Employee>>();
var namesOnly = new List<Tuple<Employee, Employee>>();

// Create 2 iterators; one for old, one for new:
using (IEnumerator<Employee> A = oldEmployeeList.GetEnumerator()) {
    using (IEnumerator<Employee> B = newEmployeeList.GetEnumerator()) {
        // Start enumerating both:
        if (A.MoveNext() && B.MoveNext()) {
            while (true) {
                int compared = comparer.Compare(A.Current.name, B.Current.name);
                if (compared == 0) {
                    // Names match
                    if (A.Current.salary == B.Current.salary) {
                        namesAndSalaries.Add(Tuple.Create(A.Current, B.Current));
                    } else {
                        namesOnly.Add(Tuple.Create(A.Current, B.Current));
                    }
                    if (!A.MoveNext() || !B.MoveNext()) break;
                } else if (compared == -1) {
                    // Keep searching A
                    if (!A.MoveNext()) break;
                } else {
                    // Keep searching B
                    if (!B.MoveNext()) break;
                }

            }
        }
    }
}
like image 26
Scott Rippey Avatar answered Oct 20 '22 16:10

Scott Rippey


One of fastest possible solutions on sorted lists is use of BinarySearch in order to find an item in another list.

But as mantioned others, you should measure it against your project requirements, as performance often tends to be a subjective thing.

like image 27
Tigran Avatar answered Oct 20 '22 17:10

Tigran


You could create a Dictionary using

var lookupDictionary = list1.ToDictionary(x=>x.name);

That would give you close to O(1) lookup and a close to O(n) behavior if you're looking up values from a loop over the other list.

(I'm assuming here that ToDictionary is O(n) which would make sense with a straight forward implementation, but I have not tested this to be the case)

This would make for a very straight forward algorithm, and I'm thinking going below O(n) with two unsorted lists is pretty hard.

like image 1
Joachim Isaksson Avatar answered Oct 20 '22 18:10

Joachim Isaksson