Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Linq 'contains' query taking too long

Tags:

asp.net

linq

I have this query:

var newComponents = from ic in importedComponents
                    where !existingComponents.Contains(ic)
                    select ic;

importedComponents and existingComponents are of type List<ImportedComponent>, and exist only in memory (are not tied to a data context). In this instance, importedComponents has just over 6,100 items, and existingComponents has 511 items.

This statement is taking too long to complete (I don't know how long, I stop the script after 20 minutes). I've tried the following with no improvement in execution speed:

var existingComponentIDs = from ec in existingComponents
                           select ec.ID;

var newComponents = from ic in importedComponents
                    where !existingComponentIDs.Contains(ic.ID)
                    select ic;

Any help will be much appreciated.

like image 808
Albert Bori Avatar asked Apr 20 '12 23:04

Albert Bori


2 Answers

The problem is quadratic complexity of this algorithm. Put the IDs of all existingComponentIDs into a HashSet and use the HashSet.Contains method. It has O(1) lookup cost compared to O(N) for Contains/Any on a list.

The morelinq project contains a method that does all of that in one convenient step: ExceptBy.

like image 175
usr Avatar answered Oct 09 '22 11:10

usr


You could use Except to get the set difference:

var existingComponentIDs = existingComponents.Select(c => c.ID); 
var importedComponentIDs = importedComponents.Select(c => c.ID);
var newComponentIDs = importedComponentIDs.Except(existingComponentIDs);
var newComponents = from ic in importedComponents
        join newID in newComponentIDs on ic.ID equals newID
        select ic;
foreach (var c in newComponents)
{ 
    // insert into database?
}

Why is LINQ JOIN so much faster than linking with WHERE?

In short: Join method can set up a hash table to use as an index to quicky zip two tables together

like image 30
Tim Schmelter Avatar answered Oct 09 '22 10:10

Tim Schmelter