Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Except() on EntityCollection vs List

Using EntityFramework 4. I have an EntityCollection<SomeEntity> currentEntities with ~500k entities and a List<SomeEntity> importedEntities also with ~500k records. I want to have the list of all records occuring in currentEntities which don't exist in importedEntities.

Calling currentEntities.Select(x => x.ID).Except(importedEntities.Select(x => x.ID)) to get unique IDs of occurring records causes a System.OutOfMemoryException because it apparently loads all of the entities into the memory.

Calling currentEntities.Where(x => !importedEntities.Any(y => y.ID == x.ID)) fails with NotSupportedException ("Only primitive types ('such as Int32, String, and Guid') are supported in this context").

currentEntities is on the SQL Server 2008 R2 database, while importedEntities are in memory.

Is this even possible in L2E ?

like image 834
Dejan Janjušević Avatar asked Apr 14 '26 14:04

Dejan Janjušević


1 Answers

Doing an except on two "large" lists of integers in memory is not a point. If I do Enumerable.Range(0, 500000).Except(Enumerable.Range(500, 500000)).Count() it returns 500 before I can say OutOfMemoryException. So I think reducing your problem to an except of integers should work for you:

var newIds = importedEntities.Select(x => x.ID).ToArray()
             .Except(currentEntities.Select(x => x.ID).ToArray()).ToArray();

Thus, only integers will be loaded into memory, no entity objects.

Now you can do:

importedEntities.Where(x => newIds.Contains(x.ID))

provided that newIds is not too long. What is too long? The linq statement produces an IN clause, which can easily contain a couple of thousands of items, but if it is longer than, say, 10,000 you probably should process the IDs in chunks.

By the way (1). I'm assuming here that both lists are in different contexts, maybe even different databases. However, if they are in the same context, you might have success with:

importedEntities.Where(x => !currentEntities.Select(y => y.ID)
                           .Any(id => id == x.ID))

This produces a NOT EXISTS sql query. You may still run into an OutOfMemoryException though if there are "many" new items. If so, you could use a paging mechanism (Skip - Take) to process the new items in chunks.

By the way (2), I swapped currentEntities and importedEntities, because I assume you are interested in the new imported items, please undo that if I'm wrong.

like image 62
Gert Arnold Avatar answered Apr 17 '26 02:04

Gert Arnold



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!