Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to subtract one huge list from another efficiently in C#

Tags:

I have a very long list of Ids (integers) that represents all the items that are currently in my database:

var idList = GetAllIds(); 

I also have another huge generic list with items to add to the database:

List<T> itemsToAdd; 

Now, I would like to remove all items from the generic list whose Id is already in the idList. Currently idList is a simple array and I subtract the lists like this:

itemsToAdd.RemoveAll(e => idList.Contains(e.Id)); 

I am pretty sure that it could be a lot faster, so what datatypes should I use for both collections and what is the most efficient practice to subtract them?

Thank you!

like image 380
Shackles Avatar asked Feb 23 '11 14:02

Shackles


1 Answers

LINQ could help:

itemsToAdd.Except(idList) 

Your code is slow because List<T>.Contains is O(n). So your total cost is O(itemsToAdd.Count*idList.Count).

You can make idList into a HashSet<T> which has O(1) .Contains. Or just use the Linq .Except extension method which does it for you.

Note that .Except will also remove all duplicates from the left side. i.e. new int[]{1,1,2}.Except(new int[]{2}) will result in just {1} and the second 1 was removed. But I assume it's no problem in your case because IDs are typically unique.

like image 86
CodesInChaos Avatar answered Sep 18 '22 15:09

CodesInChaos