I have a very long list of Ids (integers) that represents all the items that are currently in my database:
var idList = GetAllIds();
I also have another huge generic list with items to add to the database:
List<T> itemsToAdd;
Now, I would like to remove all items from the generic list whose Id is already in the idList. Currently idList is a simple array and I subtract the lists like this:
itemsToAdd.RemoveAll(e => idList.Contains(e.Id));
I am pretty sure that it could be a lot faster, so what datatypes should I use for both collections and what is the most efficient practice to subtract them?
Thank you!
LINQ could help:
itemsToAdd.Except(idList)
Your code is slow because List<T>.Contains
is O(n)
. So your total cost is O(itemsToAdd.Count*idList.Count)
.
You can make idList into a HashSet<T>
which has O(1)
.Contains
. Or just use the Linq .Except
extension method which does it for you.
Note that .Except
will also remove all duplicates from the left side. i.e. new int[]{1,1,2}.Except(new int[]{2})
will result in just {1}
and the second 1 was removed. But I assume it's no problem in your case because IDs are typically unique.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With