
List<T>.Contains() is very slow?

Could anyone explain to me why the generic List<T>.Contains() method is so slow?

I have a List<long> with about a million numbers, and code that constantly checks whether a specific number is among them.

I tried doing the same thing using Dictionary<long, byte> and the Dictionary.ContainsKey() function, and it was about 10-20 times faster than with the List.

Of course, I don't really want to use Dictionary for that purpose, because it wasn't meant to be used that way.

So the real question is: is there an alternative to List<T>.Contains() that isn't as wacky as Dictionary<K,V>.ContainsKey()?

DSent asked May 05 '09 08:05


2 Answers

If you are just checking for existence, HashSet<T> in .NET 3.5 is your best option - dictionary-like performance, but no key/value pair - just the values:

    Random rand = new Random();
    HashSet<int> data = new HashSet<int>();
    for (int i = 0; i < 1000000; i++)
    {
        // Duplicates are silently ignored by Add
        data.Add(rand.Next(50000000));
    }
    bool contains = data.Contains(1234567); // etc
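
For the List<long> from the question, a minimal sketch might look like the following, assuming the list can be copied into a set once and then queried many times. MembershipSketch, BuildLookup and CountHits are hypothetical names made up for illustration; only HashSet<T>, its IEnumerable<T> constructor and Contains come from the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of any library.
public static class MembershipSketch
{
    // Copies the list into a set once: O(n) up front,
    // after which each Contains call is O(1) on average.
    public static HashSet<long> BuildLookup(List<long> numbers)
    {
        return new HashSet<long>(numbers);
    }

    // Counts how many of the candidate values are present in the set.
    public static int CountHits(HashSet<long> lookup, IEnumerable<long> candidates)
    {
        int hits = 0;
        foreach (long candidate in candidates)
        {
            if (lookup.Contains(candidate))
            {
                hits++;
            }
        }
        return hits;
    }
}
```

Note that a set drops duplicates, so this only fits if you care about whether a number exists and not about how many times it occurs or in what order.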

Marc Gravell answered Oct 15 '22 11:10


List.Contains is an O(n) operation: it scans the list element by element until it finds a match.

Dictionary.ContainsKey is an O(1) operation on average, since it uses the hash code of the key to go straight to the matching bucket, which gives you a much quicker search.
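
To make the difference concrete, here is a rough sketch that counts equality checks instead of measuring time. CountingComparer, LookupCost, LinearContains and CompareCosts are hypothetical names for illustration, and LinearContains is only a scan in the spirit of List<T>.Contains, not the framework's actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts how many equality checks a lookup performs.
public sealed class CountingComparer : IEqualityComparer<long>
{
    public int EqualsCalls;

    public bool Equals(long x, long y)
    {
        EqualsCalls++;
        return x == y;
    }

    public int GetHashCode(long value)
    {
        return value.GetHashCode();
    }
}

public static class LookupCost
{
    // Linear scan: checks items one by one until a match is found.
    public static bool LinearContains(List<long> items, long target, CountingComparer comparer)
    {
        foreach (long item in items)
        {
            if (comparer.Equals(item, target))
            {
                return true;
            }
        }
        return false;
    }

    // Returns { equality checks for the linear scan, equality checks for the hash lookup }
    // when searching for the last of n values.
    public static int[] CompareCosts(int n)
    {
        var list = new List<long>();
        for (long i = 0; i < n; i++)
        {
            list.Add(i);
        }

        var linearCounter = new CountingComparer();
        LinearContains(list, n - 1, linearCounter); // last element: worst case for a scan

        var hashCounter = new CountingComparer();
        var set = new HashSet<long>(list, hashCounter);
        hashCounter.EqualsCalls = 0; // ignore any checks made while building the set
        set.Contains(n - 1);

        return new[] { linearCounter.EqualsCalls, hashCounter.EqualsCalls };
    }
}
```

The scan has to touch every element before it reaches the last one, while the hash lookup only compares against entries whose hash code matches, so its count stays small no matter how large n gets.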

I don't think it's a good idea to scan through a List of a million entries just to find a few of them.

Isn't it possible to store those million entries in an RDBMS, for instance, and run queries against that database?

If that is not possible, then I would use a Dictionary anyway, since what you are doing is key lookups.

Frederik Gheysels answered Oct 15 '22 11:10
