It's clear that a search performance of the generic HashSet<T>
class is higher than of the generic List<T>
class. Just compare the hash-based key with the linear approach in the List<T>
class.
However calculating a hash key may itself take some CPU cycles, so for a small amount of items the linear search can be a real alternative to the HashSet<T>
.
My question: where is the break-even?
To simplify the scenario (and to be fair) let's assume that the List<T>
class uses the element's Equals()
method to identify an item.
The result clearly shows that the HashSet provides faster lookup for the element than the List. This is because of no duplicate data in the HashSet. The HashSet maintains the Hash for each item in it and arranges these in separate buckets containing hash for each character of item stored in HashSet.
As a conclusion, we can learn, that the contains() method works faster in HashSet compared to an ArrayList.
Show activity on this post. It's clear that a search performance of the generic HashSet<T> class is higher than of the generic List<T> class. Just compare the hash-based key with the linear approach in the List<T> class.
HashSet is faster than List collection because it uses hashing for storage. HashSet is a superb choice for storing unique elements because it can quickly compare hash codes to group similar items together.
A lot of people are saying that once you get to the size where speed is actually a concern that HashSet<T>
will always beat List<T>
, but that depends on what you are doing.
Let's say you have a List<T>
that will only ever have on average 5 items in it. Over a large number of cycles, if a single item is added or removed each cycle, you may well be better off using a List<T>
.
I did a test for this on my machine, and, well, it has to be very very small to get an advantage from List<T>
. For a list of short strings, the advantage went away after size 5, for objects after size 20.
1 item LIST strs time: 617ms 1 item HASHSET strs time: 1332ms 2 item LIST strs time: 781ms 2 item HASHSET strs time: 1354ms 3 item LIST strs time: 950ms 3 item HASHSET strs time: 1405ms 4 item LIST strs time: 1126ms 4 item HASHSET strs time: 1441ms 5 item LIST strs time: 1370ms 5 item HASHSET strs time: 1452ms 6 item LIST strs time: 1481ms 6 item HASHSET strs time: 1418ms 7 item LIST strs time: 1581ms 7 item HASHSET strs time: 1464ms 8 item LIST strs time: 1726ms 8 item HASHSET strs time: 1398ms 9 item LIST strs time: 1901ms 9 item HASHSET strs time: 1433ms 1 item LIST objs time: 614ms 1 item HASHSET objs time: 1993ms 4 item LIST objs time: 837ms 4 item HASHSET objs time: 1914ms 7 item LIST objs time: 1070ms 7 item HASHSET objs time: 1900ms 10 item LIST objs time: 1267ms 10 item HASHSET objs time: 1904ms 13 item LIST objs time: 1494ms 13 item HASHSET objs time: 1893ms 16 item LIST objs time: 1695ms 16 item HASHSET objs time: 1879ms 19 item LIST objs time: 1902ms 19 item HASHSET objs time: 1950ms 22 item LIST objs time: 2136ms 22 item HASHSET objs time: 1893ms 25 item LIST objs time: 2357ms 25 item HASHSET objs time: 1826ms 28 item LIST objs time: 2555ms 28 item HASHSET objs time: 1865ms 31 item LIST objs time: 2755ms 31 item HASHSET objs time: 1963ms 34 item LIST objs time: 3025ms 34 item HASHSET objs time: 1874ms 37 item LIST objs time: 3195ms 37 item HASHSET objs time: 1958ms 40 item LIST objs time: 3401ms 40 item HASHSET objs time: 1855ms 43 item LIST objs time: 3618ms 43 item HASHSET objs time: 1869ms 46 item LIST objs time: 3883ms 46 item HASHSET objs time: 2046ms 49 item LIST objs time: 4218ms 49 item HASHSET objs time: 1873ms
Here is that data displayed as a graph:
Here's the code:
static void Main(string[] args) { int times = 10000000; for (int listSize = 1; listSize < 10; listSize++) { List<string> list = new List<string>(); HashSet<string> hashset = new HashSet<string>(); for (int i = 0; i < listSize; i++) { list.Add("string" + i.ToString()); hashset.Add("string" + i.ToString()); } Stopwatch timer = new Stopwatch(); timer.Start(); for (int i = 0; i < times; i++) { list.Remove("string0"); list.Add("string0"); } timer.Stop(); Console.WriteLine(listSize.ToString() + " item LIST strs time: " + timer.ElapsedMilliseconds.ToString() + "ms"); timer = new Stopwatch(); timer.Start(); for (int i = 0; i < times; i++) { hashset.Remove("string0"); hashset.Add("string0"); } timer.Stop(); Console.WriteLine(listSize.ToString() + " item HASHSET strs time: " + timer.ElapsedMilliseconds.ToString() + "ms"); Console.WriteLine(); } for (int listSize = 1; listSize < 50; listSize+=3) { List<object> list = new List<object>(); HashSet<object> hashset = new HashSet<object>(); for (int i = 0; i < listSize; i++) { list.Add(new object()); hashset.Add(new object()); } object objToAddRem = list[0]; Stopwatch timer = new Stopwatch(); timer.Start(); for (int i = 0; i < times; i++) { list.Remove(objToAddRem); list.Add(objToAddRem); } timer.Stop(); Console.WriteLine(listSize.ToString() + " item LIST objs time: " + timer.ElapsedMilliseconds.ToString() + "ms"); timer = new Stopwatch(); timer.Start(); for (int i = 0; i < times; i++) { hashset.Remove(objToAddRem); hashset.Add(objToAddRem); } timer.Stop(); Console.WriteLine(listSize.ToString() + " item HASHSET objs time: " + timer.ElapsedMilliseconds.ToString() + "ms"); Console.WriteLine(); } Console.ReadLine(); }
It's essentially pointless to compare two structures for performance that behave differently. Use the structure that conveys the intent. Even if you say your List<T>
wouldn't have duplicates and iteration order doesn't matter making it comparable to a HashSet<T>
, its still a poor choice to use List<T>
because its relatively less fault tolerant.
That said, I will inspect some other aspects of performance,
+------------+--------+-------------+-----------+----------+----------+-----------+ | Collection | Random | Containment | Insertion | Addition | Removal | Memory | | | access | | | | | | +------------+--------+-------------+-----------+----------+----------+-----------+ | List<T> | O(1) | O(n) | O(n) | O(1)* | O(n) | Lesser | | HashSet<T> | O(n) | O(1) | n/a | O(1) | O(1) | Greater** | +------------+--------+-------------+-----------+----------+----------+-----------+
Even though addition is O(1) in both cases, it will be relatively slower in HashSet since it involves cost of precomputing hash code before storing it.
The superior scalability of HashSet has a memory cost. Every entry is stored as a new object along with its hash code. This article might give you an idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With