Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HashSet vs. List performance

It's clear that a search performance of the generic HashSet<T> class is higher than of the generic List<T> class. Just compare the hash-based key with the linear approach in the List<T> class.

However calculating a hash key may itself take some CPU cycles, so for a small amount of items the linear search can be a real alternative to the HashSet<T>.

My question: where is the break-even?

To simplify the scenario (and to be fair) let's assume that the List<T> class uses the element's Equals() method to identify an item.

like image 597
Michael Damatov Avatar asked Sep 29 '08 21:09

Michael Damatov


People also ask

Is HashSet faster than List?

The result clearly shows that the HashSet provides faster lookup for the element than the List. This is because of no duplicate data in the HashSet. The HashSet maintains the Hash for each item in it and arranges these in separate buckets containing hash for each character of item stored in HashSet.

Is HashSet faster than ArrayList?

As a conclusion, we can learn, that the contains() method works faster in HashSet compared to an ArrayList.

Is HashSet faster than List Java?

Show activity on this post. It's clear that a search performance of the generic HashSet<T> class is higher than of the generic List<T> class. Just compare the hash-based key with the linear approach in the List<T> class.

Is a HashSet fast?

HashSet is faster than List collection because it uses hashing for storage. HashSet is a superb choice for storing unique elements because it can quickly compare hash codes to group similar items together.


2 Answers

A lot of people are saying that once you get to the size where speed is actually a concern that HashSet<T> will always beat List<T>, but that depends on what you are doing.

Let's say you have a List<T> that will only ever have on average 5 items in it. Over a large number of cycles, if a single item is added or removed each cycle, you may well be better off using a List<T>.

I did a test for this on my machine, and, well, it has to be very very small to get an advantage from List<T>. For a list of short strings, the advantage went away after size 5, for objects after size 20.

1 item LIST strs time: 617ms 1 item HASHSET strs time: 1332ms  2 item LIST strs time: 781ms 2 item HASHSET strs time: 1354ms  3 item LIST strs time: 950ms 3 item HASHSET strs time: 1405ms  4 item LIST strs time: 1126ms 4 item HASHSET strs time: 1441ms  5 item LIST strs time: 1370ms 5 item HASHSET strs time: 1452ms  6 item LIST strs time: 1481ms 6 item HASHSET strs time: 1418ms  7 item LIST strs time: 1581ms 7 item HASHSET strs time: 1464ms  8 item LIST strs time: 1726ms 8 item HASHSET strs time: 1398ms  9 item LIST strs time: 1901ms 9 item HASHSET strs time: 1433ms  1 item LIST objs time: 614ms 1 item HASHSET objs time: 1993ms  4 item LIST objs time: 837ms 4 item HASHSET objs time: 1914ms  7 item LIST objs time: 1070ms 7 item HASHSET objs time: 1900ms  10 item LIST objs time: 1267ms 10 item HASHSET objs time: 1904ms  13 item LIST objs time: 1494ms 13 item HASHSET objs time: 1893ms  16 item LIST objs time: 1695ms 16 item HASHSET objs time: 1879ms  19 item LIST objs time: 1902ms 19 item HASHSET objs time: 1950ms  22 item LIST objs time: 2136ms 22 item HASHSET objs time: 1893ms  25 item LIST objs time: 2357ms 25 item HASHSET objs time: 1826ms  28 item LIST objs time: 2555ms 28 item HASHSET objs time: 1865ms  31 item LIST objs time: 2755ms 31 item HASHSET objs time: 1963ms  34 item LIST objs time: 3025ms 34 item HASHSET objs time: 1874ms  37 item LIST objs time: 3195ms 37 item HASHSET objs time: 1958ms  40 item LIST objs time: 3401ms 40 item HASHSET objs time: 1855ms  43 item LIST objs time: 3618ms 43 item HASHSET objs time: 1869ms  46 item LIST objs time: 3883ms 46 item HASHSET objs time: 2046ms  49 item LIST objs time: 4218ms 49 item HASHSET objs time: 1873ms 

Here is that data displayed as a graph:

enter image description here

Here's the code:

static void Main(string[] args) {     int times = 10000000;       for (int listSize = 1; listSize < 10; listSize++)     {         List<string> list = new List<string>();         HashSet<string> hashset = new HashSet<string>();          for (int i = 0; i < listSize; i++)         {             list.Add("string" + i.ToString());             hashset.Add("string" + i.ToString());         }          Stopwatch timer = new Stopwatch();         timer.Start();         for (int i = 0; i < times; i++)         {             list.Remove("string0");             list.Add("string0");         }         timer.Stop();         Console.WriteLine(listSize.ToString() + " item LIST strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");           timer = new Stopwatch();         timer.Start();         for (int i = 0; i < times; i++)         {             hashset.Remove("string0");             hashset.Add("string0");         }         timer.Stop();         Console.WriteLine(listSize.ToString() + " item HASHSET strs time: " + timer.ElapsedMilliseconds.ToString() + "ms");         Console.WriteLine();     }       for (int listSize = 1; listSize < 50; listSize+=3)     {         List<object> list = new List<object>();         HashSet<object> hashset = new HashSet<object>();          for (int i = 0; i < listSize; i++)         {             list.Add(new object());             hashset.Add(new object());         }          object objToAddRem = list[0];          Stopwatch timer = new Stopwatch();         timer.Start();         for (int i = 0; i < times; i++)         {             list.Remove(objToAddRem);             list.Add(objToAddRem);         }         timer.Stop();         Console.WriteLine(listSize.ToString() + " item LIST objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");            timer = new Stopwatch();         timer.Start();         for (int i = 0; i < times; i++)         {             hashset.Remove(objToAddRem);             hashset.Add(objToAddRem);         }         timer.Stop();         Console.WriteLine(listSize.ToString() + " item HASHSET objs time: " + timer.ElapsedMilliseconds.ToString() + "ms");         Console.WriteLine();     }      Console.ReadLine(); } 
like image 94
innominate227 Avatar answered Sep 22 '22 16:09

innominate227


It's essentially pointless to compare two structures for performance that behave differently. Use the structure that conveys the intent. Even if you say your List<T> wouldn't have duplicates and iteration order doesn't matter making it comparable to a HashSet<T>, its still a poor choice to use List<T> because its relatively less fault tolerant.

That said, I will inspect some other aspects of performance,

+------------+--------+-------------+-----------+----------+----------+-----------+ | Collection | Random | Containment | Insertion | Addition |  Removal | Memory    | |            | access |             |           |          |          |           | +------------+--------+-------------+-----------+----------+----------+-----------+ | List<T>    | O(1)   | O(n)        | O(n)      | O(1)*    | O(n)     | Lesser    | | HashSet<T> | O(n)   | O(1)        | n/a       | O(1)     | O(1)     | Greater** | +------------+--------+-------------+-----------+----------+----------+-----------+ 
  • Even though addition is O(1) in both cases, it will be relatively slower in HashSet since it involves cost of precomputing hash code before storing it.

  • The superior scalability of HashSet has a memory cost. Every entry is stored as a new object along with its hash code. This article might give you an idea.

like image 42
nawfal Avatar answered Sep 24 '22 16:09

nawfal