When processing large amounts of data I often find myself doing the following:
HashSet<String> set = new HashSet<String> (); //Adding elements to the set ArrayList<String> list = new ArrayList<String> (set);
Something like "dumping" the contents of the set in the list. I usually do this since the elements I add often contain duplicates I want to remove, and this seems like an easy way to remove them.
With only that objective in mind (avoiding duplicates) I could also write:
ArrayList<String> list = new ArrayList<String> (); // Processing here if (! list.contains(element)) list.add(element); //More processing here
And thus no need for "dumping" the set into the list. However, I'd be doing a small check before inserting each element (which I'm assuming HashSet does as well)
Is any of the two possibilities clearly more efficient?
As a conclusion, we can learn, that the contains() method works faster in HashSet compared to an ArrayList.
ArrayList allows duplicate values in its collection. On other hand duplicate elements are not allowed in Hashset. HashSet is completely based on object also it doesn't provide get() method. Any number of null value can be inserted in arraylist without any restriction.
The result clearly shows that the HashSet provides faster lookup for the element than the List. This is because of no duplicate data in the HashSet. The HashSet maintains the Hash for each item in it and arranges these in separate buckets containing hash for each character of item stored in HashSet.
My experiment shows that HashSet is faster than an ArrayList starting at collections of 3 elements inclusively.
The set will give much better performance (O(n)
vs O(n^2)
for the list), and that's normal because set membership (the contains
operation) is the very purpose of a set.
Contains for a HashSet
is O(1)
compared to O(n)
for a list, therefore you should never use a list if you often need to run contains
.
The ArrayList
uses an array for storing the data. The ArrayList.contains
will be of O(n) complexity. So essentially searching in array again and again will have O(n^2)
complexity.
While HashSet
uses hashing mechanism for storing the elements into their respective buckets. The operation of HashSet
will be faster for long list of values. It will reach the element in O(1)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With