Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HashSet vs ArrayList contains performance

When processing large amounts of data I often find myself doing the following:

HashSet<String> set = new HashSet<String> (); //Adding elements to the set ArrayList<String> list = new ArrayList<String> (set); 

Something like "dumping" the contents of the set in the list. I usually do this since the elements I add often contain duplicates I want to remove, and this seems like an easy way to remove them.

With only that objective in mind (avoiding duplicates) I could also write:

ArrayList<String> list = new ArrayList<String> (); // Processing here if (! list.contains(element)) list.add(element); //More processing here 

And thus no need for "dumping" the set into the list. However, I'd be doing a small check before inserting each element (which I'm assuming HashSet does as well)

Is any of the two possibilities clearly more efficient?

like image 507
Jorge Avatar asked Sep 13 '15 17:09

Jorge


People also ask

Is HashSet faster than ArrayList?

As a conclusion, we can learn, that the contains() method works faster in HashSet compared to an ArrayList.

Is HashSet or ArrayList better?

ArrayList allows duplicate values in its collection. On other hand duplicate elements are not allowed in Hashset. HashSet is completely based on object also it doesn't provide get() method. Any number of null value can be inserted in arraylist without any restriction.

Which is faster HashSet or list?

The result clearly shows that the HashSet provides faster lookup for the element than the List. This is because of no duplicate data in the HashSet. The HashSet maintains the Hash for each item in it and arranges these in separate buckets containing hash for each character of item stored in HashSet.

Is Set faster than ArrayList Java?

My experiment shows that HashSet is faster than an ArrayList starting at collections of 3 elements inclusively.


2 Answers

The set will give much better performance (O(n) vs O(n^2) for the list), and that's normal because set membership (the contains operation) is the very purpose of a set.

Contains for a HashSet is O(1) compared to O(n) for a list, therefore you should never use a list if you often need to run contains.

like image 175
Dici Avatar answered Sep 21 '22 12:09

Dici


The ArrayList uses an array for storing the data. The ArrayList.contains will be of O(n) complexity. So essentially searching in array again and again will have O(n^2) complexity.

While HashSet uses hashing mechanism for storing the elements into their respective buckets. The operation of HashSet will be faster for long list of values. It will reach the element in O(1).

like image 26
YoungHobbit Avatar answered Sep 23 '22 12:09

YoungHobbit