I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are (and the number won't change), and was wondering whether Array would have a notable performance advantage over HashSet for storing/retrieving said elements.
On paper, Array guarantees constant time insertion (since I know the size ahead of time) and retrieval, but the code for HashSet looks much cleaner and adds some flexibility, so I'm wondering if I'm losing anything performance-wise using it, at least, theoretically.
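To make the two options concrete, here is a minimal sketch of both storage schemes. The `Item` class, its `name` field, and the assumption that IDs are dense (0 to count-1) are hypothetical; the question only states that each object has a unique integer ID and that the count is fixed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical element type: the question only says each object has a unique integer ID.
final class Item {
    final int id;
    final String name;

    Item(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // Equality and hashing are based on the unique ID alone.
    @Override
    public boolean equals(Object o) {
        return o instanceof Item && ((Item) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

final class ArrayVsHashSet {
    // Array storage: assumes IDs are dense (0 .. count-1), so the ID doubles as the index.
    static Item[] storeInArray(int count) {
        Item[] items = new Item[count];
        for (int id = 0; id < count; id++) {
            items[id] = new Item(id, "item-" + id);
        }
        return items;
    }

    // HashSet storage: pre-sized from the known count so it does not rehash while filling.
    static Set<Item> storeInHashSet(int count) {
        Set<Item> items = new HashSet<>((int) (count / 0.75f) + 1);
        for (int id = 0; id < count; id++) {
            items.add(new Item(id, "item-" + id));
        }
        return items;
    }

    public static void main(String[] args) {
        Item[] array = storeInArray(1000);
        Set<Item> set = storeInHashSet(1000);
        System.out.println(array[42].name);                 // direct index, no hashing
        System.out.println(set.contains(new Item(42, ""))); // hashCode(), then equals()
    }
}
```

Both lookups are constant time on paper; the array skips the hashing and `equals()` call, but it only works this directly if the IDs can serve as indices.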
ArrayList allows duplicate values, while HashSet does not. Ordering: ArrayList maintains the order in which objects are inserted, while HashSet is an unordered collection and does not maintain any order.
The result clearly shows that the HashSet provides faster lookup for an element than the List. This is not because the HashSet holds no duplicates, but because of how it finds items: the HashSet computes a hash for each item and places the item in a bucket chosen by that hash, so a lookup only inspects one bucket instead of scanning every element.
HashSet becomes about 10% faster only if the List is created without a specified capacity and every value is checked against the whole list before it is added. If the item count is reduced to 4, the List wins again even in the worst scenario (by about 10%).
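The "check each value before adding" pattern mentioned above can be sketched as follows. This is only an illustration of the two insertion styles, not the benchmark those numbers came from; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniqueInsert {
    // List version: every add first scans the whole list, so filling it is O(n^2) overall.
    static List<Integer> fillList(int[] values) {
        List<Integer> list = new ArrayList<>(values.length); // capacity known up front
        for (int v : values) {
            if (!list.contains(v)) {
                list.add(v);
            }
        }
        return list;
    }

    // Set version: add() rejects duplicates itself, in expected O(1) per call.
    static Set<Integer> fillSet(int[] values) {
        Set<Integer> set = new HashSet<>();
        for (int v : values) {
            set.add(v);
        }
        return set;
    }
}
```

If the elements are already guaranteed to be distinct, as in the question, the `contains()` check on the list can be dropped entirely, which removes the List's main cost in this comparison.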
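Simply put, HashSet is faster than TreeSet: HashSet offers expected constant-time add, remove, and contains, while TreeSet needs O(log n) for the same operations because it keeps its elements sorted.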
It depends on your data:
HashSet gives you an O(1) contains() method but doesn't preserve order.
ArrayList contains() is O(n), but you can control the order of the entries.
Array: if you need to insert anything in between, the worst case can be O(n), since you will have to move the data down to make room for the insertion. With a Set, you can use a SortedSet directly (for example TreeSet), which keeps the elements ordered with O(log n) insertion and lookup and offers more flexible operations.
I believe Set is more flexible.
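A short sketch of the ordering trade-off described in this answer, using `TreeSet` as the `SortedSet` implementation (an assumption; the answer does not name one). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class OrderingDemo {
    // ArrayList: inserting in the middle shifts every later element by one slot (O(n)).
    static List<Integer> insertInMiddle() {
        List<Integer> list = new ArrayList<>(Arrays.asList(10, 20, 40));
        list.add(2, 30); // insert value 30 at index 2
        return list;
    }

    // TreeSet (a SortedSet): each add is O(log n) and iteration is always in sorted order.
    static SortedSet<Integer> insertSorted() {
        SortedSet<Integer> set = new TreeSet<>();
        set.add(40);
        set.add(10);
        set.add(30);
        set.add(20);
        return set;
    }
}
```

With the list, the caller has to pick the right index to keep the data ordered; the sorted set places each element itself, at the cost of the logarithmic insert.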