I have a gigantic data set which I've to store into a collection and need to find whethere any duplicates in there or not.
The data size could be more than 1 million. I know I can store more element in ArrayList
comapre to a Map
.
My questions are:
Map
faster than searching in sorted ArrayList
?HashMap
is faster than TreeMap
?n
elements, which would be more efficient between a TreeMap
and a HashMap
implementation?1) Yes. Searching an ArrayList
is O(n) on average. The performance of key lookups in a Map depends on the specific implementation. You could write an implementation of Map
that is O(n) or worse if you really wanted to, but all the implementations in the standard library are faster than O(n).
2) Yes. HashMap
is O(1) on average for simple key lookups. TreeMap
is O(log(n)).
Class HashMap<K,V>
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
Class TreeMap<K,V>
This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations. Algorithms are adaptations of those in Cormen, Leiserson, and Rivest's Introduction to Algorithms.
3) The space requirements will be O(n) in both cases. I'd guess the TreeMap
requires slightly more space, but only by a constant factor.
Map
you're using.HashMap
has a constant-time average lookup (O(1)), while a TreeMap
's average lookup time is based on the depth of the tree (O(log(n))), so a HashMap
is faster.It just did some benchmark testing on lookup performance between hashmap and sorted arraylist. The answer is hashmap is much faster as the size increase. I am talking about 10x, 20x, 30x faster. I did some test with 1 million of entries using sorted array list and hashmap and the array list get and add operation took seconds to complete, where as the hashmap get and put only takes around 50ms.
Here are something I found or observed:
For sorted arraylist, you would have to sort it first to be able to use the search efficiently (binarySearch for example). Practically you don't just have static list (meaning the list will change via add or remove). With that in mind you will need to change the add and the get methods to do "binary" operation to make it efficient (like binarySearch). So even with binary operation the add and get method will be slower and slower as the list grows.
Hashmap on the other hand does not show much of change in term of time in the put and get operation. The problem with Hashmap is memory overhead. If you can live with that then go with hashmap.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With