In a recent interview I was asked how HashMap works in Java. I explained it well, including that in the worst case a HashMap can degenerate into a linked list due to chaining. I was then asked to suggest a way to improve that performance, but I couldn't come up with one during the interview. The interviewer told me to look up "Trove".
I believe he was pointing to this page. I have read the description provided there, but I still can't figure out how it overcomes the limitations of java.util.HashMap.
Even a hint would be appreciated. Thanks!!
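For concreteness, here is a minimal sketch of the degenerate case described above, using a hypothetical key type whose hashCode() is constant, so every entry lands in the same bucket and lookups scan a single chain (or a balanced tree on Java 8+):

    import java.util.HashMap;
    import java.util.Map;

    public class CollidingKeys {
        // Deliberately bad key: a constant hashCode() puts every instance in one bucket.
        static final class BadKey {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public int hashCode() { return 42; }
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
        }

        public static void main(String[] args) {
            Map<BadKey, Integer> map = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                map.put(new BadKey(i), i);          // all 10,000 entries share one bucket
            }
            // Each lookup now walks the whole chain (Java 8+ mitigates this by
            // treeifying large bins, turning O(n) into O(log n), but not O(1)).
            System.out.println(map.get(new BadKey(9_999)));
        }
    }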
The Java collections need more than three times the memory of the primitive collection frameworks. In other words, you can keep three times as much data in memory without resorting to disk I/O, which slows runtime performance by orders of magnitude. And this matters. Read highscalability to find out why.
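A rough back-of-envelope shows where that factor comes from (the numbers below are approximate and assume a 64-bit HotSpot JVM with compressed oops; exact sizes vary by JVM and version):

    // Per int->int mapping in HashMap<Integer, Integer> (approximate):
    //   HashMap.Node:  12 B header + 4 B hash + 3 x 4 B refs  = ~32 B (padded)
    //   Integer key:   12 B header + 4 B int                  = ~16 B
    //   Integer value: 12 B header + 4 B int                  = ~16 B
    //   table slot:    1 reference                            =  ~4 B
    //   total:                                                  ~68 B per entry
    //
    // Per mapping in a primitive int->int map backed by parallel int[] arrays:
    //   4 B key + 4 B value = 8 B, times ~2x open-addressing slack = ~16 B per entry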
And "adding functionality" to Java collections is not the point when asking about efficiency. Also the modern JDK collections do not "out-perform even the specialized Trove collections". Disclaimer: The benchmark here is far from complete, nor is it perfect. It is meant to drive home the point, which I have experienced in many projects.
The claim in question: "Trove has very specialized collections for cases like primitive keys/values. These days we find that on modern JDKs and with the Java 5+ collections and concurrent use cases, the JDK collections out-perform even the specialized Trove collections."
The 'official' Trove benchmark does not compare IntIntMaps to the Java collections' Map<Integer, Integer>, probably because storing Integers and storing ints is not the same thing from a technical point of view. But a user might not care about that technical detail; they just want to store data representable with ints efficiently. First, the relevant part of the code:
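The original benchmark code is not reproduced here; the following is a minimal sketch of such a comparison, assuming Trove 3.x's gnu.trove.map.hash.TIntIntHashMap. (A serious measurement would use a harness such as JMH with warm-up iterations; this naive version only illustrates the shape of the test.)

    import gnu.trove.map.hash.TIntIntHashMap;
    import java.util.HashMap;
    import java.util.Map;

    public class IntMapBenchmark {
        static final int N = 1_000_000;

        public static void main(String[] args) {
            // Boxed: every put() autoboxes both the key and the value into Integer objects.
            long t0 = System.nanoTime();
            Map<Integer, Integer> boxed = new HashMap<>();
            for (int i = 0; i < N; i++) boxed.put(i, i);
            long sum1 = 0;
            for (int i = 0; i < N; i++) sum1 += boxed.get(i);
            long t1 = System.nanoTime();

            // Primitive: keys and values stay as plain ints in the backing arrays.
            TIntIntHashMap primitive = new TIntIntHashMap();
            for (int i = 0; i < N; i++) primitive.put(i, i);
            long sum2 = 0;
            for (int i = 0; i < N; i++) sum2 += primitive.get(i);
            long t2 = System.nanoTime();

            System.out.printf("boxed:     %d ms (sum=%d)%n", (t1 - t0) / 1_000_000, sum1);
            System.out.printf("primitive: %d ms (sum=%d)%n", (t2 - t1) / 1_000_000, sum2);
        }
    }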
The key phrase there is open addressing. Instead of hashing to an array of buckets, all the entries live in one big array. When you add an element and its slot is already in use, you simply move down the array until you find a free one.
As long as the array is kept sufficiently larger than the number of entries and the hash function is well distributed, average lookup times stay small. And keeping everything in one array improves performance: it is more cache-friendly.
However, it still has worst-case linear behaviour if (say) every key hashes to the same value, so it doesn't avoid that issue.
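To make the idea concrete, here is a minimal sketch of an open-addressed int-to-int map using linear probing. This is illustrative only, not Trove's actual code; it uses a sentinel key, omits removal for brevity, and real implementations typically use more sophisticated probe sequences (such as double hashing) to reduce clustering.

    import java.util.Arrays;

    public class OpenIntIntMap {
        private static final int FREE = Integer.MIN_VALUE; // sentinel: slot unused (so MIN_VALUE cannot be a key)
        private int[] keys;
        private int[] values;
        private int size;

        public OpenIntIntMap(int capacity) {
            keys = new int[capacity];
            values = new int[capacity];
            Arrays.fill(keys, FREE);
        }

        // Walk from the hashed slot until we hit the key or a free slot.
        private int indexOf(int key) {
            int i = (key & 0x7fffffff) % keys.length;
            while (keys[i] != FREE && keys[i] != key) {
                i = (i + 1) % keys.length;              // linear probe: move down the array
            }
            return i;
        }

        public void put(int key, int value) {
            if (size * 2 >= keys.length) resize();      // keep the table at most half full
            int i = indexOf(key);
            if (keys[i] == FREE) { keys[i] = key; size++; }
            values[i] = value;
        }

        public int get(int key, int defaultValue) {
            int i = indexOf(key);
            return keys[i] == key ? values[i] : defaultValue;
        }

        private void resize() {
            int[] oldKeys = keys, oldValues = values;
            keys = new int[oldKeys.length * 2];
            values = new int[keys.length];
            Arrays.fill(keys, FREE);
            size = 0;
            for (int i = 0; i < oldKeys.length; i++) {
                if (oldKeys[i] != FREE) put(oldKeys[i], oldValues[i]);
            }
        }
    }

Because everything lives in two flat int[] arrays, a probe sequence touches adjacent memory, which is exactly the cache-friendliness mentioned above.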