Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are Trove collections more efficient than the standard Java collections?

In an interview recently, I was asked about how HashMap works in Java and I was able to explain it well and explain that in the worst case the HashMap may degenerate into a list due to chaining. I was asked to figure out a way to improve this performance but I was unable to do that during the interview. The interviewer asked me to look up "Trove".

I believe he was pointing to this page. I have read the description provided on that page but still can't figure out how it overcomes the limitations of the java.util.HashMap.

Even a hint would be appreciated. Thanks!!

like image 622
Kumud Sinha Avatar asked Nov 16 '13 17:11

Kumud Sinha


People also ask

Why Java collections need more memory than primitive collections?

Java Collections needs more than three times the memory compared to the primitive collection frameworks. I.e. you can keep three times as much data in memory, without resorting to disk IO which lowers runtime performance by magnitudes. And this matters. Read highscalability to find out why.

Do modern Java collections out-perform even the specialized trove collections?

And "adding functionality" to Java collections is not the point when asking about efficiency. Also the modern JDK collections do not "out-perform even the specialized Trove collections". Disclaimer: The benchmark here is far from complete, nor is it perfect. It is meant to drive home the point, which I have experienced in many projects.

What is the difference between trove and JDK collections?

Trove has very specialized collections for cases like primitive keys/values. These days we find that on modern JDKs and with the Java 5+ collections and concurrent use cases, the JDK collections out-perform even the specialized Trove collections.

How does trove compare intintmaps to Java collection's Map<integer>?

The 'official' trove benchmark does not compare IntIntMaps to Java Collection's Map<Integer, Integer>, probably storing Integers and storing ints is not the same from a technical point of view. But a user might not care about this technical detail, he wants to store data representable with ints efficiently. First the relevant part of the code:


1 Answers

The key phrase there is open addressing. Instead of hashing to an array of buckets, all the entries are in one big array. When you add an element, if the space for it is already in use you just move down the array to find a free space.

As long as the array is kept sufficiently bigger than the number of entries and the hash function is well distributed it's possible to keep average lookup times small. And by having one array you can get better performance - it's more cache friendly.

However it still has worst-case linear behaviour if (say) every key hashes to the same value, so it doesn't avoid that issue.

like image 113
Alan Stokes Avatar answered Sep 24 '22 21:09

Alan Stokes