Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java HashSet performance

I understand HashSet based on HashMap, since they are pretty similar. It makes the code more flexible and minimizes implementation effort. However, one reference variable in the HashSet's Entry seem to be unnecessary for me if the class forbids null element, therefore the whole Entry makes no point. Despite this fact, an Entry takes 24 byte memory / element, whereas a single array with the set's elements would take only 4 byte / element if my figures are correct. (aside from array's header)

If my argument is correct, does the advantages really overweight this performance hit?

(if i am wrong, i would learn from it aswell)

like image 628
freestar Avatar asked Feb 27 '16 11:02

freestar


1 Answers

Though this question is primarily opinion-based, I'll summarize a few points on the topic:

  • HashSet appeared in Java 1.2 many years ago. It's hard to guess now the exact reasons for design decisions made at that times, but clearly Java wasn't used for high-loaded applications; performance played less role than simplicity.
  • You are right that HashSet is suboptimal in its memory consumption. The problem is known, the bug JDK-6624565 is registered, and discussions at core-libs-dev are held from time to time. But is this a blocker for many real world applications? Probably, no.
  • For those uncommon applications where HashSet memory usage is unacceptable, there are already good alternatives, like trove THashSet.
  • Note that open addressing algorithms has their disadvantages, e.g. significant performance degradation with load factors close to 1; element removal difficulties. See the related answer.
like image 52
apangin Avatar answered Oct 16 '22 05:10

apangin