In addition to this quite old post, I need something that uses primitives and gives a speedup for an application that contains lots of HashSets of Integers:
Set<Integer> set = new HashSet<Integer>();
People mention libraries like Guava, Javolution, and Trove, but I have found no thorough comparison of them in terms of benchmarks and performance results, or at least a good answer based on real experience. From what I see, many recommend Trove's TIntHashSet, but others say it is not that good; some say Guava is supercool and manageable, but I do not need beauty and maintainability, only execution time, so Python-style Guava goes home :) Javolution? I visited the website; it seems too old for me and thus wacky.
The library should provide the best achievable time; memory does not matter.
Looking at "Thinking in Java", there is an idea of creating a custom HashMap with int[] as keys. So I would like to see something similar with a HashSet, or simply download and use an amazing library.
EDIT (in response to the comments below)
So in my project I start with about 50 HashSet<Integer> collections, then I call a function about 1000 times that internally creates up to 10 HashSet<Integer> collections. If I change the initial parameters, these numbers may grow exponentially. I only use the add(), contains() and clear() methods on those collections, which is why they were chosen.
Now I'm looking for a library that implements HashSet or something similar, but does so faster by avoiding the Integer autoboxing overhead and maybe something else that I do not know about. In fact, my data comes in as ints and I store them in those HashSets.
Trove is an excellent choice.
The reason why it is much faster than generic collections is memory use.
A java.util.HashSet<Integer> uses a java.util.HashMap<Integer, Object> internally. In a HashMap, each object is contained in an Entry. These objects take an estimated 24 bytes for the Entry + 16 bytes for the actual Integer + 4 bytes in the actual hash table. This yields 44 bytes, as opposed to 4 bytes in Trove, an up to 11x memory overhead (note that unoccupied entries in the main table will yield a smaller difference in practice).
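To make the layout difference concrete, here is a minimal sketch of an open-addressing set that stores ints directly in an int[] at 4 bytes per slot, with no Entry or Integer objects. It is not Trove's actual implementation; the class name IntHashSetSketch, the sentinel value, the hash mixing step and the 0.5 load factor are all assumptions chosen for illustration. It supports only add(), contains() and clear(), the three operations mentioned in the question.

```java
import java.util.Arrays;

/** Hypothetical minimal set of ints using linear probing; no removal support. */
final class IntHashSetSketch {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel marking a free slot
    private int[] table;
    private int mask;
    private int size;          // stored keys, not counting the sentinel value itself
    private boolean hasEmpty;  // true if the sentinel value itself was added

    IntHashSetSketch(int expected) {
        int cap = 16;
        // Power-of-two capacity, at least twice the expected number of keys.
        while (cap < expected * 2L && cap < (1 << 30)) cap <<= 1;
        table = new int[cap];
        Arrays.fill(table, EMPTY);
        mask = cap - 1;
    }

    private static int mix(int x) {
        x *= 0x9E3779B9;        // multiplicative scrambling to spread nearby keys
        return x ^ (x >>> 16);
    }

    boolean add(int key) {
        if (key == EMPTY) {     // the sentinel cannot live in the table
            boolean added = !hasEmpty;
            hasEmpty = true;
            return added;
        }
        if ((size + 1) * 2 > table.length) grow(); // keep load factor <= 0.5
        return insert(key);
    }

    private boolean insert(int key) {
        int i = mix(key) & mask;
        while (table[i] != EMPTY) {          // probe until a free slot
            if (table[i] == key) return false; // already present
            i = (i + 1) & mask;
        }
        table[i] = key;
        size++;
        return true;
    }

    boolean contains(int key) {
        if (key == EMPTY) return hasEmpty;
        int i = mix(key) & mask;
        while (table[i] != EMPTY) {
            if (table[i] == key) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    void clear() {
        Arrays.fill(table, EMPTY); // keeps the array, so the set can be reused
        size = 0;
        hasEmpty = false;
    }

    private void grow() {
        int[] old = table;
        table = new int[old.length * 2];
        Arrays.fill(table, EMPTY);
        mask = table.length - 1;
        size = 0;
        for (int k : old) if (k != EMPTY) insert(k); // rehash into the larger table
    }
}
```

Because clear() keeps the backing array, a set like this could be reused across the repeated function calls described in the question instead of being reallocated each time. Whether it actually beats Trove or HPPC for a given workload would need to be measured.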
See also these experiments:
http://www.takipiblog.com/2014/01/23/java-scala-guava-and-trove-collections-how-much-can-they-hold/
Take a look at the High Performance Primitive Collections for Java (HPPC). It is an alternative to Trove, mature and carefully designed for efficiency. See the JavaDoc for the IntOpenHashSet.