
Fastest Java HashSet<Integer> library [closed]

In addition to this quite old post, I need something that will use primitives and give a speedup for an application that contains lots of HashSets of Integers:

Set<Integer> set = new HashSet<Integer>();

People mention libraries like Guava, Javolution and Trove, but I cannot find a proper comparison of them in terms of benchmarks and performance results, or at least a good answer based on real experience. From what I see, many recommend Trove's TIntHashSet, but others say it is not that good; some say Guava is super cool and manageable, but I do not need beauty or maintainability, only execution time, so Python-style Guava goes home :) Javolution? I've visited the website; it seems too old to me and thus wacky.

The library should provide the best achievable time; memory does not matter.

Looking at "Thinking in Java", there is an idea of creating a custom HashMap with int[] as keys. So I would like to see something similar for a HashSet, or simply download and use an amazing library.

EDIT (in response to the comments below): In my project I start with about 50 HashSet<Integer> collections, then I call a function about 1000 times, and each call creates up to 10 HashSet<Integer> collections. If I change the initial parameters, these numbers may grow exponentially. I only use the add(), contains() and clear() methods on those collections, which is why they were chosen.

Now I'm looking for a library that implements HashSet or something similar but is faster because it avoids the Integer autoboxing overhead, and maybe something else I do not know about. In fact, my data comes in as ints and I store them in those HashSets.
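To make the autoboxing overhead mentioned above concrete, here is a minimal sketch of what happens when an int is passed to a Set<Integer>. The class name BoxingDemo and the method countHits are hypothetical, chosen only for illustration. Whether each boxing step really allocates depends on the JVM (small values are cached, and the JIT may remove some allocations), so treat this as an illustration of the mechanism rather than a measurement.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: shows where boxing happens with Set<Integer>.
final class BoxingDemo {

    /** Counts how many probes are present in the set. */
    static int countHits(Set<Integer> set, int[] probes) {
        int hits = 0;
        for (int p : probes) {
            // The compiler rewrites this call as set.contains(Integer.valueOf(p)),
            // so every lookup goes through an Integer object.
            if (set.contains(p)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<Integer>();
        for (int i = 0; i < 10; i++) {
            set.add(i * 1000); // boxed as well: set.add(Integer.valueOf(i * 1000))
        }
        System.out.println(countHits(set, new int[] {0, 1, 1000, 5000}));
    }
}
```

A primitive collection avoids these conversions because its add() and contains() take an int directly.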

Sophie Sperner asked Aug 06 '12 21:08


2 Answers

Trove is an excellent choice.

The reason why it is much faster than generic collections is memory use.

A java.util.HashSet<Integer> uses a java.util.HashMap<Integer, Object> internally, with a shared dummy object as the value. In a HashMap, each element is held in an Entry object. Each element takes an estimated 24 bytes for the Entry + 16 bytes for the actual Integer + 4 bytes in the actual hash table. This yields 44 bytes, as opposed to 4 bytes in Trove, i.e. up to 11x the memory (note that unoccupied entries in the main table will make the difference smaller in practice).
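To illustrate where the 4 bytes per element come from, here is a minimal sketch of an open-addressing int set that stores values directly in an int[]. It is not Trove's actual implementation; the class name IntOpenSet, the linear probing, the hash mixing constant and the 0.5 load factor are all assumptions made for this example. It supports only add(), contains() and clear(), the operations the question uses. Note that this sketch spends one extra byte per slot on a boolean[] so that any int, including 0, can be stored.

```java
import java.util.Arrays;

// Hypothetical sketch of a primitive int set with open addressing (linear probing).
final class IntOpenSet {
    private int[] table;     // one 4-byte slot per bucket, no Entry or Integer objects
    private boolean[] used;  // marks occupied slots, so every int value is storable
    private int size;

    IntOpenSet(int expectedSize) {
        if (expectedSize < 0 || expectedSize > (1 << 29)) {
            throw new IllegalArgumentException("expectedSize out of range");
        }
        int capacity = 8;
        while (capacity < 2L * expectedSize) {
            capacity <<= 1;  // keep the capacity a power of two
        }
        table = new int[capacity];
        used = new boolean[capacity];
    }

    // Spreads the key's bits, then masks down to a table index.
    private int slot(int key) {
        int h = key * 0x9E3779B9;
        return (h ^ (h >>> 16)) & (table.length - 1);
    }

    /** Returns true if the key was not already present. */
    boolean add(int key) {
        if (size >= (table.length >>> 1)) {
            resize();        // keep the table at most half full
        }
        int i = slot(key);
        while (used[i]) {
            if (table[i] == key) {
                return false;
            }
            i = (i + 1) & (table.length - 1);
        }
        used[i] = true;
        table[i] = key;
        size++;
        return true;
    }

    boolean contains(int key) {
        int i = slot(key);
        // Terminates because at least half of the slots are always free.
        while (used[i]) {
            if (table[i] == key) {
                return true;
            }
            i = (i + 1) & (table.length - 1);
        }
        return false;
    }

    /** Empties the set but keeps the allocated arrays for reuse. */
    void clear() {
        Arrays.fill(used, false);
        size = 0;
    }

    int size() {
        return size;
    }

    // Doubles the table and reinserts every stored key.
    private void resize() {
        int[] oldTable = table;
        boolean[] oldUsed = used;
        table = new int[oldTable.length * 2];
        used = new boolean[table.length];
        size = 0;
        for (int i = 0; i < oldTable.length; i++) {
            if (oldUsed[i]) {
                add(oldTable[i]);
            }
        }
    }

    public static void main(String[] args) {
        IntOpenSet set = new IntOpenSet(16);
        for (int i = 0; i < 1000; i++) {
            set.add(i * 7);
        }
        System.out.println(set.contains(21)); // true, since 21 = 3 * 7
        System.out.println(set.contains(22)); // false
        set.clear();
        System.out.println(set.size());       // 0
    }
}
```

Because clear() keeps the arrays, a set like this can be reused across calls instead of being reallocated each time, which may matter when many short-lived sets are created.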

See also these experiments:

http://www.takipiblog.com/2014/01/23/java-scala-guava-and-trove-collections-how-much-can-they-hold/

Has QUIT--Anony-Mousse answered Sep 30 '22 17:09


Take a look at the High Performance Primitive Collections for Java (HPPC). It is an alternative to Trove that is mature and carefully designed for efficiency. See the JavaDoc for the IntOpenHashSet.

cruftex answered Sep 30 '22 19:09