The best general purpose or 'primary' implementations are likely ArrayList , LinkedHashMap , and LinkedHashSet . Their overall performance is better, and you should use them unless you need a special feature provided by another implementation. That special feature is usually ordering or sorting.
The Java collections framework is a set of classes and interfaces that implement commonly reusable collection data structures.
fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion; provides also big (64-bit) arrays, sets and lists, and fast, practical I/O classes for binary and text files.
The question is (now) about storing lots of data, which can be represented using primitive types like int
, in a Map. Some of the answers here are very misleading in my opinion. Let's see why.
I modified the benchmark from trove to measure both runtime and memory consumption. I also added PCJ to this benchmark, which is another collections library for primitive types (I use that one extensively). The 'official' trove benchmark does not compare IntIntMaps to Java Collection's Map<Integer, Integer>
, probably storing Integers
and storing ints
is not the same from a technical point of view. But a user might not care about this technical detail, he wants to store data representable with ints
efficiently.
First the relevant part of the code:
new Operation() {
private long usedMem() {
System.gc();
return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
}
// trove
public void ours() {
long mem = usedMem();
TIntIntHashMap ours = new TIntIntHashMap(SET_SIZE);
for ( int i = dataset.size(); i-- > 0; ) {
ours.put(i, i);
}
mem = usedMem() - mem;
System.err.println("trove " + mem + " bytes");
ours.clear();
}
public void pcj() {
long mem = usedMem();
IntKeyIntMap map = new IntKeyIntOpenHashMap(SET_SIZE);
for ( int i = dataset.size(); i-- > 0; ) {
map.put(i, i);
}
mem = usedMem() - mem;
System.err.println("pcj " + mem + " bytes");
map.clear();
}
// java collections
public void theirs() {
long mem = usedMem();
Map<Integer, Integer> map = new HashMap<Integer, Integer>(SET_SIZE);
for ( int i = dataset.size(); i-- > 0; ) {
map.put(i, i);
}
mem = usedMem() - mem;
System.err.println("java " + mem + " bytes");
map.clear();
}
I assume the data comes as primitive ints
, which seems sane. But this implies a runtime penalty for java util, because of the auto-boxing, which is not neccessary for the primitive collections frameworks.
The runtime results (without gc()
calls, of course) on WinXP, jdk1.6.0_10:
100000 put operations 100000 contains operations java collections 1938 ms 203 ms trove 234 ms 125 ms pcj 516 ms 94 ms
While this might already seem drastic, this is not the reason to use such a framework.
The reason is memory performance. The results for a Map containing 100000 int
entries:
java collections oscillates between 6644536 and 7168840 bytes trove 1853296 bytes pcj 1866112 bytes
Java Collections needs more than three times the memory compared to the primitive collection frameworks. I.e. you can keep three times as much data in memory, without resorting to disk IO which lowers runtime performance by magnitudes. And this matters. Read highscalability to find out why.
In my experience high memory consumption is the biggest performance issue with Java, which of course results in worse runtime performance as well. Primitive collection frameworks can really help here.
So: No, java.util is not the answer. And "adding functionality" to Java collections is not the point when asking about efficiency. Also the modern JDK collections do not "out-perform even the specialized Trove collections".
Disclaimer: The benchmark here is far from complete, nor is it perfect. It is meant to drive home the point, which I have experienced in many projects. Primitive collections are useful enough to tolerate fishy API - if you work with lots of data.
From inspection, it looks like Trove is just a library of collections for primitive types - it's not like it's meant to be adding a lot of functionality over the normal collections in the JDK.
Personally (and I'm biased) I love Guava (including the former Google Java Collections project). It makes various tasks (including collections) a lot easier, in a way which is at least reasonably efficient. Given that collection operations rarely form a bottleneck in my code (in my experience) this is "better" than a collections API which may be more efficient but doesn't make my code as readable.
Given that the overlap between Trove and the Guava is pretty much nil, perhaps you could clarify what you're actually looking for from a collections library.
I know this is an old post and there are a ton of answers here. But, The answers above are superficial and over simplified in terms of suggesting a library. There is no one library that does well across the various benchmarks presented here. The only conclusion I derive is if you care about performance and memory and specifically dealing with primitive types, its more than worth looking at the non jdk alternatives.
Here is a more sound analysis, in terms of benchmark mechanics and the libraries covered. This is a thread in the mahout dev list.
The libraries covered are
Update June 2015: Unfortunately, the original benchmarks are no longer available and besides its a bit outdated. Here is a fairly recent (Jan 2015) benchmarks done by someone else. It is not as comprehensive nor does it have the interactive exploratory tools as the original link.
As other commentators have noticed, the definition of "efficient" casts a wide net. However no one has yet mentioned the Javolution library.
Some of the highlights:
The Javolution distribution includes a benchmark suite so you can see how they stack up against other libraries/the built-in collections.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With