I was inspired by this topic, Performance and Memory allocation comparison between List and Set, to actually run some tests and measure the performance difference between ArrayList and HashSet.
The most upvoted answer in the mentioned topic, which intrigued me a lot (link), says:

HashSet consumes about 5.5 times more memory than ArrayList for the same number of elements

With the help of ScalaMeter I wanted to verify this claim.
I made two simple tests, adding from 10000 to 100000 elements to both ArrayList and HashSet. Setting the initial size to the maximum did not change the results. I tested those collections with two element types:

- Int (putting consecutive numbers 0 to 100000)
- String (putting random strings generated with Apache RandomStringUtils)

The code is available in my repository here.
Running those tests gave me the following results:

For collections holding Int:
For collections holding String of size 10:
For collections holding String of size 50:
The question:
What happened to the theory mentioned in the quoted answer? Is it false? Or is there perhaps some mistake on my side?
Thanks :)!
Update: after @andrzej's answer I have updated the code (and the repository) once again. The results are getting better, but they still do not show a 5.5x difference. I am checking something more now.
Please add the measured object as a return value, so that ScalaMeter can measure its footprint:
```scala
measure method "Int" in {
  using(sizes) curve listS in { i =>
    val c = new util.ArrayList[Int](i)
    (0 until i).foreach(t => c.add(t))
    c // return the collection so its footprint is measured
  }

  using(sizes) curve setS in { i =>
    val c = new util.HashSet[Int]()
    (0 until i).foreach(t => c.add(t))
    c // return the collection so its footprint is measured
  }
}
```
What happened to the theory mentioned in the quoted answer? Is it false?
We can do some calculations to get an estimate:
Let's look at the OpenJDK source for ArrayList and HashMap (since HashSet is just a wrapper around HashMap) for hints.
Assume you have n elements to store.

Elements are stored in the field transient Object[] elementData;, so the length of elementData must be at least n. Suppose you instantiated the list with new ArrayList<>(n), so elementData.length is exactly n. Then the size of your list is n*c bytes (where c is the size of an object reference). Here I ignored the size field and the object header of the list.
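As a quick sanity check of this estimate, here is a sketch under the assumption used later in this answer (a 64-bit JVM without compressed oops, i.e. c = 8 bytes):

```java
// Estimated size in bytes of the backing array of an ArrayList holding
// n elements, assuming 8-byte object references (no compressed oops) and
// ignoring the list's own header, the size field, and the elements themselves.
public class ArrayListEstimate {
    public static void main(String[] args) {
        long n = 100_000;   // number of elements, as in the benchmark
        long c = 8;         // assumed reference size in bytes
        System.out.println(n * c); // 800000
    }
}
```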
HashMap stores elements in transient Node<K,V>[] table;, where a node has the fields:

```java
final int hash;
final K key;
V value;
Node<K,V> next;
```
Then for storing n elements you need n nodes, or n*(3*c + 4) bytes: each node has 3 object references (3*c bytes) and an int (4 bytes).
According to HashMap javadoc:
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
Based on this I'll estimate that table.length == 2*n.
Summing up, a HashMap requires n*2*c + n*(3*c + 4) = n*5*c + n*4 bytes.

Now let's suppose you have a 64-bit JVM where the size of an object reference is 8 bytes (i.e. c = 8), and let's ignore things like compressed oops.

Then n*5*c + n*4 = n*5*8 + n*4 = n*44, while n*c = n*8.

Finally, n*44 / n*8 = 5.5.
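The arithmetic above can be checked with a few lines. This is only a sketch under the stated assumptions (c = 8 bytes, table.length == 2*n, object headers and padding ignored):

```java
// Back-of-envelope memory estimate per element, under the assumptions above:
// 64-bit references without compressed oops (c = 8 bytes), and a hash table
// with roughly 2*n buckets.
public class FootprintRatio {
    public static void main(String[] args) {
        int c = 8;                       // assumed reference size in bytes
        int listPerElement = c;          // one elementData slot per element
        int nodeBytes = 3 * c + 4;       // Node: 3 references + one int hash
        int tablePerElement = 2 * c;     // ~2 bucket references per element
        int setPerElement = tablePerElement + nodeBytes;
        System.out.println(setPerElement);                           // 44
        System.out.println((double) setPerElement / listPerElement); // 5.5
    }
}
```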
So the original theory that HashSet consumes about 5.5 times more memory than ArrayList seems quite plausible, and it seems likely that something is not right with your measurements.
I think there are two problems here:
1. As Andrzej mentioned, you don't return your collections from the benchmark snippets. ScalaMeter measures footprint by performing GC before and after the benchmark execution (find details here). If you don't return the collection, it is simply removed from memory by the after-test GC and the test results are useless. This explains why the memory footprints in your tests remain small (around four bytes per object) and don't differ. But it doesn't explain why the footprint grows with the collection size, which brings us to the second problem.

2. Some garbage collectors (CMS and G1 in particular) don't guarantee that all dead objects are removed from memory after a garbage collection. If your JVM chooses one of these collectors (or if you specify one manually), that would explain the memory footprint uptrend. You can check which collector is in use by passing the -XX:+PrintFlagsFinal option to your test and finding the values of the UseG1GC and UseConcMarkSweepGC flags.
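If you prefer a programmatic check, the running JVM's collectors can also be listed via the standard management API (a small sketch; the names printed vary by JVM and GC configuration, e.g. "G1 Young Generation" under G1):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints the names of the garbage collectors the current JVM is using.
public class GcCheck {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
        }
    }
}
```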