I was inspired by this topic, Performance and Memory allocation comparison between List and Set, to actually run some tests and measure the performance difference between ArrayList and HashSet.
The most upvoted answer in the mentioned topic, which intrigued me a lot (link), says:

HashSet consumes about 5.5 times more memory than ArrayList for the same number of elements

With the help of ScalaMeter I wanted to verify this claim.
I made two simple tests, adding from 10000 to 100000 elements to both ArrayList and HashSet. Setting the initial size to the maximum did not change the results. I tested those collections with two element types:

- Int (putting consecutive numbers 0 to 100000)
- String (putting random strings generated with Apache RandomStringUtils)

The code is available in my repository here.
Running those tests gave me the following results:

For collections holding Int:
For collections holding String of size 10:
For collections holding String of size 50:
The question:
What happened to the theory mentioned in the quoted answer? Is it false? Or is there perhaps some mistake on my side?
Thanks :)!
Update: after @andrzej's answer I have updated the code (and the repository) once again. The results are getting better, but they still do not show a 5.5x difference. I am checking something more now.
Please add the measured object as a return value, so that ScalaMeter can measure its footprint:
```scala
measure method "Int" in {
  using(sizes) curve listS in { i =>
    val c = new util.ArrayList[Int](i)
    (0 until i).foreach(t => c.add(t))
    c // return the collection so its footprint is measured
  }

  using(sizes) curve setS in { i =>
    val c = new util.HashSet[Int]()
    (0 until i).foreach(t => c.add(t))
    c // return the collection so its footprint is measured
  }
}
```
What happened to the theory mentioned in the quoted answer? Is it false?
We can do some calculations to get an estimate:
Let's look at the OpenJDK source for ArrayList and HashMap (since HashSet is just a wrapper around HashMap) for hints.
Assume you have n elements to store.

Elements are stored in the field transient Object[] elementData;, so the length of elementData must be at least n. Suppose you instantiated the list with new ArrayList<>(n), so elementData.length is exactly n. Then the size of your list is n*c bytes (where c is the size of an object reference). Here I ignored the size field and the object header of the list.
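As a quick sanity check of this estimate, here is a sketch under the assumption used later in this answer (a 64-bit JVM without compressed oops, i.e. c = 8 bytes):

```java
// Estimated size in bytes of the backing array of an ArrayList holding
// n elements, assuming 8-byte object references (no compressed oops) and
// ignoring the list's own header, the size field, and the elements themselves.
public class ArrayListEstimate {
    public static void main(String[] args) {
        long n = 100_000;   // number of elements, as in the benchmark
        long c = 8;         // assumed reference size in bytes
        System.out.println(n * c); // 800000
    }
}
```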
HashMap stores elements in transient Node<K,V>[] table;, where a node has the fields:

```java
final int hash;
final K key;
V value;
Node<K,V> next;
```
Then for storing n elements you need n nodes, or n*(3*c + 4) bytes: each node has 3 object references (3*c bytes) and an int (4 bytes).
According to HashMap javadoc:
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
Based on this I'll estimate that table.length == 2*n.
Summing up, a HashMap requires n*2*c + n*(3*c + 4) = n*5*c + n*4 bytes.

Now let's suppose you have a 64-bit JVM where the size of an object reference is 8 bytes (i.e. c = 8), and let's ignore things like compressed oops.

Then n*5*c + n*4 = n*5*8 + n*4 = n*44, while n*c = n*8.

Finally, n*44 / n*8 = 5.5.
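The arithmetic above can be checked with a few lines. This is only a sketch under the stated assumptions (c = 8 bytes, table.length == 2*n, object headers and padding ignored):

```java
// Back-of-envelope memory estimate per element, under the assumptions above:
// 64-bit references without compressed oops (c = 8 bytes), and a hash table
// with roughly 2*n buckets.
public class FootprintRatio {
    public static void main(String[] args) {
        int c = 8;                       // assumed reference size in bytes
        int listPerElement = c;          // one elementData slot per element
        int nodeBytes = 3 * c + 4;       // Node: 3 references + one int hash
        int tablePerElement = 2 * c;     // ~2 bucket references per element
        int setPerElement = tablePerElement + nodeBytes;
        System.out.println(setPerElement);                           // 44
        System.out.println((double) setPerElement / listPerElement); // 5.5
    }
}
```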
So the original theory that HashSet consumes about 5.5 times more memory than ArrayList seems quite plausible, and it seems likely that something is not right with your measurements.
I think there are two problems here:
1. As Andrzej mentioned, you don't return your collections from the benchmark snippets. ScalaMeter measures footprint by performing GC before and after the benchmark execution (find details here). If you don't return the collection, it is simply removed from memory by the after-test GC and the test results are useless. This explains why the memory footprints in your tests remain small (around four bytes per object) and don't differ. But it doesn't explain why the footprint grows with the collection size, which brings us to the second problem.

2. Some garbage collectors (CMS and G1 in particular) don't guarantee that all dead objects are removed from memory after a garbage collection. If your JVM chooses one of these collectors (or if you specify one manually), that would explain the memory footprint uptrend. You can check which collector is in use by passing the -XX:+PrintFlagsFinal option to your test and finding the values of the UseG1GC and UseConcMarkSweepGC flags.
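If you prefer a programmatic check, the running JVM's collectors can also be listed via the standard management API (a small sketch; the names printed vary by JVM and GC configuration, e.g. "G1 Young Generation" under G1):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints the names of the garbage collectors the current JVM is using.
public class GcCheck {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
        }
    }
}
```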