I wanted to use a HashSet<Long>
for storing a large list of unique numbers in memory. I calculated the approximate memory to be consumed (in 64 bit pointer size):
Long would take 16 bytes of space. So initially I multiplied the number of entries with 16 to get the memory. But in reality, the memory was much more than 16 bytes per entry. After that I studied HashSet
implementation. In short, in the underlying implementation, it actually stores an extra dummy object (12 bytes) with each entry of hashset. And a pointer (8 bytes) to next entry. Thus conceding extra 12+8 bytes per entry.
So total memory per entry: 16+12+8 = 36 bytes. But still when I ran the code and checked the memory, it was still much more than 36 bytes per entry.
My Question(In short): How much memory does a HashSet
take (for instance, on 64 bit machine)?
HashSet is the best approach for search operations. The initial default capacity of HashSet is 16, and the load factor is 0.75.
On average, the contains() of HashSet runs in O(1) time. Getting the object's bucket location is a constant time operation. Taking into account possible collisions, the lookup time may rise to log(n) because the internal bucket structure is a TreeMap.
On Linux environment the long takes 64-bit (8-bytes) of space, and the long long takes 128-bits (16-bytes) of space. This is used when we want to deal with some large value of integers. We can test the size of different types using this simple program.
You can measure exactly this size using this test:
long m1 = Runtime.getRuntime().freeMemory();
// create object (s) here
long m2 = Runtime.getRuntime().freeMemory();
System.out.println(m1 - m2);
to be run with -XX:-UseTLAB option
On my 64-bit HotSpot empty HashSet takes 480 bytes.
Why so much? Because HashSet has a complex structure (btw IDE in debug mode helps see actual fields). It is based on HashMap (Adapter pattern). So HashSet itself contains a reference to a HashMap. HashMap contains 8 fields. Actual data are in an array of Nodes. A Node has: int hash; K key; V value; Node next. HashSet uses only keys and puts a dummy object in values.
The size of objects is an implementation detail. There is no guarantee that if it's x bytes on one platform, on another it's also x bytes.
Long
is boxed as you know, but 16 bytes is wrong. The primitive long
takes 8 bytes but the size of the box around the long
is implementation dependent. According to this Hotspot related answer overhead words and padding means a boxed 4-byte int
can come to 24 bytes!
The byte alignment and padding mentioned in that (Hotspot specific) answer also would apply to the Entry
objects which would also push the consumption up.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With