Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How much memory Java HashSet<Long> should take

I wanted to use a HashSet<Long> for storing a large list of unique numbers in memory. I calculated the approximate memory to be consumed (in 64 bit pointer size):

Long would take 16 bytes of space. So initially I multiplied the number of entries with 16 to get the memory. But in reality, the memory was much more than 16 bytes per entry. After that I studied HashSet implementation. In short, in the underlying implementation, it actually stores an extra dummy object (12 bytes) with each entry of hashset. And a pointer (8 bytes) to next entry. Thus conceding extra 12+8 bytes per entry.

So total memory per entry: 16+12+8 = 36 bytes. But still when I ran the code and checked the memory, it was still much more than 36 bytes per entry.

My Question(In short): How much memory does a HashSet take (for instance, on 64 bit machine)?

like image 259
Syed Aqeel Ashiq Avatar asked Apr 08 '15 15:04

Syed Aqeel Ashiq


People also ask

How much space does a HashSet take in Java?

HashSet is the best approach for search operations. The initial default capacity of HashSet is 16, and the load factor is 0.75.

What is the time complexity of HashSet contains?

On average, the contains() of HashSet runs in O(1) time. Getting the object's bucket location is a constant time operation. Taking into account possible collisions, the lookup time may rise to log(n) because the internal bucket structure is a TreeMap.

How much space long takes?

On Linux environment the long takes 64-bit (8-bytes) of space, and the long long takes 128-bits (16-bytes) of space. This is used when we want to deal with some large value of integers. We can test the size of different types using this simple program.


2 Answers

You can measure exactly this size using this test:

    long m1 = Runtime.getRuntime().freeMemory();
    // create object (s) here
    long m2 = Runtime.getRuntime().freeMemory();
    System.out.println(m1 - m2);

to be run with -XX:-UseTLAB option

On my 64-bit HotSpot empty HashSet takes 480 bytes.

Why so much? Because HashSet has a complex structure (btw IDE in debug mode helps see actual fields). It is based on HashMap (Adapter pattern). So HashSet itself contains a reference to a HashMap. HashMap contains 8 fields. Actual data are in an array of Nodes. A Node has: int hash; K key; V value; Node next. HashSet uses only keys and puts a dummy object in values.

like image 125
Evgeniy Dorofeev Avatar answered Sep 19 '22 15:09

Evgeniy Dorofeev


The size of objects is an implementation detail. There is no guarantee that if it's x bytes on one platform, on another it's also x bytes.

Long is boxed as you know, but 16 bytes is wrong. The primitive long takes 8 bytes but the size of the box around the long is implementation dependent. According to this Hotspot related answer overhead words and padding means a boxed 4-byte int can come to 24 bytes!

The byte alignment and padding mentioned in that (Hotspot specific) answer also would apply to the Entry objects which would also push the consumption up.

like image 30
weston Avatar answered Sep 22 '22 15:09

weston