Java : HashSet vs. HashMap

I have a program that works on enormous data sets. The objects are best stored in hash-based containers, since the program constantly looks objects up in the container.

My first idea was to use HashMap, since its get and remove methods fit the way I need to use the container.

However, I came to see that using HashMap is quite memory-hungry, which is a major problem, so I thought switching to HashSet would be better because it only stores <E> per element rather than <K,V>. But when I looked at the implementation I learned that it uses an underlying HashMap, which means it won't save any memory!

So these are my questions:

  • Are all my assumptions true?
  • Is HashMap memory-wasteful? More specifically, what is its overhead for each entry?
  • Is HashSet just as wasteful as HashMap?
  • Are there any other hash-based containers that consume significantly less memory?

Update

As requested in the comments, I will expand a bit on my program. The HashMap is meant to hold pairs of other objects together with a numeric value, a float, calculated from them. Along the way it extracts some of them and inserts new pairs. Given a pair, it needs to check that it does not already hold that pair, or to remove it. The mapping can be done using the float value or the hashCode of the pair object.

Additionally, when I say "enormous data sets" I am talking about roughly 4*10^9 objects.
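To make this concrete, here is a rough sketch of the kind of usage I mean (Pair is only a placeholder for my real pair class):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    public class PairScores {

        // Placeholder for the real pair class; equals/hashCode must be consistent
        // so that containsKey/remove find the entry that put created.
        static final class Pair {
            final Object first, second;
            Pair(Object first, Object second) { this.first = first; this.second = second; }
            @Override public boolean equals(Object o) {
                return o instanceof Pair
                        && Objects.equals(first, ((Pair) o).first)
                        && Objects.equals(second, ((Pair) o).second);
            }
            @Override public int hashCode() { return Objects.hash(first, second); }
        }

        public static void main(String[] args) {
            Map<Pair, Float> scores = new HashMap<>();

            Pair p = new Pair("a", "b");
            scores.put(p, 0.75f);            // enter a new pair with its calculated float

            if (scores.containsKey(p)) {     // check whether the pair is already held
                scores.remove(p);            // extract/remove it along the way
            }
        }
    }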

Asked Feb 01 '15 by Ravid Goldenberg




2 Answers

There are some very useful tips on this site about collection performance in Java.

HashSet is built on top of a HashMap<T, Object>, where the value is a singleton ‘present’ object. It means that the memory consumption of a HashSet is identical to that of a HashMap: in order to store SIZE values, you need 32 * SIZE + 4 * CAPACITY bytes (plus the size of your values). It is definitely not a memory-friendly collection.
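For a rough sense of scale with the ~4*10^9 entries mentioned in the question, the 32 * SIZE term alone already comes to on the order of 128 GB, before counting the keys and values themselves or the 4 * CAPACITY bucket array.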

THashSet could be the easiest replacement collection for a HashSet – it implements Set and Iterable, which means you should just update a single letter in the initialization of your set.

THashSet uses a single object array for its values, so it uses 4 * CAPACITY bytes for storage. As you can see, compared to the JDK HashSet you will save 32 * SIZE bytes for the same load factor, which is a huge improvement.
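As a minimal sketch of the swap (assuming Trove 3, where THashSet lives in gnu.trove.set.hash and implements java.util.Set):

    import gnu.trove.set.hash.THashSet;

    import java.util.Set;

    public class TroveSwap {
        public static void main(String[] args) {
            // Drop-in replacement: THashSet implements java.util.Set,
            // so only the construction line changes.
            // Set<String> keys = new HashSet<>();
            Set<String> keys = new THashSet<>();

            keys.add("alpha");
            keys.add("beta");
            System.out.println(keys.contains("alpha")); // true
        }
    }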

Also, the image below (which I took from here) can help keep things in mind when choosing the right collection:

[image: decision chart for choosing the right Java collection]

Answered Sep 22 '22 by nil


Are all my assumptions true?

You are correct that HashSet is implemented using HashMap, so you will not save any memory by using HashSet instead.

If you're creating maps with a large number of elements, you should construct your HashMaps with the best initialCapacity estimate you can make, in order to prevent repeated rehashing (and the memory thrashing that comes with it).
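For example (a sketch; the numbers are only illustrative), sizing the map so it never has to rehash while filling up:

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedMap {
        public static void main(String[] args) {
            int expectedEntries = 10_000_000;
            float loadFactor = 0.75f;

            // Pick a capacity large enough that expectedEntries stays
            // below the resize threshold (capacity * loadFactor).
            int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

            Map<String, Float> scores = new HashMap<>(initialCapacity, loadFactor);
            System.out.println("sized for " + expectedEntries + " entries");
        }
    }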

Is HashMap memory wasteful? more specifically, what is its overhead for each entry?

No, it's not wasteful. The overhead is an underlying array (sized according to the loadFactor) and an Entry object for each key-value pair. In addition to storing a key and a value, the Entry object also stores a pointer to the next entry in the slot (in case two or more entries occupy the same slot in the underlying array). The default loadFactor of 0.75 keeps the underlying array at least 133% the size of the number of entries.

Very specifically, the memory overhead for each entry is:

  • the entry object's reference to the key,
  • the entry object's reference to the value,
  • the entry object's reference to the next entry,
  • and the underlying array's reference to the entry (divided by load factor).

It's very difficult to get much more trim than that for a hash-based collection.
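As a rough back-of-the-envelope figure (assuming 4-byte compressed references): the three references plus the cached hash and the Entry object's header come to about 32 bytes per entry, and the array slot divided by the 0.75 load factor adds another ~5 bytes, so roughly 36-40 bytes of overhead per entry on top of the key and value objects themselves. That lines up with the 32 * SIZE + 4 * CAPACITY estimate in the other answer.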

Is HashSet just as wasteful as HashMap?

You will gain no memory efficiency by using HashSet instead of HashMap.

Are there any other hash-based containers that consume significantly less memory?

If your keys are primitives (e.g. ints), there are custom Map and Set implementations out there (in third party libraries) which use more memory-efficient data structures.
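As one illustration of that kind of library (the class below assumes Trove; Eclipse Collections and fastutil offer similar types): in the question's case the values are primitive floats even though the keys are not, and a map specialized for primitive values stores them in a plain float[] instead of one boxed Float per entry.

    import gnu.trove.map.hash.TObjectFloatHashMap;

    public class PrimitiveValues {
        public static void main(String[] args) {
            // Values live in a primitive float[] internally, so there is
            // no Float wrapper object allocated per entry.
            TObjectFloatHashMap<String> scores = new TObjectFloatHashMap<>();

            scores.put("some-pair-key", 0.75f);
            if (scores.containsKey("some-pair-key")) {
                System.out.println(scores.get("some-pair-key")); // 0.75
                scores.remove("some-pair-key");
            }
        }
    }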

Answered Sep 19 '22 by gknicker