Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Allocated Memory for Hashtable.put()

So I was reading Peter Norvig's IAQ (infrequently asked questions - link) and stumbled upon this:

You might be surprised to find that an Object takes 16 bytes, or 4 words, in the Sun JDK VM. This breaks down as follows: There is a two-word header, where one word is a pointer to the object's class, and the other points to the instance variables. Even though Object has no instance variables, Java still allocates one word for the variables. Finally, there is a "handle", which is another pointer to the two-word header. Sun says that this extra level of indirection makes garbage collection simpler. (There have been high performance Lisp and Smalltalk garbage collectors that do not use the extra level for at least 15 years. I have heard but have not confirmed that the Microsoft JVM does not have the extra level of indirection.)

An empty new String() takes 40 bytes, or 10 words: 3 words of pointer overhead, 3 words for the instance variables (the start index, end index, and character array), and 4 words for the empty char array. Creating a substring of an existing string takes "only" 6 words, because the char array is shared. Putting an Integer key and Integer value into a Hashtable takes 64 bytes (in addition to the four bytes that were pre-allocated in the Hashtable array): I'll let you work out why.

So well I obviously tried, but I can't figure it out. In the following I only count words: A Hashtable put creates one Hashtable$Entry: 3 (overhead) + 4 variables (3 references which I assume are 1 word + 1 int). I further assume that he means that the Integers are newly allocated (so not cached by the Integer class or already exist) which comes to 2* (3 [overhead] + 1 [1 int value]).

So in the end we end up with.. 15 words or 60bytes. So what I first thought was that the Entry as a inner class needs a reference to its outer object, but alas it's static so that doesn't make much sense (sure we have to store a pointer to the parent class, but I'd think that information is stored in the class header by the VM).

Just idle curiosity and I'm well aware that all this depends to a good bit on the actual JVM implementation (and on a 64bit version the results would be different), but still I don't like questions I can't answer :)

Edit: Just to make this a bit clearer: While I'm well aware that more compact structures can get us some performance benefits, I agree that in general worrying about a few bytes here or there is a waste of time. I surely wouldn't stop using a Hashtable just because of a few bytes overhead here or there just like I wouldn't use plain char arrays instead of Strings (or start using C). This is purely of academic interest to learn a bit more about the insides of Java/the JVM :)

like image 200
Voo Avatar asked Nov 04 '22 21:11

Voo


1 Answers

The author appears to assume there is 3 Objects with 16 bytes overhead each and 2 32-bit references in the Map.Entry and 2 x 1 32-bit int values. This would total 64-bytes

This is flawed in that Sun/Oracle's JVM only allocates on 8-byte boundaries so that while technically an Integer occupies 20 bytes of memory, 24 bytes is used (the next multiple of 8)

Additionally many JVMs now use 64-bit references so the Map.Entry would use another 16 bytes.

This is all very inefficient, which is why you might use a class like TIntIntHashMap instead which use primitives.

However, usually it doesn't matter as memory is surprising cheap when you compare it to the cost of your time. If you work on server applications and you cost your company about $40/hour, you need to be saving about 10 MB every minute to save as much memory as you are costing. (Ideally you need to be saving much more than this) Saving 10 MB each and every minute is hard.

Memory is reusable, but your time isn't.

like image 178
Peter Lawrey Avatar answered Nov 09 '22 03:11

Peter Lawrey