
Retained heap size of a string in Java

This is a question that we have had trouble understanding. It's tricky to describe it using text but I hope that the gist will be understood.

I understand that a string's actual content is stored in an internal char array. Normally the retained heap size of the string is 40 bytes plus the size of the character array. This is explained here. When substring is called, the new string keeps a reference to the original string's character array, and therefore the retained size of the character array could be a lot bigger than the string itself.

However, when profiling memory usage using YourKit or MAT, something strange seems to happen. The retained size of the string that references the char array does not include the retained size of the character array.

An example could be as follows (semi pseudo-code):

String date = "2011-11-33";   // 24 bytes
date.value = char[1172];      // 2360 bytes

The string's retained size is defined as 24 bytes without including the character array's retained size. This could make sense if there are a lot of references to the character array due to many substring operations.
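A minimal sketch of the sharing described above, assuming the older String layout (a `value` array plus `offset` and `count` fields, as in JDK 6). `SharedString` is a hypothetical stand-in, not the real `java.lang.String`, and the 16-byte array header and 8-byte alignment are assumptions that vary by JVM. Newer JDKs copy the characters in `substring`, so this models the behaviour the question is about rather than current behaviour.

```java
public class SharedStringDemo {
    /** Hypothetical model of a string that shares its backing array. */
    static final class SharedString {
        final char[] value;  // backing array, possibly shared with other strings
        final int offset;    // where this string starts inside value
        final int count;     // number of characters in this string

        SharedString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        /** Returns a view on the same array; no characters are copied. */
        SharedString substring(int begin, int end) {
            return new SharedString(value, offset + begin, end - begin);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    /** Estimated size of a char[]: assumed 16-byte header, 2 bytes per char, 8-byte alignment. */
    static long charArrayBytes(int length) {
        long raw = 16L + 2L * length;
        return (raw + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        char[] big = new char[1172];
        "2011-11-33".getChars(0, 10, big, 0);

        SharedString line = new SharedString(big, 0, big.length);
        SharedString date = line.substring(0, 10);

        System.out.println(date);                               // the 10-character string
        System.out.println(date.value == line.value);           // both share one array
        System.out.println(charArrayBytes(date.value.length));  // estimated array size in bytes
    }
}
```

Under these assumptions the 10-character `date` keeps the whole 1172-element array alive, and the estimate for that array comes out at the 2360 bytes shown in the example above.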

Now when this string is included in some type of collection such as an array or list then the retained size of this array will include the retained size of all the strings including the character array's retained size.

We then have a situation like this:

Array's retained size = 300 bytes
array[0] = String 40 bytes;
array[1] = String 40 bytes;
array[1].value = char[] (220 bytes)

You therefore have to look into each array entry to try to work out where the retained size comes from.

Again this can be explained in that the array holds all the strings that hold references to the same character array and therefore altogether the array's retained size is correct.

Now we get to the problem.

I keep in a separate object a reference to the array that I discussed above as well as a different array with the same strings. In both arrays the strings refer to the same character array. This is expected - after all we are talking about the same string. However the retained size of this character array is counted for both arrays in this new object. In other words the retained size seems to be double.

If I delete the first array, the second array will still hold a reference to the character array, and vice versa. This is confusing, because it makes it look as if Java is holding two separate copies of the same character array. How can this be? Is this a problem with Java's memory, or is it just the way that the profilers display information?
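The setup can be sketched roughly as follows. `Holder`, `first` and `second` are hypothetical names, not our real classes. The sketch only shows that both arrays point at the same String instances, so there are several references but only one String object (and therefore one character array) per value.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class TwoArraysDemo {
    /** Hypothetical stand-in for the separate object that references both arrays. */
    static final class Holder {
        final String[] first;
        final String[] second;

        Holder(String[] first, String[] second) {
            this.first = first;
            this.second = second;
        }
    }

    /** Builds two arrays whose slots point at the same String instances. */
    static Holder build() {
        String date = new String("2011-11-33");
        String other = new String("2011-12-01");
        return new Holder(new String[] {other, date}, new String[] {other, date});
    }

    /** Counts String objects by identity, so an instance seen from both arrays counts once. */
    static int distinctStrings(Holder holder) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        Collections.addAll(seen, holder.first);
        Collections.addAll(seen, holder.second);
        return seen.size();
    }

    public static void main(String[] args) {
        Holder holder = build();
        int slots = holder.first.length + holder.second.length;
        System.out.println("array slots: " + slots);
        System.out.println("distinct String objects: " + distinctStrings(holder));
        System.out.println("same instance: " + (holder.first[1] == holder.second[1]));
    }
}
```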

This problem caused a lot of headaches for us in trying to track down huge memory usage in our application.

Again - I hope that someone out there will be able to understand the question and explain it.

Thanks for your help

slbruce asked Dec 08 '11 08:12


2 Answers

I keep in a separate object a reference to the array that I discussed above as well as a different array with the same strings. In both arrays the strings refer to the same character array. This is expected - after all we are talking about the same string. However the retained size of this character array is counted for both arrays in this new object. In other words the retained size seems to be double.

What you have here is a transitive reference in a dominator tree:

[image]

The character array should not show up in the retained size of either array. If the profiler displays it that way, then that's misleading.
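To make this concrete, here is a toy calculation on a hand-built object graph. It is only a sketch, not how JProfiler or any other profiler is implemented: it treats the retained size of an object as the bytes that would become unreachable from a root if that object were removed. The `Node` class, the node names and the shallow sizes are all hypothetical (the 40 and 220 are borrowed from the question's example).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class RetainedSizeDemo {
    /** Hypothetical heap node: a name, a shallow size in bytes, and outgoing references. */
    static final class Node {
        final String name;
        final long shallow;
        final List<Node> refs = new ArrayList<>();

        Node(String name, long shallow) {
            this.name = name;
            this.shallow = shallow;
        }

        Node to(Node... targets) {
            refs.addAll(Arrays.asList(targets));
            return this;
        }
    }

    /** Collects every node reachable from start without ever walking through excluded. */
    static Set<Node> reachable(Node start, Node excluded) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> stack = new ArrayDeque<>();
        if (start != excluded) {
            stack.push(start);
        }
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (!seen.add(node)) {
                continue; // already visited
            }
            for (Node ref : node.refs) {
                if (ref != excluded) {
                    stack.push(ref);
                }
            }
        }
        return seen;
    }

    /** Sums the shallow sizes of all nodes that are only reachable through target. */
    static long retained(Node root, Node target) {
        Set<Node> survivors = reachable(root, target);
        long sum = 0;
        for (Node node : reachable(root, null)) {
            if (!survivors.contains(node)) {
                sum += node.shallow;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Node chars = new Node("char[]", 220);
        Node string = new Node("String", 40).to(chars);
        Node arrayA = new Node("arrayA", 24).to(string);
        Node arrayB = new Node("arrayB", 24).to(string);
        Node holder = new Node("holder", 24).to(arrayA, arrayB);

        for (Node node : new Node[] {arrayA, arrayB, string, holder}) {
            System.out.println(node.name + " retains " + retained(holder, node) + " bytes");
        }
    }
}
```

With these made-up sizes, each array retains only its own 24 bytes, because the string stays reachable through the other array. The string and its character array (260 bytes together) are counted exactly once, in the retained size of the holder (332 bytes in total).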

This is how JProfiler shows this situation in the biggest objects view:

[image]

The string instance that is contained in both arrays is shown outside the array instances, with a [transitive reference] label. If you want to explore the actual paths, you can add the array holder and the string to the graph and find all paths between them:

[image]

Disclaimer: My company develops JProfiler.

Ingo Kegel answered Sep 20 '22 12:09


I'd say it is just the way the profiler displays the information. It has no idea that the two arrays should be considered for "deduplication". How about you wrap the two arrays into some kind of dummy holder object, and run your profiler against that? Then, it should be able to take care of the "double-counting".

Thilo answered Sep 21 '22 12:09