Before anyone questions the use of String.intern() at all, let me say that I need it in my particular application for memory and performance reasons. [1]
So far I have used String.intern() and assumed it was the most efficient way to do it. However, I have noticed for ages that it is a bottleneck in the software. [2]
Then, just recently, I tried replacing String.intern() with a huge map in which I put/get the strings in order to obtain a unique instance each time. I expected this to be slower... but it was exactly the opposite! It was tremendously faster! Replacing intern() with put/get operations on a map (which achieves exactly the same thing) made it more than an order of magnitude faster.
The question is: why is intern() so slow?!? Why isn't it simply backed by a map (or actually, just a customized set), which would be tremendously faster? I'm puzzled.
[1]: For the unconvinced: this is natural language processing that has to handle gigabytes of text, so it needs to avoid many instances of the same string to keep memory from blowing up, and it needs referential string comparison to be fast enough.
[2]: Without it (normal strings) it is impossible; with it, this particular step remains the most computationally intensive one.
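For illustration, the map-based replacement described above might look roughly like this. It is only a minimal sketch: the class name MapInterner is hypothetical, it is not the code from the pastebin, and it assumes single-threaded use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not the original benchmark code): a map-backed
// stand-in for String.intern() that hands out one canonical instance
// per distinct string value. Not thread-safe.
class MapInterner {
    private final Map<String, String> pool = new HashMap<String, String>();

    // Returns the canonical instance for s, registering s if it is new.
    public String intern(String s) {
        String canonical = pool.get(s);
        if (canonical == null) {
            pool.put(s, s);
            canonical = s;
        }
        return canonical;
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = interner.intern(new String("token"));
        String b = interner.intern(new String("token"));
        // Both calls return the same instance, so reference comparison works.
        System.out.println(a == b); // true
    }
}
```

Unlike String.intern(), a pool like this keeps every string strongly reachable for as long as the map itself is alive.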
EDIT:
Due to the surprising interest in this post, here is some code to test it out:
http://pastebin.com/4CD8ac69
And the results of interning a bit more than 1 million strings:
HashMap: 4 seconds
String.intern(): 54 seconds
To rule out warm-up, OS I/O caching and the like, the experiment was repeated with the order of the two benchmarks inverted:
String.intern(): 69 seconds
HashMap: 3 seconds
As you can see, the difference is very noticeable, more than tenfold. (Using OpenJDK 1.6.0_22 64-bit ... but using the Sun one gave similar results, I think)
The method intern() creates an exact copy of a String object in the heap memory and stores it in the String constant pool. Note that, if another String with the same contents exists in the String constant pool, then a new object won't be created and the new reference will point to the other String.
String interning is a method of storing only one copy of each distinct String value, which must be immutable. Applying String.intern() to a couple of strings ensures that all strings with the same contents share the same memory.
The intern() method creates an exact copy of a string that is present in the heap memory and stores it in the String constant pool if not already present. If the string is already present, it returns the reference. The intern() method helps to save memory space and reuse it efficiently at the cost of time.
The Java String class intern() method returns the interned string, that is, the canonical representation of the string. It can be used to obtain the pooled string for a string that was created with the new keyword. It creates an exact copy of the heap string object in the String constant pool.
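The behaviour described above can be seen in a few lines. This is just a small sketch; the class name InternDemo is made up for the example.

```java
// Hypothetical demo class showing reference equality before and after intern().
class InternDemo {
    public static void main(String[] args) {
        String literal = "hello";            // string literals are interned automatically
        String fresh = new String("hello");  // a distinct object on the heap

        System.out.println(fresh == literal);          // false: different objects
        System.out.println(fresh.equals(literal));     // true: same contents
        System.out.println(fresh.intern() == literal); // true: canonical instance
    }
}
```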
This article discusses the implementation of String.intern(). In Java 6 and 7, the implementation used a fixed-size (1009) hashtable, so as the number of entries grew, the performance became O(n). The fixed size can be changed using -XX:StringTableSize=N. Apparently, in Java 8 the default size is larger, but the issue remains.
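To get a feel for why a fixed bucket count hurts, here is a rough sketch that spreads distinct strings over 1009 buckets and reports how long the chains get. It is a simplified model, not the JVM's actual string table: the class FixedTableDemo, the generated strings and the use of String.hashCode() are all assumptions made for illustration.

```java
// Hypothetical model of a hash table whose bucket count never grows.
class FixedTableDemo {
    // Distributes n distinct strings over a fixed number of buckets and
    // returns the length of the longest resulting chain.
    public static int longestChain(int n, int buckets) {
        int[] chain = new int[buckets];
        int longest = 0;
        for (int i = 0; i < n; i++) {
            int h = ("str" + i).hashCode();
            int idx = (h & 0x7fffffff) % buckets; // non-negative bucket index
            chain[idx]++;
            if (chain[idx] > longest) {
                longest = chain[idx];
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        int buckets = 1009;
        for (int n : new int[] {1000, 100000, 1000000}) {
            System.out.println(n + " strings: average chain "
                    + (n / (double) buckets)
                    + ", longest chain " + longestChain(n, buckets));
        }
    }
}
```

With a million entries the average chain is close to a thousand entries long, and every lookup has to walk one of those chains. A larger table can be requested with something like `java -XX:StringTableSize=1000003 MyApp`, where both the value and the main class name are placeholders.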
Most likely reason for the performance difference: String.intern() is a native method, and calling a native method incurs massive overhead.
So why is it a native method? Probably because it uses the constant pool, which is a low-level VM construct.