
Why is String.intern() so slow?

Before anyone questions the use of String.intern() at all, let me say that I need it in my particular application for memory and performance reasons. [1]

So, until now I used String.intern() and assumed it was the most efficient way to do it. However, I have noticed for a long time that it is a bottleneck in the software. [2]

Then, just recently, I tried replacing String.intern() with a huge map in which I put/get the strings so that I always obtain a unique instance. I expected this to be slower... but it was exactly the opposite! It was tremendously faster! Replacing intern() with put/get calls on a map (which achieves exactly the same thing) made this step more than an order of magnitude faster.
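For readers who want to see the idea, here is a minimal sketch of such a map-based interner. It is not the code from the question; the class name `MapInterner` and its methods are hypothetical, and it assumes Java 8 or later for `Map.putIfAbsent`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a map-backed interner; not the asker's actual code.
public class MapInterner {
    // Maps each distinct string value to the first instance seen with that value.
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance for s, registering s itself if its value is new. */
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct string values seen so far. */
    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = new String("token");
        String b = new String("token");
        System.out.println(a == b);                                   // false: two distinct objects
        System.out.println(interner.intern(a) == interner.intern(b)); // true: same canonical instance
    }
}
```

Note that this sketch is not thread-safe and holds strong references, so its entries are never released for the lifetime of the interner.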

The question is: why is intern() so slow?!? Why isn't it simply backed by a map (or really, just a customized set), which would make it tremendously faster? I'm puzzled.

[1]: For the unconvinced: this is natural language processing software that has to process gigabytes of text, so it needs to avoid holding many instances of the same string to keep memory from blowing up, and it needs comparison by reference for string comparison to be fast enough.

[2]: Without it (with normal strings) the processing is impossible; with it, this particular step remains the most computation-intensive one.

EDIT:

Due to the surprising interest in this post, here is some code to test it out:

http://pastebin.com/4CD8ac69

And the results of interning a bit more than 1 million strings:

  • HashMap: 4 seconds
  • String.intern(): 54 seconds

To rule out warm-up, OS I/O caching and similar effects, the experiment was repeated with the order of the two benchmarks inverted:

  • String.intern(): 69 seconds
  • HashMap: 3 seconds

As you can see, the difference is very noticeable: more than tenfold. (Using OpenJDK 1.6.0_22 64-bit; I think the Sun JDK gave similar results.)

dagnelies asked Aug 31 '11 21:08




2 Answers

This article discusses the implementation of String.intern(). In Java 6 and 7, the implementation used a fixed-size hash table (1009 buckets), so as the number of entries grew, each lookup degraded to O(n). The table size can be changed with -XX:StringTableSize=N. Apparently, in Java 8 the default size is larger, but the issue remains.
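To illustrate why a fixed bucket count hurts, here is a toy model of a chained hash table that never resizes. It is only a sketch of the general mechanism, not HotSpot's actual string table code; the class `FixedBucketTable` and its methods are made up for this example, and it assumes Java 8 or later for `Math.floorMod`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a hash table with a fixed number of buckets (hypothetical, not JVM code).
public class FixedBucketTable {
    private final List<List<String>> buckets;

    public FixedBucketTable(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Returns the canonical instance for s, scanning its bucket's chain linearly. */
    public String intern(String s) {
        List<String> chain = buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
        for (String candidate : chain) {
            if (candidate.equals(s)) {
                return candidate;
            }
        }
        chain.add(s); // the table never grows, so chains keep getting longer
        return s;
    }

    /** Average number of entries per bucket, i.e. the typical cost of a lookup. */
    public double averageChainLength() {
        long total = 0;
        for (List<String> chain : buckets) {
            total += chain.size();
        }
        return (double) total / buckets.size();
    }

    public static void main(String[] args) {
        FixedBucketTable table = new FixedBucketTable(1009); // bucket count taken from the answer above
        for (int i = 0; i < 200_000; i++) {
            table.intern("s" + i);
        }
        System.out.println("average chain length: " + table.averageChainLength());
    }
}
```

With the bucket count fixed, the average chain length grows in proportion to the number of distinct strings, so every lookup walks an ever longer list. A `HashMap` avoids this by resizing as it fills.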

Martin Serrano answered Oct 14 '22 15:10


The most likely reason for the performance difference: String.intern() is a native method, and calling a native method incurs massive overhead.

So why is it a native method? Probably because it uses the constant pool, which is a low-level VM construct.

Michael Borgwardt answered Oct 14 '22 16:10