Lots of people talk about the performance advantages of String.intern(), but I'm actually more interested in what the performance penalty may be.
My main concerns are:
I am concerned about all these things because I'm currently working on a financial application that uses too much memory because of duplicated Strings. Some strings that basically look like enumerated values, and can only take a limited number of values (such as currency names like "USD" and "EUR"), exist in more than a million copies. String.intern() seems like a no-brainer in this case, but I'm worried about the synchronization overhead of calling intern() every time I store a currency somewhere.
On top of that, some other types of strings can have millions of different values, but still have tens of thousands of copies of each (such as ISIN codes). For these, I'm concerned that interning a million strings would slow down the intern() method so much that it bogs down my application.
The intern() method returns a canonical representation of a String. If the String pool already contains a string with the same contents (as determined by equals()), the reference to that pooled string is returned and no new entry is created. Otherwise, the string is added to the pool and a reference to it is returned.
String interning is a method of storing only one copy of each distinct String value, which must be immutable. Calling String.intern() on a set of strings ensures that all strings with the same contents share the same memory.
The intern() method therefore helps save memory and reuse it efficiently, at the cost of time.
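The behavior described above can be seen in a minimal sketch. The class name InternDemo and the helper sameObject are made up for illustration; only String.intern() itself comes from the JDK.

```java
class InternDemo {
    /** Returns true when both references point to the very same String object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) forces a distinct heap object, even for equal contents.
        String a = new String("USD");
        String b = new String("USD");

        System.out.println(a.equals(b));                         // true: equal contents
        System.out.println(sameObject(a, b));                    // false: two separate objects
        System.out.println(sameObject(a.intern(), b.intern()));  // true: one canonical instance
        System.out.println(sameObject(a.intern(), "USD"));       // true: string literals are interned
    }
}
```

Note that the memory saving only materializes if the caller keeps the reference returned by intern() and lets the original duplicate become garbage.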
In .NET, the String.Intern method uses the intern pool to search for a string equal to the value of str. If such a string exists, its reference in the intern pool is returned. If the string does not exist, a reference to str is added to the intern pool, then that reference is returned.
In contrast, a conversion that involves String.format("%d") has the worst performance. That's logical, because parsing the format String is an expensive operation.
String deduplication improves performance in large, multi-threaded applications. But overusing String.intern() may cause serious memory leaks, slowing down the application. For splitting strings, we should use indexOf() to win in performance. However, in some noncritical cases the String.split() function might be a good fit.
The string.Intern method also returns the interned string that the given string refers to. The major difference between string.IsInterned and string.Intern is that the former returns null if the string is not interned, while the latter (string.Intern) creates a new entry in the intern pool and returns that reference.
java.lang.String#intern() is an interesting function in Java. When used in the right place, it has the potential to reduce the overall memory consumption of your application by eliminating duplicate strings.
I did a little bit of benchmarking myself. For the search cost part, I decided to compare String.intern() with ConcurrentHashMap.putIfAbsent(s,s). Basically, those two methods do the same thing, except that String.intern() is a native method that stores to and reads from a SymbolTable managed directly in the JVM, while ConcurrentHashMap.putIfAbsent() is just a normal instance method.
You can find the benchmark code in a GitHub gist (for lack of a better place to put it). You can also find the options I used when launching the JVM (to verify that the benchmark is not skewed) in the comments at the top of the source file.
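To make the comparison concrete, here is a minimal sketch of an interner built on ConcurrentHashMap.putIfAbsent(s, s). The class name MapInterner and its methods are hypothetical, and the ISIN is just a sample value; this is not the code from the benchmark.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MapInterner {
    // Each string is stored as both key and value, so a lookup yields the canonical instance.
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance for s, registering s itself if no equal string is pooled yet. */
    String intern(String s) {
        String previous = pool.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    /** Number of distinct strings currently held in the pool. */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String first = interner.intern(new String("US0378331005"));
        String second = interner.intern(new String("US0378331005"));
        System.out.println(first == second);  // true: both calls return the first instance
        System.out.println(interner.size());  // 1
    }
}
```

One design consequence worth keeping in mind: the map holds strong references, so pooled strings stay reachable for as long as the interner does, unless entries are removed explicitly.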
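For readers without access to the gist, a rough sketch of what such a measurement could look like follows. It is not the original benchmark: the class InternBenchSketch, its helpers, and the "bench-key-" prefix are all assumptions, and there is no warm-up or repetition, so the numbers it prints are only indicative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

class InternBenchSketch {
    /** Builds n distinct strings at runtime, so they are not already in the literal pool. */
    static String[] distinctStrings(int n, String prefix) {
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = prefix + i;
        }
        return out;
    }

    /** Applies pool to every input and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(String[] inputs, UnaryOperator<String> pool) {
        long start = System.nanoTime();
        for (String s : inputs) {
            pool.apply(s);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Runs the three phases: first insertion, same instances again, equal but distinct instances. */
    static void report(String label, String[] originals, String[] copies, UnaryOperator<String> pool) {
        long initial = timeMillis(originals, pool);
        long same = timeMillis(originals, pool);
        long equal = timeMillis(copies, pool);
        System.out.println(label + ": initial " + initial + " ms, same " + same + " ms, equal " + equal + " ms");
    }

    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        String[] originals = distinctStrings(count, "bench-key-");
        String[] copies = distinctStrings(count, "bench-key-"); // equal contents, different objects

        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        UnaryOperator<String> viaMap = s -> {
            String previous = map.putIfAbsent(s, s);
            return previous != null ? previous : s;
        };

        report("ConcurrentHashMap.putIfAbsent()", originals, copies, viaMap);
        report("String.intern()", originals, copies, String::intern);
    }
}
```

Results will depend heavily on the JVM version and launch options, so treat any single run with caution.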
Anyway here are the results:
Legend
String.intern()
count        initial intern   lookup same string   lookup equal string
1'000'000    40206            34698                35000
400'000      5198             4481                 4477
200'000      955              828                  803
100'000      234              215                  220
80'000       110              94                   99
40'000       52               30                   32
20'000       20               10                   13
10'000       7                5                    7
ConcurrentHashMap.putIfAbsent()
count        initial intern   lookup same string   lookup equal string
1'000'000    411              246                  309
800'000      352              194                  229
400'000      162              95                   114
200'000      78               50                   55
100'000      41               28                   28
80'000       31               23                   22
40'000       20               14                   16
20'000       12               6                    7
10'000       9                5                    3
The conclusion for the search cost: String.intern() is surprisingly expensive to call. It scales extremely badly, with each lookup costing something like O(n), where n is the number of strings in the pool. As the number of strings in the pool grows, the time needed to look up a single string grows with it (0.7 microseconds per lookup with 10'000 strings, 40 microseconds per lookup with 1'000'000 strings).
ConcurrentHashMap scales as expected: the number of strings in the pool has no noticeable impact on the speed of a lookup.
Based on this experiment, I'd strongly suggest avoiding String.intern() if you are going to intern more than a few strings.
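The per-lookup figures quoted above follow from dividing a table entry by the string count, assuming the table values are total milliseconds for the whole batch (which is what the quoted 0.7 and 40 microseconds imply). A small sketch of that arithmetic, with a made-up class name PerLookupCost:

```java
import java.util.Locale;

class PerLookupCost {
    /** Converts a total time in milliseconds for count operations into microseconds per operation. */
    static double microsPerOp(long totalMillis, int count) {
        return totalMillis * 1000.0 / count;
    }

    public static void main(String[] args) {
        // "initial intern" column for String.intern(), assumed to be milliseconds
        System.out.printf(Locale.ROOT, "intern, 10'000 strings:    %.3f us%n", microsPerOp(7, 10_000));
        System.out.printf(Locale.ROOT, "intern, 1'000'000 strings: %.3f us%n", microsPerOp(40206, 1_000_000));
        // "initial intern" column for ConcurrentHashMap.putIfAbsent(), same assumption
        System.out.printf(Locale.ROOT, "map, 10'000 strings:       %.3f us%n", microsPerOp(9, 10_000));
        System.out.printf(Locale.ROOT, "map, 1'000'000 strings:    %.3f us%n", microsPerOp(411, 1_000_000));
    }
}
```

Under that assumption, String.intern() goes from about 0.7 to about 40 microseconds per call as the pool grows a hundredfold, while the map stays below one microsecond per call at both sizes.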