Performance penalty of String.intern()

Tags: java, performance, string

Lots of people talk about the performance advantages of String.intern(), but I'm actually more interested in what the performance penalty may be.

My main concerns are:

Search cost: the time that intern() takes to figure out whether the string already exists in the string pool. How does that cost scale with the number of strings in the pool?
Synchronization: the string pool is shared by the whole JVM. How does the pool behave when intern() is called over and over from multiple threads? How much locking does it perform? How does the performance scale with contention?

I am concerned about all of this because I'm currently working on a financial application that uses too much memory because of duplicated strings. Some strings are essentially enumerated values with a limited number of possible values (such as currency names like "USD" and "EUR"), yet they exist in more than a million copies. String.intern() seems like a no-brainer in this case, but I'm worried about the synchronization overhead of calling intern() every time I store a currency somewhere.

On top of that, some other types of strings can have millions of different values but still have tens of thousands of copies of each (such as ISIN codes). For these, I'm concerned that interning a million strings would slow down the intern() method so much that it bogs down my application.

asked May 16 '12 18:05

LordOfThePigs

1 Answer

I did a little bit of benchmarking myself. For the search cost part, I decided to compare String.intern() with ConcurrentHashMap.putIfAbsent(s, s). Basically, the two methods do the same thing, except that String.intern() is a native method that stores to and reads from a string table managed directly by the JVM (the StringTable in HotSpot), whereas ConcurrentHashMap.putIfAbsent() is just a normal instance method.
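To make the comparison concrete, here is a minimal sketch of what pooling through ConcurrentHashMap.putIfAbsent(s, s) can look like. The class name StringPool and the method name canonical are made up for illustration; this is not the benchmark code, just one way to get intern()-like behaviour from a plain map.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper for illustration; not part of the JDK or of the benchmark. */
final class StringPool {
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    /** Returns the pooled instance equal to s, registering s if it is the first one seen. */
    static String canonical(String s) {
        // putIfAbsent returns the existing value, or null if s was just inserted.
        String previous = POOL.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    public static void main(String[] args) {
        String a = new String("USD"); // two equal but distinct instances
        String b = new String("USD");
        System.out.println(a == b);                       // false
        System.out.println(canonical(a) == canonical(b)); // true
    }
}
```

Unlike intern(), a map like this holds strong references to everything put into it, so it only suits cases where the set of pooled values is bounded or the pool can be discarded as a whole.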

You can find the benchmark code in a GitHub Gist (for lack of a better place to put it). You can also find the options I used when launching the JVM (to verify that the benchmark is not skewed) in the comments at the top of the source file.

Anyway here are the results:

Search cost (single threaded)

Legend

count: the number of distinct strings that we are trying to pool
initial intern: the time in ms it took to insert all the strings into the string pool
lookup same string: the time in ms it took to look up each of the strings again from the pool, using exactly the same instance as was previously entered in the pool
lookup equal string: the time in ms it took to look up each of the strings again from the pool, but using a different (equal) instance
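The three measurements in the legend can be sketched roughly as follows. This is not the benchmark from the gist, only a simplified illustration: the class PoolTiming, its helper names and the "key-" prefix are assumptions, and a naive timing loop like this has no warm-up phase, so its numbers should not be compared with the tables.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical, simplified timing harness; not the benchmark from the gist. */
final class PoolTiming {
    /** Times one pass that pools every element of keys; returns milliseconds. */
    static long timeMillis(String[] keys, UnaryOperator<String> pool) {
        long start = System.nanoTime();
        for (String k : keys) {
            pool.apply(k);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Builds count distinct strings at run time, so none is interned beforehand. */
    static String[] distinctStrings(int count) {
        String[] out = new String[count];
        for (int i = 0; i < count; i++) out[i] = "key-" + i;
        return out;
    }

    /** Copies each string into a new instance that is equal but not identical. */
    static String[] equalCopies(String[] src) {
        String[] out = new String[src.length];
        for (int i = 0; i < src.length; i++) out[i] = new String(src[i]);
        return out;
    }

    static void report(String name, String[] keys, String[] copies, UnaryOperator<String> pool) {
        System.out.println(name);
        System.out.println("  initial intern:      " + timeMillis(keys, pool) + " ms");
        System.out.println("  lookup same string:  " + timeMillis(keys, pool) + " ms");
        System.out.println("  lookup equal string: " + timeMillis(copies, pool) + " ms");
    }

    public static void main(String[] args) {
        String[] keys = distinctStrings(10_000);
        String[] copies = equalCopies(keys);
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        report("String.intern()", keys, copies, String::intern);
        report("ConcurrentHashMap.putIfAbsent()", keys, copies, s -> map.putIfAbsent(s, s));
    }
}
```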

String.intern()

count       initial intern   lookup same string  lookup equal string
1'000'000            40206                34698                35000
  400'000             5198                 4481                 4477
  200'000              955                  828                  803
  100'000              234                  215                  220
   80'000              110                   94                   99
   40'000               52                   30                   32
   20'000               20                   10                   13
   10'000                7                    5                    7

ConcurrentHashMap.putIfAbsent()

count       initial intern   lookup same string  lookup equal string
1'000'000              411                  246                  309
  800'000              352                  194                  229
  400'000              162                   95                  114
  200'000               78                   50                   55
  100'000               41                   28                   28
   80'000               31                   23                   22
   40'000               20                   14                   16
   20'000               12                    6                    7
   10'000                9                    5                    3

The conclusion for the search cost: String.intern() is surprisingly expensive to call. It scales extremely badly, at roughly O(n) per call, where n is the number of strings in the pool. As the number of strings in the pool grows, the time needed to look up a single string grows with it (about 0.7 microseconds per call with 10'000 strings, about 40 microseconds per call with 1'000'000 strings).
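The per-call figures are simply the total time divided by the number of strings. As a small worked example, using the "initial intern" column of both tables (the class and method names here are made up for illustration):

```java
import java.util.Locale;

/** Hypothetical helper that converts a table row into a per-call cost. */
final class PerLookupCost {
    /** Total time in ms over count calls, expressed in microseconds per call. */
    static double microsPerLookup(long totalMillis, int count) {
        return totalMillis * 1000.0 / count;
    }

    public static void main(String[] args) {
        // String.intern(), "initial intern" column
        System.out.println(String.format(Locale.ROOT, "%.2f", microsPerLookup(7, 10_000)));        // 0.70
        System.out.println(String.format(Locale.ROOT, "%.2f", microsPerLookup(40206, 1_000_000))); // 40.21
        // ConcurrentHashMap.putIfAbsent(), "initial intern" column
        System.out.println(String.format(Locale.ROOT, "%.2f", microsPerLookup(9, 10_000)));        // 0.90
        System.out.println(String.format(Locale.ROOT, "%.2f", microsPerLookup(411, 1_000_000)));   // 0.41
    }
}
```

So going from 10'000 to 1'000'000 strings makes each intern() call more than 50 times slower in this run, while the per-call cost of the map stays below one microsecond.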

ConcurrentHashMap scales as expected: the number of strings in the pool has practically no impact on the speed of a lookup.

Based on this experiment, I'd strongly suggest avoiding String.intern() if you are going to intern more than a few strings.

answered Sep 21 '22 09:09

LordOfThePigs
