When will the new String() object in memory get cleared after invoking the intern() method?

List<String> list = new ArrayList<>();
for (int i = 0; i < 1000; i++)
{
    StringBuilder sb = new StringBuilder();
    String string = sb.toString();
    string = string.intern();
    list.add(string);
}

In the above sample, after invoking string.intern(), when will the 1000 objects created on the heap by sb.toString() be cleared?

Edit 1: If there is no guarantee that these objects will be cleared, and assuming that the GC hasn't run, is it pointless to use string.intern() at all (in terms of memory usage)?

Is there any way to reduce memory usage / object creation while using the intern() method?

asked Jan 05 '18 12:01

Gokul Raj Kumar

1 Answer

Your example is a bit odd, as it creates 1000 empty strings. If you want to get such a list while consuming minimal memory, you should use

List<String> list = Collections.nCopies(1000, "");

instead.
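As a small sketch of why this is cheap (the class and method names are made up for illustration): Collections.nCopies returns an immutable list that holds a single reference, so every element is the very same String instance, regardless of the list size.

```java
import java.util.Collections;
import java.util.List;

final class NCopiesDemo {
    // Hypothetical helper: a list of `count` empty strings backed by one reference.
    static List<String> emptyStrings(int count) {
        return Collections.nCopies(count, "");
    }

    public static void main(String[] args) {
        List<String> list = emptyStrings(1000);
        System.out.println(list.size());
        // Every position yields the same object, so no per-element String exists.
        System.out.println(list.get(0) == list.get(999));
    }
}
```

Note that the returned list cannot be modified. If a mutable list is needed, it has to be copied, e.g. into an ArrayList.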

If we assume that something more sophisticated is going on, so that the same string is not created in every iteration, then there is no benefit in calling intern(). What happens is implementation-dependent. When intern() is called on a string that is not in the pool, in the best case that string is simply added to the pool; in the worst case, another copy is made and added to the pool.

At this point, we have no savings yet, but have potentially created additional garbage.
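To make the identity aspect concrete, here is a minimal sketch (class and method names are hypothetical): two strings built at runtime are distinct objects with equal content, and intern() maps both to one canonical instance. The originals remain ordinary heap objects until they become unreachable.

```java
final class InternIdentityDemo {
    // Builds a fresh String at runtime, so it is not a compile-time constant.
    static String build(String a, String b) {
        return new StringBuilder(a).append(b).toString();
    }

    public static void main(String[] args) {
        String first = build("inter", "ning");
        String second = build("inter", "ning");

        // Two separate objects with the same content.
        System.out.println(first == second);
        System.out.println(first.equals(second));
        // Both lookups resolve to the same canonical instance.
        System.out.println(first.intern() == second.intern());
    }
}
```

Whether the canonical instance is one of the two originals or a copy is left to the implementation, which is why the sketch only compares the results of intern() with each other.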

Interning can only save memory if there are duplicates somewhere. This implies that you construct the duplicate strings first and look up their canonical instance via intern() afterwards, so having the duplicate string in memory until it is garbage collected is unavoidable. But that’s not the real problem with interning:

- In older JVMs, interned strings received special treatment that could result in worse garbage collection performance or even in running out of resources (i.e. the fixed-size “PermGen” space).
- In HotSpot, the string pool holding the interned strings is a fixed-size hash table, so referencing significantly more strings than the table size yields hash collisions and hence poor performance. Before Java 7, update 40, the default size was about 1,000, not even sufficient to hold all string constants of any nontrivial application without hash collisions, not to speak of manually added strings. Later versions use a default size of about 60,000, which is better, but still a fixed size that should discourage you from adding an arbitrary number of strings.
- The string pool has to obey the inter-thread semantics mandated by the language specification (as it is used for string literals) and hence needs to perform thread-safe updates, which can degrade performance.

Keep in mind that you pay the price of the disadvantages named above even in cases where there are no duplicates, i.e. where there is no space saving. Also, the acquired reference to the canonical string must have a much longer lifetime than the temporary object used to look it up for there to be any positive effect on memory consumption.

The latter touches on your literal question. The temporary instances are reclaimed the next time the garbage collector runs, which will be when the memory is actually needed. There is no need to worry about when this will happen. But yes, up to that point, acquiring a canonical reference has had no positive effect, not only because the memory hasn’t been reused yet, but also because the memory was not actually needed until then.

This is the place to mention the newer String Deduplication feature (in HotSpot, an opt-in garbage collector option, -XX:+UseStringDeduplication, originally introduced for G1). It does not change string instances, i.e. the identity of these objects, as that would change the semantics of the program, but changes equal strings to share the same char[] array. Since these character arrays are the biggest payload, this may still achieve great memory savings without the performance disadvantages of using intern(). Since this deduplication is done by the garbage collector, it will only be applied to strings that have survived long enough to make a difference. This also implies that it will not waste CPU cycles while there is still plenty of free memory.

However, there might be cases where manual canonicalization is justified. Imagine we’re parsing a source code file or XML file, or importing strings from an external source (a Reader or a database) where such canonicalization will not happen by default, but duplicates may occur with a certain likelihood. If we plan to keep the data for further processing for a longer time, we might want to get rid of duplicate string instances.

In this case, one of the best approaches is to use a local map that is not subject to thread synchronization and to drop it after the process. This avoids keeping references longer than necessary without requiring any special interaction with the garbage collector. It implies that occurrences of the same string within different data sources are not canonicalized (though they are still subject to the JVM’s String Deduplication), but it’s a reasonable trade-off. By using an ordinary resizable HashMap, we also avoid the issues of the fixed-size intern table.

For example:

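// Assumes imports of java.nio.CharBuffer, java.util.* and java.util.regex.Matcher,
// and a precompiled Pattern constant TOKEN_PATTERN defined elsewhere.
static List<String> parse(CharSequence input) {
    List<String> result = new ArrayList<>();

    Matcher m = TOKEN_PATTERN.matcher(input);
    CharBuffer cb = CharBuffer.wrap(input);
    // Local cache: it lives only for this call, so no synchronization is needed
    HashMap<CharSequence,String> cache = new HashMap<>();
    while(m.find()) {
        // The subSequence wrapper is the lookup key; toString() runs only on a miss
        result.add(
            cache.computeIfAbsent(cb.subSequence(m.start(), m.end()), Object::toString));
    }
    return result;
}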

Note the use of the CharBuffer here: it wraps the input sequence, and its subSequence method returns another wrapper with a different start and end index that implements the right equals and hashCode methods for our HashMap. computeIfAbsent will only invoke the toString method if the key was not present in the map before. So, unlike with intern(), no String instance is created for already encountered strings, which saves the most expensive aspect of it, the copying of the character arrays.
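The property this relies on can be checked in isolation. The following sketch (the class name and the view helper are made up for illustration) shows that two CharBuffer views over different regions of the same input are separate wrapper objects, yet compare as equal and produce the same hash code when their content matches.

```java
import java.nio.CharBuffer;

final class CharBufferKeyDemo {
    // Hypothetical helper: a view of input[start, end) that copies no characters.
    static CharSequence view(CharSequence input, int start, int end) {
        return CharBuffer.wrap(input).subSequence(start, end);
    }

    public static void main(String[] args) {
        String input = "foo bar foo";
        CharSequence a = view(input, 0, 3);   // first "foo"
        CharSequence b = view(input, 8, 11);  // second "foo"

        System.out.println(a != b);                        // two wrapper objects
        System.out.println(a.equals(b));                   // equal by content
        System.out.println(a.hashCode() == b.hashCode());  // same hash bucket
    }
}
```

Because equality and hash code depend only on the remaining characters, such a view can be used to look up an entry that was stored under a different wrapper with the same content.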

If we have a really high likelihood of duplicates, we may even save the creation of wrapper instances:

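// Same assumptions as before: imports and TOKEN_PATTERN are defined elsewhere.
static List<String> parse(CharSequence input) {
    List<String> result = new ArrayList<>();

    Matcher m = TOKEN_PATTERN.matcher(input);
    CharBuffer cb = CharBuffer.wrap(input);
    HashMap<CharSequence,String> cache = new HashMap<>();
    while(m.find()) {
        // Reuse the single buffer as the lookup key by moving its window to the match
        cb.limit(m.end()).position(m.start());
        String s = cache.get(cb);
        if(s == null) {
            s = cb.toString();
            // Store under a fresh wrapper, as cb itself will be repositioned later
            cache.put(CharBuffer.wrap(s), s);
        }
        result.add(s);
    }
    return result;
}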

This creates only one wrapper per unique string, but it also has to perform one additional hash lookup for each unique string when putting it into the map. Since the creation of a wrapper is quite cheap, you need a significantly large number of duplicate strings, i.e. a small number of unique strings compared to the total number, to benefit from this trade-off.

As said, these approaches are very efficient because they use a purely local cache that is just dropped afterwards. With this, we neither have to deal with thread safety nor have to interact with the JVM or garbage collector in a special way.
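For experimenting with the local-cache approach, here is a self-contained sketch of the computeIfAbsent variant. The token pattern (runs of word characters) and the class name are assumptions made for this example only; the original answer leaves TOKEN_PATTERN undefined.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocalCacheDemo {
    // Hypothetical token definition, chosen only for this example.
    static final Pattern TOKEN_PATTERN = Pattern.compile("\\w+");

    static List<String> parse(CharSequence input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN_PATTERN.matcher(input);
        CharBuffer cb = CharBuffer.wrap(input);
        // The cache is local and becomes garbage when the method returns.
        Map<CharSequence, String> cache = new HashMap<>();
        while (m.find()) {
            result.add(cache.computeIfAbsent(
                cb.subSequence(m.start(), m.end()), Object::toString));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = parse("alpha beta alpha gamma beta");
        System.out.println(tokens);
        // Repeated tokens are represented by the same String instance.
        System.out.println(tokens.get(0) == tokens.get(2));
    }
}
```

Two separate calls to parse do not share canonical instances, which is the trade-off described above.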

answered Nov 14 '22 21:11

Holger

Tags: java, memory-management, heap-memory, memory, garbage-collection
