 

How do I reclaim memory after parsing via substrings? intern() or new String()?

Tags:

java

memory

Short version: If you call string.substring(n,m).intern(), does the string table retain the substring or the original string?

...But I'm not sure that's the right question to ask, so here's the long version:

I'm working with legacy Java code (PCGen) that parses files by slurping each one in as a single big string and then decomposing it into tokens with String.split, .trim, .substring, and StringTokenizer. This is very efficient for parsing, because none of those methods copies the original string; they all return Strings that point at parts of a shared char[].

After parsing is over, I want to reclaim some memory. Only a few small substrings of the original big string are needed, but the strong references to them keep the big string's char[] from being collected. Later I run into OutOfMemoryError, which I believe is partly due to the heap held by all those parsed files.

I know I can trim the big string down via new String(String), which copies just the characters a substring actually uses into a right-sized array. And I know I can reduce string duplication via String.intern() (which is important because there's a lot of redundancy in the parsed files). Do I need to use both to reclaim the greatest quantity of heap, or does .intern() do both? From reading the OpenJDK 7 HotSpot source (hotspot/src/share/vm/classfile/symbolTable.cpp), it looks like the string table keeps the whole string and does not trim it to its offset/length at all. So I think I need to make a new String and then intern that result. Right?
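A minimal sketch of the "copy, then intern" approach described above. The class and method names (`TokenCompactor`, `compact`) and the sample input are hypothetical stand-ins for a slurped file. On JDKs where substring shares its parent's array (before Java 7 update 6), `new String(token)` gives each kept token its own right-sized array, and `intern()` then collapses equal tokens into one canonical instance.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenCompactor {
    // Hypothetical helper: detach a token from any larger backing array,
    // then canonicalize it through the JVM string table.
    static String compact(String token) {
        return new String(token).intern();
    }

    public static void main(String[] args) {
        // Stands in for a whole file slurped into one big string.
        String bigFile = "NAME:Fighter\tTYPE:Base\nNAME:Wizard\tTYPE:Base\n";

        List<String> kept = new ArrayList<>();
        for (String line : bigFile.split("\n")) {
            for (String field : line.split("\t")) {
                // Keep only the value part after the first ':'.
                String value = field.substring(field.indexOf(':') + 1).trim();
                kept.add(compact(value));
            }
        }
        // Once bigFile and the intermediate tokens are unreachable, only the
        // compacted, interned tokens remain.
        System.out.println(kept);                      // [Fighter, Base, Wizard, Base]
        // Both "Base" tokens now refer to the same interned instance.
        System.out.println(kept.get(1) == kept.get(3)); // true
    }
}
```

On Java 7u6 and later the `new String(...)` step is redundant (substring already copies) but harmless.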

All that said, switching to a streaming parser would be a big win in terms of memory, but that's too big a change for the short term.
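For comparison, a rough sketch of the streaming alternative: read one line at a time instead of slurping the whole file. It assumes a hypothetical line format of tab-separated KEY:VALUE fields (not necessarily PCGen's real file format), and the class name `StreamingParseSketch` is also hypothetical. Each line is read, split and dropped before the next one, so apart from the reader's buffer only the retained tokens stay live.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamingParseSketch {
    // Hypothetical format: one record per line, tab-separated KEY:VALUE fields.
    static List<Map<String, String>> parse(Reader source) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                Map<String, String> record = new HashMap<>();
                for (String field : line.split("\t")) {
                    int colon = field.indexOf(':');
                    if (colon < 0) {
                        continue; // skip fields that are not KEY:VALUE
                    }
                    // Copy and intern so a kept token never pins the line's array.
                    String key = new String(field.substring(0, colon).trim()).intern();
                    String value = new String(field.substring(colon + 1).trim()).intern();
                    record.put(key, value);
                }
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String input = "NAME:Fighter\tTYPE:Base\nNAME:Wizard\tTYPE:Base\n";
        List<Map<String, String>> records = parse(new StringReader(input));
        System.out.println(records.size() + " records, second NAME = " + records.get(1).get("NAME"));
    }
}
```

In a real program the `StringReader` would be replaced by a reader over the file.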

Chris Dolan asked Jan 25 '13 06:01




2 Answers

You can use new String(String) and then intern(); up to Java 7 update 5, it is new String(String) that takes the right-sized copy. From Java 7 update 6, substring() itself copies the characters it needs, but you may still want intern() to remove duplicates. Note: Java 7 stores interned strings (including String literals) in the main heap, not in the perm gen.

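import java.lang.reflect.Field;
import java.util.Arrays;

public class SubstringCopyCheck {
    public static void main(String[] args) {
        char[] chars = new char[128];
        Arrays.fill(chars, 'A');
        String a128 = new String(chars);
        printValueFor("a128", a128);
        // Does the 16-char substring get its own array, or share a128's?
        String a16 = a128.substring(0, 16);
        printValueFor("a16", a16);
    }

    // Prints the identity hash and length of the String's private backing array.
    // Relies on JDK internals: "value" is a char[] only up to Java 8.
    public static void printValueFor(String desc, String s) {
        try {
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true);
            char[] valueArr = (char[]) value.get(s);
            System.out.println(desc + ": " + Integer.toHexString(System.identityHashCode(valueArr)) + ", len=" + valueArr.length);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }
}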

on Java 7 (update 6 or later, where substring copies) this prints something like

a128: 513e86ec, len=128
a16: 53281264, len=16

I would expect Java 6 not to do this: there, substring() shares the original array, so a16 would report the same array as a128, with len=128.

Peter Lawrey answered Oct 20 '22 05:10


We can test it. String holds its character array in a field:

   private final char value[];

Let's see what happens after substring() followed by intern():

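    import java.lang.reflect.Field;

    public class InternCheck {
        public static void main(String[] args) throws Exception {
            Field f = String.class.getDeclaredField("value");
            f.setAccessible(true);
            String s1 = "12345";
            String s2 = s1.substring(1, 2);   // "2"
            String s3 = s2.intern();
            System.out.println(f.get(s2) == f.get(s1)); // does the substring share s1's array?
            System.out.println(f.get(s3) == f.get(s2)); // does the interned string share s2's array?
        }
    }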

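Output (on a JDK where substring shares its parent's array, i.e. before Java 7 update 6):

true
true

That is, all three strings share the same character array: intern() kept the substring as-is, so the string table keeps the original string's whole array reachable. On Java 7u6 and later the first line prints false, because substring copies.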
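The deduplication half of intern() can be observed without reflection, so this check does not depend on JDK internals. A small sketch (the class name `InternIdentity` is hypothetical): two equal tokens cut from different offsets of one string are distinct objects, but intern() maps both to a single instance.

```java
public class InternIdentity {
    public static void main(String[] args) {
        String big = "alpha beta alpha";
        // Two equal tokens taken from different offsets of the same string.
        String first = big.substring(0, 5);
        String second = big.substring(11, 16);
        System.out.println(first.equals(second)); // true: same contents
        System.out.println(first == second);      // false: distinct String objects
        // intern() returns one canonical instance for equal contents.
        System.out.println(first.intern() == second.intern()); // true
    }
}
```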

Evgeniy Dorofeev answered Oct 20 '22 05:10