
When is it beneficial to flyweight Strings in Java?

I understand the basic idea of Java's String interning, but I'm trying to figure out in which situations it happens automatically, and in which I would need to do my own flyweighting.

Somewhat related:

  • Java Strings: "String s = new String("silly");"
  • The best alternative for String flyweight implementation in Java (never quite got answered)

Together they tell me that String s = "foo" is good and String s = new String("foo") is bad, but there's no mention of any other situations.

In particular, if I parse a file (say a CSV) that has a lot of repeated values, will Java's string interning cover me, or do I need to do something myself? I've gotten conflicting advice in my other question about whether or not String interning applies here.


The full answer came in several fragments, so I'll sum up here:

By default, Java only interns strings that are known at compile time (string literals and constant expressions). String.intern() can be called on a String at runtime, but it doesn't perform very well, so it's only appropriate for smaller numbers of Strings that you're sure will be repeated a lot. For larger sets of Strings it's Guava to the rescue (see ColinD's answer).
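
A minimal sketch of that distinction, assuming nothing beyond the JDK. The class name InternDemo and the helper secondField are made up for illustration; the point is only that fields produced by split() at runtime are separate objects until something canonicalizes them.

```java
class InternDemo {
    // Hypothetical helper: returns the second field of a comma-separated line.
    // split() builds new String objects at runtime, so they are not interned.
    static String secondField(String csvLine) {
        return csvLine.split(",")[1];
    }

    public static void main(String[] args) {
        String literalA = "foo";
        String literalB = "foo";
        // true: both literals refer to the same pooled object
        System.out.println(literalA == literalB);

        String parsed1 = secondField("1,foo");
        String parsed2 = secondField("2,foo");
        // true: same characters
        System.out.println(parsed1.equals(parsed2));
        // false: two distinct objects created at runtime
        System.out.println(parsed1 == parsed2);
        // true: intern() returns the pooled instance shared with the literal
        System.out.println(parsed1.intern() == literalA);
    }
}
```

So for parsed data, duplicates are only shared if you route the values through intern() or a pool of your own.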

Brad Mace asked Oct 19 '10 21:10




2 Answers

One option Guava gives you here is to use an Interner rather than String.intern(). Unlike String.intern(), which stored interned strings in the permanent generation on JVMs before Java 7 (since Java 7 they live on the main heap), a Guava Interner keeps its strings in an ordinary heap data structure. Additionally, you have the option of interning the Strings with weak references, so that when you're done using those Strings, the Interner won't prevent them from being garbage-collected. If the Interner itself is discarded when you're done with the strings, though, you can just use strong references with Interners.newStrongInterner() instead, for possibly better performance.

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

Interner<String> interner = Interners.newWeakInterner();
String a = interner.intern(getStringFromCsv());
String b = interner.intern(getStringFromCsv());
// if a.equals(b), a == b will be true
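
To make the weak-reference idea concrete without the Guava dependency, here is a rough JDK-only sketch. It is not how Guava implements its Interner; the class name WeakStringPool is hypothetical, and it simply relies on WeakHashMap so that an entry can disappear once nothing else references the string.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class WeakStringPool {
    // Keys are held weakly by WeakHashMap; values are wrapped in a
    // WeakReference so the map itself never keeps a string alive.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the canonical instance equal to s, registering s if none exists.
    public synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical != null) {
            return canonical;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    public static void main(String[] args) {
        WeakStringPool pool = new WeakStringPool();
        String a = pool.intern(new String("silly"));
        String b = pool.intern(new String("silly"));
        System.out.println(a == b); // true while a is still strongly referenced
    }
}
```

The synchronized method is the simplest way to make the sketch thread-safe; a production interner would likely want something that scales better under contention.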
ColinD answered Sep 23 '22 13:09


Don't use String.intern() in your code, at least not if you might get 20 or more different strings. In my experience, using String.intern() slows down the whole application when you have a few million strings.

To avoid duplicated String objects, just use a HashMap.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

private final Map<String, String> pool = new HashMap<>();

// Returns the pooled instance equal to s, adding s to the pool if it is new.
private String interned(String s) {
  String interned = pool.get(s);
  if (interned != null) {
    return interned;
  }
  pool.put(s, s);
  return s;
}

// CsvFile is assumed to be iterable over rows of type List<String>.
private void readFile(CsvFile csvFile) {
  for (List<String> row : csvFile) {
    for (int i = 0; i < row.size(); i++) {
      row.set(i, interned(row.get(i)));
    }
    // further process the row
  }
  pool.clear(); // allow the garbage collector to clean up
}

With that code you can avoid duplicate strings for one CSV file. If you need to avoid them on a larger scale, call pool.clear() in another place.
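
To see what such a pool buys you, the following self-contained sketch counts distinct String objects (by identity, not by equals) with and without pooling. The class PoolDemo, its methods, and the sample lines are all invented for illustration; a plain split(",") stands in for a real CSV parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class PoolDemo {
    // Counts distinct String objects by identity (==), not by equals().
    static int distinctObjects(List<String> values) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String v : values) {
            seen.put(v, Boolean.TRUE);
        }
        return seen.size();
    }

    // Splits each line on commas; if pool is non-null, every field is
    // replaced by its pooled instance.
    static List<String> parse(List<String> lines, Map<String, String> pool) {
        List<String> fields = new ArrayList<>();
        for (String line : lines) {
            for (String field : line.split(",")) {
                if (pool != null) {
                    String canonical = pool.putIfAbsent(field, field);
                    if (canonical != null) {
                        field = canonical;
                    }
                }
                fields.add(field);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("DE,Berlin", "DE,Munich", "DE,Berlin");
        System.out.println(distinctObjects(parse(lines, null)));            // 6
        System.out.println(distinctObjects(parse(lines, new HashMap<>()))); // 3
    }
}
```

Without the pool every field is its own object; with it, only one instance per distinct value survives, and the duplicates become garbage right away.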

Roland Illig answered Sep 23 '22 13:09