I understand the basic idea of Java's String interning, but I'm trying to figure out which situations it happens in, and in which I would need to do my own flyweighting.
Somewhat related questions tell me that String s = "foo" is good and String s = new String("foo") is bad, but there's no mention of any other situations.
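Concretely, the difference shows up when comparing references. A minimal demo (the class name here is mine, just for illustration):

```java
public class InternDemo {
    public static void main(String[] args) {
        // String literals are interned when the class is loaded:
        // both references point to the same pooled object.
        String a = "foo";
        String b = "foo";
        System.out.println(a == b);       // true

        // new String(...) always allocates a fresh heap object,
        // so reference equality fails even though the contents match.
        String c = new String("foo");
        System.out.println(a == c);       // false
        System.out.println(a.equals(c));  // true
    }
}
```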
In particular, if I parse a file (say, a CSV) that has a lot of repeated values, will Java's string interning cover me, or do I need to do something myself? I've gotten conflicting advice in my other question about whether String interning applies here.
The full answer came in several fragments, so I'll sum up here:
By default, Java only interns strings that are known at compile time. String.intern(String) can be used at runtime, but it doesn't perform very well, so it's only appropriate for smaller numbers of Strings that you're sure will be repeated a lot. For larger sets of Strings, it's Guava to the rescue (see ColinD's answer).
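The compile-time vs. runtime distinction above can be sketched as follows (the StringBuilder concatenation is just one way to force the string to be built at runtime):

```java
public class RuntimeInternDemo {
    public static void main(String[] args) {
        // Strings constructed at runtime are NOT interned automatically,
        // so two equal runtime-built strings are distinct heap objects.
        String a = new StringBuilder("ab").append("cd").toString();
        String b = new StringBuilder("ab").append("cd").toString();
        System.out.println(a == b);                  // false

        // String.intern() returns the canonical pooled instance,
        // adding the string to the pool first if necessary.
        System.out.println(a.intern() == b.intern()); // true
    }
}
```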
One option Guava gives you here is to use an Interner rather than String.intern(). Unlike String.intern(), a Guava Interner uses the heap rather than the permanent generation. Additionally, you have the option of interning the Strings with weak references, such that when you're done using those Strings, the Interner won't prevent them from being garbage collected. If you use the Interner in such a way that it's discarded when you're done with the strings, though, you can use strong references with Interners.newStrongInterner() instead, for possibly better performance.
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

Interner<String> interner = Interners.newWeakInterner();
String a = interner.intern(getStringFromCsv());
String b = interner.intern(getStringFromCsv());
// if a.equals(b), then a == b will be true
Don't use String.intern() in your code, at least not if you might get 20 or more different strings. In my experience, using String.intern slows down the whole application when you have a few million strings.
To avoid duplicated String objects, just use a HashMap.
private final Map<String, String> pool = new HashMap<String, String>();

private String interned(String s) {
    String interned = pool.get(s);
    if (interned != null) {
        return interned;
    }
    pool.put(s, s);
    return s;
}

private void readFile(CsvFile csvFile) {
    for (List<String> row : csvFile) {
        for (int i = 0; i < row.size(); i++) {
            row.set(i, interned(row.get(i)));
        }
        // further process the row
    }
    pool.clear(); // allow the garbage collector to clean up
}
With that code you can avoid duplicate strings for one CSV file. If you need to avoid them on a larger scale, call pool.clear() in another place.
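As a side note: on Java 8 and later, the same map-based pooling can be written more compactly with computeIfAbsent. This variant is mine, not part of the original answer, but it behaves like the interned helper above:

```java
import java.util.HashMap;
import java.util.Map;

public class StringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance for s; the first instance
    // seen for each distinct value is stored and reused.
    public String interned(String s) {
        return pool.computeIfAbsent(s, k -> k);
    }

    // Clear the pool when done, so pooled strings can be collected.
    public void clear() {
        pool.clear();
    }

    public static void main(String[] args) {
        StringPool p = new StringPool();
        String a = p.interned(new String("value"));
        String b = p.interned(new String("value"));
        System.out.println(a == b);  // true: both are the first instance
    }
}
```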