Is there a way to make the JRuby runtime intern all strings?

We have a Java/JRuby webapp running under Tomcat, and I have been analyzing the number of objects and the memory used by the app at runtime. After startup, the class "org.jruby.RubyString" had 1,118,000 instances of the string "". Empty strings alone take up 65 MB of heap, which seems ridiculous to me because that is 15% of the memory used by the webapp. The empty string is only one example of many string values with this problem. I worked out that if I could intern all the JRuby strings, I could save about 130 MB.

I know that in Java, when a string value is created, it is checked against the string pool and reused if it already exists there. Is there an option in JRuby that provides the same optimization? If so, how do I enable it?

Example in JRuby:

v1 = "a"
v2 = "a"
puts v1.object_id # => 3352
puts v2.object_id # => 3354

Example in Java:

String v1 = "a";
String v2 = "a";

System.out.println(v1.hashCode()); // => 97
System.out.println(v2.hashCode()); // => 97
Chiwai Chan asked May 24 '12 23:05


2 Answers

I understand the motivation behind this, but there's really no such "magic" switch in JRuby ...

Coming from a Java background it feels tempting to save on strings, but you can't expect strings to behave the same way in JRuby as they do in Java. First of all, they are completely different objects. I would go as far as to say that a Ruby String is closer to a Java StringBuilder.

It's certainly a waste to have so many "" instances lying around, but if that code is, as you mention, third-party code, there's not much you can do about it unless you feel like monkey patching a lot. I would try to identify the places most of the instances come from and refactor those. Remember, though, that saving on strings has some "tricky" parts, e.g. with Hash:

{ 'foo' => 'bar' }

You would guess this creates 3 objects, but you'd be wrong: it actually creates two copies of 'foo'. Since a String is mutable (unless frozen), the Hash dups the string and freezes the copy when it is used as a key (and there's a good reason for that).
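The key-copying behavior can be observed directly. The snippet below is a minimal sketch rather than anything from the asker's app: it builds the key with String.new so that it is definitely mutable, and it uses Hash#[]= instead of a literal, since newer Ruby versions may optimize string-literal keys inside Hash literals.

```ruby
# A mutable string that will be used as a Hash key.
key = String.new("foo")

h = {}
h[key] = "bar"

stored_key = h.keys.first

puts stored_key.frozen?      # the Hash holds a frozen key
puts key.frozen?             # the original string stays mutable
puts stored_key.equal?(key)  # so the two are different objects

# Mutating the original afterwards does not disturb the lookup.
key << "d"
puts h["foo"]
```

If the key is already frozen when it is stored, no copy is needed, which is one reason frozen strings are cheaper to use as keys.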

Also keep in mind to refactor "intelligently": profile the bits you're changing to make sure you do not slow things down by trying to economize on allocated instances.
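As one hypothetical illustration of this kind of targeted refactoring, a hot spot that returns a fresh "" on every call could return a single frozen constant instead. The names EMPTY and display_name are made up for this sketch, and whether such a change pays off in your app is something only profiling can tell.

```ruby
# One shared, frozen empty string instead of a new "" per call.
EMPTY = "".freeze

# Hypothetical helper: falls back to the shared constant for nil.
def display_name(name)
  name.nil? ? EMPTY : name
end

a = display_name(nil)
b = display_name(nil)
puts a.equal?(b)  # both calls return the very same object

begin
  a << "x"        # callers can no longer mutate the result in place
rescue RuntimeError => e
  # FrozenError on newer Rubies (a RuntimeError subclass), RuntimeError on older ones.
  puts "cannot modify: #{e.class}"
end
```

The trade-off is that any caller that used to append to the returned string now has to dup it first, which is exactly the sort of thing to check before applying the change widely.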

kares answered Nov 09 '22 10:11


v1 = v2 = v3 = "a"

Will only create one object in Ruby, not three.

v1 = v2 = v3 = "a" # => "a"
v1.object_id # => 10530560
v2.object_id # => 10530560
v1 << "ll the same" # => "all the same"
v2 # => "all the same"

Before doing something as drastic as interning all the strings, I'd check with other Tomcat users whether this is the best way of dealing with the problem. I don't use Tomcat or JRuby, but I strongly suspect this isn't the best approach.

Edit: If every string that was built from an "a" were the same object, then modifying one of them would modify all of the others. That would be a side-effect nightmare.
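To make that concrete, here is a small sketch of what blanket interning of mutable strings would amount to. POOL and FROZEN_POOL are hypothetical stand-ins for an interning runtime, not anything JRuby provides.

```ruby
# Hypothetical pool that hands out one shared, mutable String per value.
POOL = Hash.new { |pool, value| pool[value] = String.new(value) }

v1 = POOL["a"]
v2 = POOL["a"]

v1 << "ll the same"
puts v2         # v2 changed as well, since it is the same object
puts POOL["a"]  # and every later request for "a" sees the change too

# Freezing the pooled strings avoids the silent side effect,
# but then any code that mutates its strings starts raising instead.
FROZEN_POOL = Hash.new { |pool, value| pool[value] = String.new(value).freeze }

w1 = FROZEN_POOL["a"]
begin
  w1 << "!"
rescue RuntimeError => e
  puts "cannot modify: #{e.class}"
end
```

Either way, existing Ruby code that relies on strings being independent and mutable would break, which is why sharing only makes sense for strings that are deliberately frozen.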

Andrew Grimm answered Nov 09 '22 10:11