Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a memory-efficient replacement of java.lang.String?

After reading this old article measuring the memory consumption of several object types, I was amazed to see how much memory Strings use in Java:

length: 0, {class java.lang.String} size = 40 bytes length: 7, {class java.lang.String} size = 56 bytes 

While the article has some tips to minimize this, I did not find them entirely satisfying. It seems to be wasteful to use char[] for storing the data. The obvious improvement for most western languages would be to use byte[] and an encoding like UTF-8 instead, as you only need a single byte to store the most frequent characters then instead of two bytes.

Of course one could use String.getBytes("UTF-8") and new String(bytes, "UTF-8"). Even the overhead of the String instance itself would be gone. But then there you lose very handy methods like equals(), hashCode(), length(), ...

Sun has a patent on byte[] representation of Strings, as far as I can tell.

Frameworks for efficient representation of string objects in Java programming environments
... The techniques can be implemented to create Java string objects as arrays of one-byte characters when it is appropriate ...

But I failed to find an API for that patent.

Why do I care?
In most cases I don't. But I worked on applications with huge caches, containing lots of Strings, which would have benefitted from using the memory more efficiently.

Does anybody know of such an API? Or is there another way to keep your memory footprint for Strings small, even at the cost of CPU performance or uglier API?

Please don't repeat the suggestions from the above article:

  • own variant of String.intern() (possibly with SoftReferences)
  • storing a single char[] and exploiting the current String.subString(.) implementation to avoid data copying (nasty)

Update

I ran the code from the article on Sun's current JVM (1.6.0_10). It yielded the same results as in 2002.

like image 911
the.duckman Avatar asked Oct 23 '08 19:10

the.duckman


People also ask

Does String make Java more memory efficient?

Answer: String literal makes java more memory efficient.

How do I optimize Java memory?

The most obvious place to start tuning the memory footprint is the Java heap size. If you reduce the Java heap size, you reduce the memory footprint of the Java process by the same amount. You cannot reduce the heap size beyond a point; the heap should be large enough for all objects that are live at the same time.

What is the memory of String in Java?

So in total, an empty string uses 40 bytes. If the String contains, say, 19 characters, then the String object itself still requires 24 bytes.

How is memory allocated for strings in Java?

Stack space contains specific values that are short-lived whereas Heap space used by Java Runtime to allocate memory to objects and JRE classes. In Java, strings are stored in the heap area. Why Java strings stored in Heap, not in Stack? String Literal is created by using a double quote.


2 Answers

With a Little Bit of Help From the JVM...

WARNING: This solution is now obsolete in newer Java SE versions. See other ad-hoc solutions further below.

If you use an HotSpot JVM, since Java 6 update 21, you can use this command-line option:

-XX:+UseCompressedStrings 

The JVM Options page reads:

Use a byte[] for Strings which can be represented as pure ASCII. (Introduced in Java 6 Update 21 Performance Release)

UPDATE: This feature was broken in a later version and was supposed to be fixed again in Java SE 6u25 as mentioned by the 6u25 b03 release notes (however we don't see it in the 6u25 final release notes). The bug report 7016213 is not visible for security reasons. So, use with care and check first. Like any -XX option, it is deemed experimental and subject to change without much notice, so it's probably not always best to not use that in the startup scrip of a production server.

UPDATE 2013-03 (thanks to a comment by Aleksey Maximus): See this related question and its accepted answer. The option now seems to be deceased. This is further confirmed in the bug 7129417 report.

The End Justifies the Means

Warning: (Ugly) Solutions for Specific Needs

This is a bit out of the box and lower-level, but since you asked... don't hit the messenger!

Your Own Lighter String Representation

If ASCII is fine for you needs, then why don't you just roll out your own implementation?

As you mentioned, you could byte[] instead of char[] internally. But that's not all.

To do it even more lightweight, instead of wrapping your byte arrays in a class, why not simply use an helper class containing mostly static methods operating on these byte arrays that you pass around? Sure, it's going to feel pretty C-ish, but it would work, and would save you the huge overhead that goes with String objects.

And sure, it would miss some nice functionalities... unless your re-implement them. If you really need them, then there's not much choice. Thanks to OpenJDK and a lot of other good projects, you could very well roll out your own fugly LiteStrings class that just operate on byte[] parameters. You'll feel like taking a shower every time you need to call a function, but you'll have saved heaps of memory.

I'd recommend to make it resemble closely the String class's contract and to provide meaningful adapters and builders to convert from and to String, and you might want to also have adapters to and from StringBuffer and StringBuilder, as well as some mirror implementations of other things you might need. Definitely some piece of work, but might be worth it (see a bit below the "Make it Count!" section).

On-the-Fly Compression/Decompression

You could very well compress your strings in memory and decompress them on the fly when you need them. After all, you only need to be able to read them when you access them, right?

Of course, being that violent will mean:

  • more complex (thus less maintainable) code,
  • more processing power,
  • relatively long strings are needed for the compression to be relevant (or to compact multiple strings into one by implementing your own store system, to make the compression more effective).

Do Both

For a full-headache, of course you can do all of that:

  • C-ish helper class,
  • byte arrays,
  • on-the-fly compressed store.

Be sure to make that open-source. :)

Make it Count!

By the way, see this great presentation on Building Memory-Efficient Java Applications by N. Mitchell and G. Sevitsky: [2008 version], [2009 version].

From this presentation, we see that an 8-char string eats 64 bytes on a 32-bit system (96 for a 64-bit system!!), and most of it is due to JVM overhead. And from this article we see that an 8-byte array would eat "only" 24 bytes: 12 bytes of header, 8 x 1 byte + 4 bytes of alignment).

Sounds like this could be worth it if you really manipulate a lot of that stuff (and possibly speed up things a bit, as you'd spend less time allocating memory, but don't quote me on that and benchmark it; plus it would depend greatly on your implementation).

like image 77
haylem Avatar answered Oct 12 '22 00:10

haylem


At Terracotta, we have some cases where we compress big Strings as they are sent around the network and actually leave them compressed until decompression is necessary. We do this by converting the char[] to byte[], compressing the byte[], then encoding that byte[] back into the original char[]. For certain operations like hash and length, we can answer those questions without decoding the compressed string. For data like big XML strings, you can get substantial compression this way.

Moving the compressed data around the network is a definite win. Keeping it compressed is dependent on the use case. Of course, we have some knobs to turn this off and change the length at which compression turns on, etc.

This is all done with byte code instrumentation on java.lang.String which we've found is very delicate due to how early String is used in startup but is stable if you follow some guidelines.

like image 39
Alex Miller Avatar answered Oct 11 '22 23:10

Alex Miller