I'm reading from a binary file and want to convert the bytes to US-ASCII strings. Is there any way to do this without calling new on String, so that I avoid creating multiple semantically equal String objects in the string literal pool? I suspect it is not possible, since String objects cannot be introduced with double-quoted literals here. Is this correct?
private String nextString(DataInputStream dis, int size)
        throws IOException
{
    byte[] bytesHolder = new byte[size];
    // readFully blocks until the buffer is full; a plain read() may return fewer bytes
    dis.readFully(bytesHolder);
    return new String(bytesHolder, Charset.forName("US-ASCII")).trim();
}
By the new keyword: a Java String can be created with the keyword new. For example: String s = new String("Welcome"); This creates two objects (one in the string pool for the literal and one on the heap) and one reference variable, s, which refers to the object on the heap.
The BigInteger class has a longValue() method that can be used to convert a byte array to a long value: long value = new BigInteger(bytes).longValue();
First, the byte is converted to an int via widening primitive conversion (§5.1.2), and then the resulting int is converted to a char by narrowing primitive conversion (§5.1.3).
You'd have to have a cache mapping byte arrays to strings, then search through the cache for any equal values before creating a new string.
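A minimal sketch of such a byte-keyed cache, assuming the raw bytes are wrapped in a ByteBuffer so they can be compared by content (a plain byte[] uses identity for equals and hashCode). The class name ByteKeyedStringCache and its get method are hypothetical, not something from the original answer. Note that each lookup still allocates a small ByteBuffer wrapper, so this trades one short-lived object for another.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: looks up the bytes first and only decodes on a miss.
final class ByteKeyedStringCache {
    // ByteBuffer implements equals/hashCode over its remaining content,
    // so two buffers wrapping equal bytes are equal keys.
    private final Map<ByteBuffer, String> cache = new HashMap<>();

    String get(byte[] bytes) {
        String cached = cache.get(ByteBuffer.wrap(bytes));
        if (cached == null) {
            cached = new String(bytes, StandardCharsets.US_ASCII).trim();
            // Key on a private copy so later writes to the caller's array
            // cannot silently change a key already stored in the map.
            cache.put(ByteBuffer.wrap(Arrays.copyOf(bytes, bytes.length)), cached);
        }
        return cached;
    }
}
```

With this in place, a second call with equal bytes returns the String instance created by the first call, and no new String is constructed on a hit.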
You can intern existing strings with intern(), as Yishai posted. That won't stop you from creating more strings, but it'll make all but the first one (for any char sequence) very short-lived. On the other hand, it'll make all the distinct strings live for a very long time indeed.
You can have "pseudo-interning" by using a Map<String, String>:
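// cache is assumed to be a field, e.g. Map<String, String> cache = new HashMap<String, String>();
String tmp = new String(bytesHolder, Charset.forName("US-ASCII")).trim();
String cached = cache.get(tmp);
if (cached == null)
{
    // first time this value is seen: tmp becomes the canonical instance
    cached = tmp;
    cache.put(tmp, tmp);
}
return cached;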
You could even put a bit more effort in and end up with an LRU cache so that it'll keep the N most recently fetched strings, discarding others when it needs to.
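One way the LRU variant could look, sketched with java.util.LinkedHashMap in access order and an overridden removeEldestEntry. The class name LruStringCache, the canonicalize method, and the maxEntries parameter are illustrative assumptions; the sketch is also not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU "pseudo-intern" cache holding at most maxEntries strings.
final class LruStringCache {
    private final Map<String, String> cache;

    LruStringCache(final int maxEntries) {
        // accessOrder = true orders entries from least to most recently accessed.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Called after each put; evict once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached instance equal to s, or caches and returns s itself.
    String canonicalize(String s) {
        String cached = cache.get(s); // get() counts as an access
        if (cached == null) {
            cache.put(s, s);
            cached = s;
        }
        return cached;
    }

    int size() {
        return cache.size();
    }
}
```

Strings evicted from the cache become ordinary garbage once nothing else references them, which is the main difference from intern().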
None of that reduces the number of strings created in the first place, as I say - but is that likely to be a problem in your situation? GCs have been tuned to make it very cheap to create short-lived objects.
You can call the intern() method on the string to ensure there is a single shared instance of each distinct value for the whole JVM.
String s = new String(bytes, "US-ASCII").intern();
You won't avoid creating the initial string again, but you will save on the storage.
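A small sketch of that behaviour: each call still constructs a temporary String, but intern() returns the same canonical instance for equal contents. The class InternDemo and the method decodeInterned are made-up names for illustration.

```java
import java.nio.charset.StandardCharsets;

final class InternDemo {
    // Decodes the bytes, then swaps the fresh String for the pooled instance.
    static String decodeInterned(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII).intern();
    }

    public static void main(String[] args) {
        byte[] raw = "abc".getBytes(StandardCharsets.US_ASCII);
        String first = decodeInterned(raw);
        String second = decodeInterned(raw.clone());
        System.out.println(first == second); // prints true: same pooled object
        System.out.println(first == "abc");  // prints true: literals are interned too
    }
}
```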
That being said, interned strings can have limited storage space (on older HotSpot JVMs the intern pool lived in the fixed-size permanent generation), so use intern() with caution. A better option may be to keep a HashMap with the string as both key and value: check whether the string already exists, return the stored instance if it does, and insert it if it doesn't. That way you won't have such memory limitations.
You shouldn't be concerned about it, unless you have profiled your application and determined String creation to be the exact source of your problem.
If you find out that String creation is the source of your problem, I would recommend what Jon Skeet proposed, i.e. a mapping from byte[] to String. That has about the same effect as interning your Strings, while not hogging valuable memory until you restart the VM.