
Is directly accessing the backing array of a String justified in some cases?

I'm working on optimizing text processing software, in which the following class is used a lot:

class Sentence {

  private final char[] textArray;
  private final String textString; 

  public Sentence(String text) {
     this.textArray = text.toCharArray();
     this.textString = text;
  }

  public String getString() {
     return textString;
  }

  public char[] getArray() {
     return textArray;
  } 
}

As you can see, there is some redundancy: the backing array of the textString is always equal to the textArray, yet both are stored.

I'm hoping to reduce the memory footprint of this class by getting rid of the textArray field.

There is one problem: this class is used widely throughout our codebase, so I cannot remove the getArray() method. My solution is to drop the textArray field and have getArray() return the textString's backing array instead, obtained via reflection.

The result would be something like:

class Sentence {

  private final String textString; 

  public Sentence(String text) {
     this.textString = text;
  }

  public String getString() {
     return textString;
  }

  public char[] getArray() {
     return getBackingArrayUsingReflection(textString);
  } 
}

It seems like a viable solution, but I suspect a String's backing array is private for a reason. What are potential problems with this approach?

ChrisBlom asked Jan 15 '23 17:01


2 Answers

The main consequence is that you commit yourself to one specific implementation of the JDK. For example, Java 7 Update 6 completely changed how String uses its char[]: the offset and count fields were removed, so substring copies characters instead of sharing the array. This is why such an approach should be tolerated only if your code is very ephemeral, basically throw-away code.

If you are only reading the char[], and you are coding for OpenJDK Java 7, Update 6, you won't introduce any bugs.

On the other hand, 95% of Java programmers around the world would probably shake their heads in disbelief at code that reflects upon String internals, so be careful :)

Marko Topolnik answered Jan 22 '23 05:01


In some versions of java.lang.String (Java 7 Update 5 and earlier), a String holds a backing array plus the start index (offset) and length (count) of the actual string within that array. In these implementations the backing array can be (substantially) longer than the actual string, and the string does not necessarily start at the beginning of the array.

For example, when you use substring, the new String can share the backing array of the original String, with only a different start index and character count. Using reflection to return the backing array would then hand callers the whole original text rather than the substring, so it does not work in all cases (it results in incorrect or unexpected behavior).

See, for example, the String source at http://www.docjar.com/html/api/java/lang/String.java.html: substring(int beginIndex, int endIndex) on line 1950 (and below) calls the constructor String(int offset, int count, char value[]) on line 645 (and below). There the char[] is used directly as the backing array, and offset and count give the position in the array and the length of the string:

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}
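
To make the consequence concrete, here is a small sketch that models this layout. SharedString and SharedStringDemo are hypothetical names invented for illustration, not JDK classes. The model only mimics the value/offset/count fields quoted above, so it runs the same way on any Java version:

```java
// Toy model of the pre-Java 7 Update 6 String layout. SharedString is a
// hypothetical class for illustration only; it is not part of the JDK.
final class SharedString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index in value where this string starts
    final int count;     // number of characters in this string

    SharedString(String text) {
        this(0, text.length(), text.toCharArray());
    }

    // Mirrors the package-private JDK constructor quoted above: no copy is made.
    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // Same array, different window into it.
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

class SharedStringDemo {
    public static void main(String[] args) {
        SharedString whole = new SharedString("Hello world");
        SharedString word = whole.substring(6, 11);

        System.out.println(word);                    // world
        System.out.println(word.value.length);       // 11
        System.out.println(new String(word.value));  // Hello world
    }
}
```

A getArray() that returned word.value here would hand back all 11 characters of "Hello world" instead of the 5 characters of "world", which is the incorrect behavior described above under this layout.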

As indicated by Marko Topolnik, this is no longer the case with more recent versions of Java 7. You should not depend on the implementation details of Java (especially as it can change substantially between versions - as demonstrated).
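
One way to follow that advice while still dropping the textArray field is to build the array on demand with the public String.toCharArray() method. This is only a sketch and an alternative to the reflection idea, not something proposed in the question. It assumes callers of getArray() only read the result and do not rely on getting the same array instance on every call:

```java
// Sketch: keeps the public API of Sentence but stores the text only once.
class Sentence {

    private final String textString;

    public Sentence(String text) {
        this.textString = text;
    }

    public String getString() {
        return textString;
    }

    // Returns a fresh copy on every call, using only the public String API.
    // Assumption: callers do not depend on array identity or shared mutation.
    public char[] getArray() {
        return textString.toCharArray();
    }
}
```

The trade-off is an allocation and copy per getArray() call in exchange for a smaller object, so whether it helps depends on how often the method is called compared to how many Sentence instances are kept alive.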

Mark Rotteveel answered Jan 22 '23 07:01