Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it possible to create a new String from a subset of another String without any data copy?

Tags:

java

string

I had (apparenty an incorrect?) impression that Java substring(srcArray, startIndex, endIndex) method did not allocate new memory but instead re-used the existing underlying char[] array. This approach would seem to be possible due to the immutability of String's.

However when actually looking at the JDK source we find the following:

public String substring(int beginIndex, int endIndex) {

    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

public String(char value[], int offset, int count) {


    this.value = Arrays.copyOfRange(value, offset, offset+count);

}

Notice the copyOfRange which copies/duplicates the content.

The motivation for this was working with very large arrays representing the backing store of matrices for linear algebra. There is NO way we are going to accept copying them. I would like however to have a custom representation of portions of the backing array for convenience in programming. It would however only be reasonable if no data copy were to be performed. The tie-in with String is that I looked into how the String handles no-data-copy and found that they (in later java versions apparently..) do actually perform the copy.

So, is there actually any way to achieve the zero-copy?

like image 381
WestCoastProjects Avatar asked Oct 07 '14 22:10

WestCoastProjects


People also ask

Why does the copy method not create a new string?

In particular, because of changes in string interning in .NET Core 3.0, in some cases the Copy method will not create a new string but will simply return a reference to an existing interned string. Depending on Why you want to call the Copy method, there are a number of alternatives:

What happens when you copy a string to another string?

A new string with the same value as str. str is null. The Copy method returns a String object that has the same value as the original string but represents a different object reference. It differs from an assignment operation, which assigns an existing string reference to an additional object variable.

How to find if one string contains characters subset of another string?

We basically need to find if one string contains characters which are subset of characters in second string. First we count occurrences of all characters in second string. Then we traverse through first string and reduce count of every character that is present in first. If at any moment, count becomes less than 0, we return false.

Is creating a new string from a substring redundant?

So creating a new string from a substring operation (which is itself a new string) was indeed redundant, as per the preceding diagram (except for one case, see below).


2 Answers

It used to be that way, but it was changed; the designers of Java came to the conclusion that it caused more problems than it solved, specifically due to a) the risk of memory leaks, b) the fact that most substring calls are for relatively small ranges that are cheap to copy.

The closest thing you can do, to get a CharSequence view of a subrange of the String without copying, you can do by writing CharBuffer.wrap(string).subSequence(from, to).

like image 185
Louis Wasserman Avatar answered Sep 21 '22 19:09

Louis Wasserman


The class java.nio.StringCharBuffer implements such a "view" on a substring without copying of the underlying char[] array of the String. It uses the String.charAt functions along with an internal offset (number of chars offset from the start of the String) in order to access the String array.

Unfortunately, it is package-private and not usable directly. The superclass java.nio.CharBuffer however is public, and you could implement your own implementation of a StringCharBuffer using the same approach that the java.nio.StringCharBuffer takes.

Note that if you do not do this and simply use CharBuffer.wrap as suggested by another answer you will still copy the array once.

Alternatively, you might just require a Stream of chars beginning from a certain index. In that case, you can use something like:

IntStream
  .range(start, str.length())
  .map(i -> str.charAt(i))
  ...

to get a Stream of chars. Note that this is different from doing str.chars().skip(start), as the latter will actually advance the cursor by 1 start times.

like image 31
jmiserez Avatar answered Sep 20 '22 19:09

jmiserez