What does replace do if no match is found? (under the hood)

I have very long strings that need to have a pattern removed if it appears. But it's an incredibly rare edge case for it to appear in the strings.

If I do this:

str = str.replace("pattern", "");

Then it looks like I'm creating a new string (because Java strings are immutable), which would be a waste if the original string was fine. Should I first check for a match, and then only replace if a match is found?

ColBeseder asked Nov 05 '14 06:11


1 Answer

Short answer

None of the documentation I checked, across several implementations, requires the String.replace(CharSequence, CharSequence) method to return the same String object when no match is found.

Without such a requirement in the documentation, an implementation may or may not optimize the case where no match is found. It is best to write your code as if there were no optimization, so that it runs correctly on any implementation or version of the JRE.

In particular, when no match is found, Oracle's implementation (version 8-b123) returns the same String object, while GNU Classpath (version 0.95) returns a new String object regardless.

If you can find any clause in any of the documentation requiring String.replace(CharSequence, CharSequence) to return the same String object when no match is found, please leave a comment.
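As a minimal sketch of the "assume no optimization" advice, the check can be done explicitly before calling replace. The class and method names here (ReplaceIfPresent, removeIfPresent) are hypothetical helpers for illustration, not part of the JDK. When the target is present the string is scanned twice, which should be acceptable if matches are as rare as the question describes.

```java
final class ReplaceIfPresent {
    // Hypothetical helper: returns the very same String object when there
    // is nothing to remove, regardless of how String.replace is implemented.
    static String removeIfPresent(String input, String target) {
        // An empty target with an empty replacement leaves the content
        // unchanged, so it is treated as "nothing to remove".
        if (target.isEmpty() || !input.contains(target)) {
            return input;
        }
        return input.replace(target, "");
    }

    public static void main(String[] args) {
        String clean = "a very long string without the marker";
        String dirty = "head pattern tail";
        System.out.println(removeIfPresent(clean, "pattern") == clean); // true
        System.out.println(removeIfPresent(dirty, "pattern"));
    }
}
```

With this guard, the no-match case allocates nothing on any implementation, at the cost of one extra scan in the rare case where the target does appear.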

Long answer

The long answer below shows that different implementations may or may not optimize the case where no match is found.

Let us look at Oracle's and GNU Classpath's implementations of the String.replace(CharSequence, CharSequence) method.

GNU Classpath

Note: This is correct as of the time of writing. While the link is not likely to change in the future, the content of the link is likely to change to a newer version of GNU Classpath and may go out of sync with the quoted content below. If the change affects the correctness, please leave a comment.

Let us look at GNU Classpath's implementation of String.replace(CharSequence, CharSequence) (version 0.95 quoted).

public String replace (CharSequence target, CharSequence replacement)
{
    String targetString = target.toString();
    String replaceString = replacement.toString();
    int targetLength = target.length();
    int replaceLength = replacement.length();

    int startPos = this.indexOf(targetString);
    StringBuilder result = new StringBuilder(this);    
    while (startPos != -1)
    {
        // Replace the target with the replacement
        result.replace(startPos, startPos + targetLength, replaceString);

        // Search for a new occurrence of the target
        startPos = result.indexOf(targetString, startPos + replaceLength);
    }
    return result.toString();
}

Let us check the source code of StringBuilder.toString(). Since this decides the return value, if StringBuilder.toString() copies the buffer, then we don't need to further check any code above.

/**
 * Convert this <code>StringBuilder</code> to a <code>String</code>. The
 * String is composed of the characters currently in this StringBuilder. Note
 * that the result is a copy, and that future modifications to this buffer
 * do not affect the String.
 *
 * @return the characters in this StringBuilder
 */

public String toString()
{
    return new String(this);
}

If the documentation doesn't manage to persuade you, just follow the String constructor. Eventually, the non-public constructor String(char[], int, int, boolean) is called, with the boolean dont_copy set to false, which means that the new String must copy the buffer.

public String(StringBuilder buffer)
{
    this(buffer.value, 0, buffer.count);
}

public String(char[] data, int offset, int count)
{
    this(data, offset, count, false);
}

/**
 * Special constructor which can share an array when safe to do so.
 *
 * @param data the characters to copy
 * @param offset the location to start from
 * @param count the number of characters to use
 * @param dont_copy true if the array is trusted, and need not be copied
 * @throws NullPointerException if chars is null
 * @throws StringIndexOutOfBoundsException if bounds check fails
 */
String(char[] data, int offset, int count, boolean dont_copy)
{
    if (offset < 0)
        throw new StringIndexOutOfBoundsException("offset: " + offset);
    if (count < 0)
        throw new StringIndexOutOfBoundsException("count: " + count);
    // equivalent to: offset + count < 0 || offset + count > data.length
    if (data.length - offset < count)
        throw new StringIndexOutOfBoundsException("offset + count: "
                                                + (offset + count));
    if (dont_copy)
    {
        value = data;
        this.offset = offset;
    }
    else
    {
        value = new char[count];
        VMSystem.arraycopy(data, offset, value, 0, count);
        this.offset = 0;
    }
    this.count = count;
}

This evidence shows that GNU Classpath's implementation of String.replace(CharSequence, CharSequence) does not return the same String object, even when no match is found.
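The effect can be reproduced on any JRE with a small stand-in that mirrors the loop quoted above, written against the standard java.lang.StringBuilder. The class name ClasspathStyleReplace is hypothetical, and the sketch assumes a non-empty target; it is not the GNU Classpath source itself.

```java
final class ClasspathStyleReplace {
    // Hypothetical stand-in mirroring the quoted loop. Assumes a non-empty target.
    static String replace(String self, CharSequence target, CharSequence replacement) {
        String targetString = target.toString();
        String replaceString = replacement.toString();
        int targetLength = targetString.length();
        int replaceLength = replaceString.length();

        int startPos = self.indexOf(targetString);
        // The builder copies the characters before we know whether anything matches.
        StringBuilder result = new StringBuilder(self);
        while (startPos != -1) {
            // Replace the target with the replacement
            result.replace(startPos, startPos + targetLength, replaceString);
            // Search for a new occurrence of the target
            startPos = result.indexOf(targetString, startPos + replaceLength);
        }
        // toString() builds a new String from the builder's contents.
        return result.toString();
    }

    public static void main(String[] args) {
        String original = "nothing to remove here";
        String out = replace(original, "pattern", "");
        System.out.println(out.equals(original)); // true: same characters
        System.out.println(out == original);      // false: a different object
    }
}
```

Because the StringBuilder is created unconditionally, the copy happens even when indexOf returns -1 on the first call.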

Oracle

Oracle's implementation of String.replace(CharSequence, CharSequence) (version 8-b123 quoted) uses the Pattern class to do the replacement.

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

Matcher.replaceAll(String) calls toString() on the input CharSequence and returns the result when no match is found:

public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}

String implements the CharSequence interface, and since the String passes itself into the Matcher, let us look at String.toString:

public String toString() {
    return this;
}

From this, we can conclude that Oracle's implementation returns the same String object when no match is found.
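To see what a particular JRE does, a reference comparison is enough. The sketch below uses a hypothetical class name (ReplaceIdentityProbe) and assumes the caller passes a target that does not occur in the string. The printed result is specific to the runtime it is executed on and is not something the documentation promises.

```java
final class ReplaceIdentityProbe {
    // Reports whether the running JRE hands back the same object when the
    // target does not occur in s. The answer is implementation-specific.
    static boolean sameObjectOnNoMatch(String s, String target) {
        return s.replace(target, "") == s;
    }

    public static void main(String[] args) {
        String s = "a long string that does not contain the target";
        System.out.println(sameObjectOnNoMatch(s, "pattern")
                ? "same String object returned"
                : "new String object returned");
    }
}
```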

nhahtdh answered Oct 14 '22 15:10