I have very long strings that need to have a pattern removed if it appears. But it's an incredibly rare edge case for it to appear in the strings.
If I do this:
str = str.replace("pattern", "");
Then it looks like I'm creating a new string (because Java strings are immutable), which would be a waste if the original string was fine. Should I first check for a match, and then only replace if a match is found?
Checking the documentation of various implementations, none seems to require the String.replace(CharSequence, CharSequence) method to return the same string if no match is found.
Without such a requirement in the documentation, an implementation may or may not optimize the method for the case where no match is found. It is best to write your code as if there is no optimization, so that it behaves the same on any implementation or version of the JRE.
In particular, when no match is found, Oracle's implementation (version 8-b123) returns the same String object, while GNU Classpath (version 0.95) returns a new String object regardless.
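If avoiding the extra allocation matters, the check can be made explicit so that it does not depend on the implementation. A minimal sketch, assuming a hypothetical helper named removeIfPresent (it is not part of the JDK or any library):

```java
final class GuardedReplace {
    private GuardedReplace() {}

    /**
     * Returns text itself when pattern does not occur in it; otherwise
     * returns a new string with every occurrence of pattern removed.
     * Hypothetical helper for illustration, not a JDK method.
     */
    static String removeIfPresent(String text, String pattern) {
        // indexOf only scans the string; replace runs solely in the rare matching case.
        if (text.indexOf(pattern) < 0) {
            return text;
        }
        return text.replace(pattern, "");
    }

    public static void main(String[] args) {
        String clean = "a very long string";
        String dirty = "a very long pattern string";
        // The no-match path hands back the original object.
        System.out.println(removeIfPresent(clean, "pattern") == clean);
        // The match path builds a new string without the pattern.
        System.out.println(removeIfPresent(dirty, "pattern"));
    }
}
```

The trade-off is that a string which does contain the pattern is scanned twice, which should be acceptable when matches are as rare as described in the question.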
If you can find any clause in any of the documentation requiring String.replace(CharSequence, CharSequence) to return the same String object when no match is found, please leave a comment.
The long answer below shows that different implementations may or may not optimize the case where no match is found.
Let us look at Oracle's implementation and GNU Classpath's implementation of the String.replace(CharSequence, CharSequence) method.
Note: This is correct as of the time of writing. While the link is not likely to change in the future, the content of the link is likely to change to a newer version of GNU Classpath and may go out of sync with the quoted content below. If the change affects the correctness, please leave a comment.
Let us look at GNU Classpath's implementation of String.replace(CharSequence, CharSequence) (version 0.95 quoted).
public String replace (CharSequence target, CharSequence replacement)
{
  String targetString = target.toString();
  String replaceString = replacement.toString();
  int targetLength = target.length();
  int replaceLength = replacement.length();

  int startPos = this.indexOf(targetString);
  StringBuilder result = new StringBuilder(this);
  while (startPos != -1)
    {
      // Replace the target with the replacement
      result.replace(startPos, startPos + targetLength, replaceString);

      // Search for a new occurrence of the target
      startPos = result.indexOf(targetString, startPos + replaceLength);
    }
  return result.toString();
}
Let us check the source code of StringBuilder.toString(). Since this decides the return value, if StringBuilder.toString() copies the buffer, then we don't need to check any of the code above.
/**
 * Convert this <code>StringBuilder</code> to a <code>String</code>. The
 * String is composed of the characters currently in this StringBuilder. Note
 * that the result is a copy, and that future modifications to this buffer
 * do not affect the String.
 *
 * @return the characters in this StringBuilder
 */
public String toString()
{
  return new String(this);
}
If the documentation doesn't manage to persuade you, just follow the String constructor. Eventually, the non-public constructor String(char[], int, int, boolean) is called with the boolean dont_copy set to false, which means that the new String must copy the buffer.
public String(StringBuilder buffer)
{
  this(buffer.value, 0, buffer.count);
}

public String(char[] data, int offset, int count)
{
  this(data, offset, count, false);
}

/**
 * Special constructor which can share an array when safe to do so.
 *
 * @param data the characters to copy
 * @param offset the location to start from
 * @param count the number of characters to use
 * @param dont_copy true if the array is trusted, and need not be copied
 * @throws NullPointerException if chars is null
 * @throws StringIndexOutOfBoundsException if bounds check fails
 */
String(char[] data, int offset, int count, boolean dont_copy)
{
  if (offset < 0)
    throw new StringIndexOutOfBoundsException("offset: " + offset);
  if (count < 0)
    throw new StringIndexOutOfBoundsException("count: " + count);
  // equivalent to: offset + count < 0 || offset + count > data.length
  if (data.length - offset < count)
    throw new StringIndexOutOfBoundsException("offset + count: "
                                              + (offset + count));
  if (dont_copy)
    {
      value = data;
      this.offset = offset;
    }
  else
    {
      value = new char[count];
      VMSystem.arraycopy(data, offset, value, 0, count);
      this.offset = 0;
    }
  this.count = count;
}
This evidence shows that GNU Classpath's implementation of String.replace(CharSequence, CharSequence) does not return the same String object, even when no match is found.
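The effect of that no-match path (copy the string into a StringBuilder, then convert it back) can be reproduced on any JRE. The sketch below is only an illustration with hypothetical class and method names; it does not call GNU Classpath code, it just mirrors the round trip that the quoted method performs:

```java
final class BuilderCopyDemo {
    private BuilderCopyDemo() {}

    /** Mirrors the no-match path quoted above: copy into a StringBuilder, convert back. */
    static String roundTrip(String s) {
        return new StringBuilder(s).toString();
    }

    public static void main(String[] args) {
        String original = "no match here";
        String copy = roundTrip(original);
        // Same characters as the original.
        System.out.println(copy.equals(original));
        // Identity check: toString() allocates a fresh String for non-empty content.
        System.out.println(copy == original);
    }
}
```

The result is equal to the original but is a separate object, which is exactly the allocation the question hoped to avoid.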
In Oracle's implementation of String.replace(CharSequence, CharSequence) (version 8-b123 quoted), the method uses the Pattern class to do the replacement.
public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Matcher.replaceAll(String) calls toString() on the CharSequence it was given and returns the result when no match is found:
public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}
String implements the CharSequence interface, and since the String passes itself into the Matcher, let us look at String.toString():
public String toString() {
    return this;
}
From this, we can conclude that Oracle's implementation returns the same String when no match is found.
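To see how the runtime you are actually using behaves, a small probe can compare references directly. This is a sketch with hypothetical class and method names; the result of the first check depends on the implementation and version, so no particular output is assumed for it:

```java
final class ReplaceIdentityProbe {
    private ReplaceIdentityProbe() {}

    /** True if this runtime's String.replace returns the receiver itself when nothing matches. */
    static boolean returnsSameInstance(String text, String absentPattern) {
        return text.replace(absentPattern, "") == text;
    }

    /** String.toString() is documented to return the object itself. */
    static boolean toStringIsIdentity(String text) {
        return text.toString() == text;
    }

    public static void main(String[] args) {
        String text = "a long string without the target";
        System.out.println("vendor:  " + System.getProperty("java.vendor"));
        System.out.println("version: " + System.getProperty("java.version"));
        // Implementation-dependent: may differ between JRE vendors and versions.
        System.out.println("same instance on no match: " + returnsSameInstance(text, "pattern"));
        System.out.println("toString() returns this: " + toStringIsIdentity(text));
    }
}
```

Whatever the probe reports, it only describes one runtime; code that must not allocate in the no-match case should still check for the pattern before calling replace.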