I used to think that String.replace was faster than String.replaceAll because the latter uses a Pattern regex and the former does not. In fact, there is no significant difference in either performance or implementation. This is the JDK implementation:
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
What's the need to use Pattern here? I wrote a non-regex replace version
static String replace(String s, String target, String replacement) {
StringBuilder sb = new StringBuilder(s);
for (int i = 0; (i = sb.indexOf(target, i)) != -1; i += replacement.length()) {
sb.replace(i, i + target.length(), replacement);
}
return sb.toString();
}
and compared performance
public static void main(String args[]) throws Exception {
String s1 = "11112233211";
for (;;) {
long t0 = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
// String s2 = s1.replace("11", "xxx");
String s2 = replace(s1, "11", "22");
}
System.out.println(System.currentTimeMillis() - t0);
}
}
Benchmarks: my version - 400ms; JDK version - 1700ms.
Is my test wrong or is String.replace really inefficient?
To give you some idea of how inefficient String.replace is, here is the source from Java 7 update 11:
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
AFAIK, the use of a Pattern, Matcher.quoteReplacement, etc. is an attempt to be clear rather than efficient. I suspect it dates back to when many internal libraries were written without performance considerations.
IMHO Java 7 has seen many internal libraries improve performance, in particular reduce needless object creation. This method is an obvious candidate for improvement.
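For context, here is a minimal sketch of what Pattern.LITERAL and Matcher.quoteReplacement contribute in the implementation quoted above. The class name LiteralReplaceDemo and the helper literalReplace are made up for illustration; the helper only mirrors the quoted JDK 7 body and is not the JDK code itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplaceDemo {
    // Hypothetical helper mirroring the JDK 7 String.replace body quoted above.
    static String literalReplace(String s, String target, String replacement) {
        return Pattern.compile(target, Pattern.LITERAL)           // target is not parsed as a regex
                .matcher(s)
                .replaceAll(Matcher.quoteReplacement(replacement)); // '$' and '\' lose their special meaning
    }

    public static void main(String[] args) {
        // replace treats "." literally; replaceAll treats it as "any character".
        System.out.println("a.b".replace(".", "-"));     // a-b
        System.out.println("a.b".replaceAll(".", "-"));  // ---
        // Without quoting, "$5" in a replacement would be read as a group reference.
        System.out.println(literalReplace("cost: X", "X", "$5")); // cost: $5
    }
}
```

So the regex machinery is only there to get literal semantics out of an engine that is regex-based by default, which is why a plain indexOf loop can do the same job.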
You can improve the performance by doing the copy once, instead of trying to insert into an existing StringBuilder.
static String replace2(String s, String target, String replacement) {
StringBuilder sb = null;
int start = 0;
for (int i; (i = s.indexOf(target, start)) != -1; ) {
if (sb == null) sb = new StringBuilder();
sb.append(s, start, i);
sb.append(replacement);
start = i + target.length();
}
if (sb == null) return s;
sb.append(s, start, s.length());
return sb.toString();
}
public static void main(String... ignored) {
String s1 = "11112233211";
for (; ; ) {
timeReplace(s1);
timeReplace2(s1);
timeStringReplaceRefactored(s1);
timeStringReplace(s1);
}
}
private static void timeStringReplace(String s1) {
long start0 = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
String s2 = s1.replace("11", "xxx");
if (s2.length() <= s1.length()) throw new AssertionError();
}
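// With 1,000,000 iterations, the total in milliseconds equals the average in nanoseconds per call.
System.out.printf("String.replace %,d ns avg%n", System.currentTimeMillis() - start0);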
}
private static void timeStringReplaceRefactored(String s1) {
long start0 = System.currentTimeMillis();
Pattern compile = Pattern.compile("11", Pattern.LITERAL);
String xxx = Matcher.quoteReplacement("xxx");
for (int i = 0; i < 1000000; i++) {
String s2 = compile.matcher(s1).replaceAll(xxx);
if (s2.length() <= s1.length()) throw new AssertionError();
}
System.out.printf("String.replace %,d ns avg (Refactored)%n", System.currentTimeMillis() - start0);
}
private static void timeReplace(String s1) {
long start0 = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
String s2 = replace(s1, "11", "xxx");
if (s2.length() <= s1.length()) throw new AssertionError();
}
System.out.printf("Replace %,d ns avg%n", System.currentTimeMillis() - start0);
}
private static void timeReplace2(String s1) {
long start0 = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
String s2 = replace2(s1, "11", "xxx");
if (s2.length() <= s1.length()) throw new AssertionError();
}
System.out.printf("My replace %,d ns avg%n", System.currentTimeMillis() - start0);
}
static String replace(String s, String target, String replacement) {
StringBuilder sb = new StringBuilder(s);
for (int i = 0; (i = sb.indexOf(target, i)) != -1; i += replacement.length()) {
sb.replace(i, i + target.length(), replacement);
}
return sb.toString();
}
prints
Replace 177 ns avg
My replace 108 ns avg
String.replace 436 ns avg (Refactored)
String.replace 598 ns avg
Caching the Pattern and the replacement text helps a little, but not as much as having a custom routine to do the replace.
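Before swapping in a custom routine, it is worth checking that it returns the same results as String.replace. The following is a rough sketch of such a check, not a full test suite. The class name ReplaceCheck and the sample inputs are my own, and it assumes a non-empty target, since the single-pass loop would never advance on an empty one.

```java
final class ReplaceCheck {
    // Single-pass replacement, same approach as replace2 above.
    // Assumption: target is non-empty (an empty target would loop forever).
    static String replace2(String s, String target, String replacement) {
        StringBuilder sb = null;
        int start = 0;
        for (int i; (i = s.indexOf(target, start)) != -1; ) {
            if (sb == null) sb = new StringBuilder();
            sb.append(s, start, i);      // copy the text before the match
            sb.append(replacement);
            start = i + target.length(); // continue after the match
        }
        if (sb == null) return s;        // no match: return the original string
        sb.append(s, start, s.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each row is {input, target, replacement}; the values are illustrative only.
        String[][] cases = {
            {"11112233211", "11", "xxx"},
            {"abc", "x", "y"},
            {"aaa", "aa", "b"},
            {"a.b.c", ".", "$1"},
            {"", "a", "b"},
        };
        for (String[] c : cases) {
            String expected = c[0].replace(c[1], c[2]);
            String actual = replace2(c[0], c[1], c[2]);
            if (!expected.equals(actual)) {
                throw new AssertionError(c[0] + ": expected " + expected + " but got " + actual);
            }
        }
        System.out.println("all cases match String.replace");
    }
}
```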
There is one interesting aspect when comparing the two solutions, at least on my machine. The built-in version scales much better when it comes to larger strings. Given a slightly modified version of your test:
for (int i = 0; i < 10; i++) {
s1 = s1 + s1;
long t0 = call1(s1); // your implementation
long t1 = call2(s1); // 1.7_07 Oracle
long delta = t0 - t1;
System.out.println(
String.format("Iteration %s, string length %s, call1 %s, call2 %s, delta %s", i, s1.length(), t0, t1, delta));
try {
Thread.sleep(200);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Just by doubling the string length on each iteration, the break-even point is already reached around iteration 3 or 4:
Iteration 0, string length 22, call1 450, call2 1715, delta -1265
Iteration 1, string length 44, call1 1048, call2 2152, delta -1104
Iteration 2, string length 88, call1 2695, call2 4024, delta -1329
Iteration 3, string length 176, call1 7737, call2 7574, delta 163
Iteration 4, string length 352, call1 24662, call2 15560, delta 9102
For reference the two implementations of call1 and call2:
static long call1(String s) {
long t0 = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
String s2 = replace(s, "11", "22");
}
return System.currentTimeMillis() - t0;
}
static long call2(String s) {
long t0 = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
String s2 = s.replace("11", "xxx");
}
return System.currentTimeMillis() - t0;
}
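One plausible explanation, which is my assumption and was not measured here, is that each StringBuilder.replace call in the in-place version has to copy the tail of the buffer, so the work grows with both the string length and the number of matches. Below is a self-contained sketch for reproducing the comparison. The class name ScalingSketch and the iteration count are arbitrary choices. Unlike call1 and call2, it uses the same replacement ("22") for both variants so the two do identical work. There is no warm-up phase, so treat the numbers as rough.

```java
final class ScalingSketch {
    // In-place version from the question: replaces inside one StringBuilder.
    static String replaceInPlace(String s, String target, String replacement) {
        StringBuilder sb = new StringBuilder(s);
        for (int i = 0; (i = sb.indexOf(target, i)) != -1; i += replacement.length()) {
            sb.replace(i, i + target.length(), replacement);
        }
        return sb.toString();
    }

    // Returns elapsed milliseconds for `iterations` calls.
    // Result lengths are summed and checked so the calls cannot be optimized away.
    static long time(String s, boolean builtIn, int iterations) {
        long total = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            String r = builtIn ? s.replace("11", "22") : replaceInPlace(s, "11", "22");
            total += r.length();
        }
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        // Target and replacement have equal length, so the output length equals the input length.
        if (total != (long) iterations * s.length()) throw new AssertionError();
        return elapsedMs;
    }

    public static void main(String[] args) {
        String s = "11112233211";
        for (int round = 0; round < 6; round++) {
            s = s + s; // double the input, as in the test above
            long custom = time(s, false, 100_000);
            long jdk = time(s, true, 100_000);
            System.out.printf("length %d: in-place %d ms, String.replace %d ms%n",
                    s.length(), custom, jdk);
        }
    }
}
```

No expected output is shown because the timings depend on the JVM and the machine.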