
Is String.replace implementation really efficient?

Tags:

java

I used to think that String.replace is faster than String.replaceAll because the latter uses a Pattern regex and the former does not. In fact, there is no significant difference in either performance or implementation. This is the JDK implementation of String.replace:

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
        this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
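To make the behaviour of that one-liner concrete, here is a small sketch of what Pattern.LITERAL and Matcher.quoteReplacement buy: the target is matched character for character and the replacement is inserted verbatim. The class and method names (LiteralVsRegex, literalReplace) are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK or the original post.
class LiteralVsRegex {
    // Same shape as the JDK 7 String.replace body quoted above.
    static String literalReplace(String s, String target, String replacement) {
        return Pattern.compile(target, Pattern.LITERAL)
                .matcher(s)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String s = "a.b";
        System.out.println(s.replace(".", "x"));          // axb (dot is literal)
        System.out.println(s.replaceAll(".", "x"));       // xxx (dot is a regex wildcard)
        System.out.println(literalReplace(s, ".", "$1")); // a$1b ($1 is not a group reference)
    }
}
```

So the regex machinery is only being used in its "no special characters" mode here, which is why a plain indexOf loop can do the same job.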

What's the need to use Pattern here? I wrote a non-regex replace version:

static String replace(String s, String target, String replacement) {
    StringBuilder sb = new StringBuilder(s);
    for (int i = 0; (i = sb.indexOf(target, i)) != -1; i += replacement.length()) {
        sb.replace(i, i + target.length(), replacement);
    }
    return sb.toString();
}

and compared performance:

public static void main(String[] args) throws Exception {
    String s1 = "11112233211";
    for (;;) {
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            // String s2 = s1.replace("11", "xxx");
            String s2 = replace(s1, "11", "22");
        }
        System.out.println(System.currentTimeMillis() - t0);
    }
}

Benchmarks (time per 1,000,000 calls): my version, 400 ms; JDK version, 1700 ms.

Is my test wrong or is String.replace really inefficient?
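On the "is my test wrong" part: the loop above never uses s2 and times with System.currentTimeMillis(), so in principle the JIT compiler may discard part of the work, and the timer is coarse. A slightly more defensive harness might look like the sketch below. The names MicroBench, averageNanos and sink, as well as the warm-up count, are assumptions made for this illustration, and it relies on Java 8 lambdas; it is not a substitute for a proper benchmarking tool such as JMH.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the original post.
class MicroBench {
    // Results are folded into this field so the work stays observable.
    static int sink;

    // Runs the task warmup times untimed, then returns the average
    // time per call in nanoseconds over the timed iterations.
    static long averageNanos(Supplier<String> task, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            sink += task.get().length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += task.get().length();
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        String s1 = "11112233211";
        long jdk = averageNanos(() -> s1.replace("11", "xxx"), 100_000, 1_000_000);
        System.out.println("String.replace: " + jdk + " ns avg");
    }
}
```

The same call can be repeated with any other replace implementation passed in as the Supplier, so that both candidates are warmed up and measured in the same way.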

Evgeniy Dorofeev asked Jan 25 '13 08:01


2 Answers

To give you some idea of how inefficient String.replace is, here is the source from Java 7 update 11:

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
        this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

AFAIK, the use of Pattern and Matcher.quoteReplacement etc. is an attempt to be clear rather than efficient. I suspect it dates back to when many internal libraries were written without performance considerations.

IMHO Java 7 has seen many internal libraries improve performance, in particular by reducing needless object creation. This method is an obvious candidate for improvement.


You can improve the performance by doing the copy once, instead of trying to insert into an existing StringBuilder.

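// Builds the result in a single pass. Assumes target is non-empty
// (an empty target would make this loop run forever).
static String replace2(String s, String target, String replacement) {
    StringBuilder sb = null; // allocated lazily, only if a match is found
    int start = 0;           // index just past the previous match
    for (int i; (i = s.indexOf(target, start)) != -1; ) {
        if (sb == null) sb = new StringBuilder();
        sb.append(s, start, i);      // copy the unmatched text before this match
        sb.append(replacement);
        start = i + target.length();
    }
    if (sb == null) return s;        // no match: return the original string
    sb.append(s, start, s.length()); // copy the tail after the last match
    return sb.toString();
}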

public static void main(String... ignored) {
    String s1 = "11112233211";
    for (; ; ) {
        timeReplace(s1);
        timeReplace2(s1);
        timeStringReplaceRefactored(s1);
        timeStringReplace(s1);
    }
}

private static void timeStringReplace(String s1) {
    long start0 = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        String s2 = s1.replace("11", "xxx");
        if (s2.length() <= s1.length()) throw new AssertionError();
    }
    System.out.printf("String.replace %,d ns avg%n", System.currentTimeMillis() - start0);
}

private static void timeStringReplaceRefactored(String s1) {
    long start0 = System.currentTimeMillis();
    Pattern compile = Pattern.compile("11", Pattern.LITERAL);
    String xxx = Matcher.quoteReplacement("xxx");
    for (int i = 0; i < 1000000; i++) {
        String s2 = compile.matcher(s1).replaceAll(xxx);
        if (s2.length() <= s1.length()) throw new AssertionError();
    }
    System.out.printf("String.replace %,d ns avg (Refactored)%n", System.currentTimeMillis() - start0);
}

private static void timeReplace(String s1) {
    long start0 = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        String s2 = replace(s1, "11", "xxx");
        if (s2.length() <= s1.length()) throw new AssertionError();
    }
    System.out.printf("Replace %,d ns avg%n", System.currentTimeMillis() - start0);
}

private static void timeReplace2(String s1) {
    long start0 = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        String s2 = replace2(s1, "11", "xxx");
        if (s2.length() <= s1.length()) throw new AssertionError();
    }
    System.out.printf("My replace %,d ns avg%n", System.currentTimeMillis() - start0);
}

static String replace(String s, String target, String replacement) {
    StringBuilder sb = new StringBuilder(s);
    for (int i = 0; (i = sb.indexOf(target, i)) != -1; i += replacement.length()) {
        sb.replace(i, i + target.length(), replacement);
    }
    return sb.toString();
}

prints

Replace 177 ns avg
My replace 108 ns avg
String.replace 436 ns avg (Refactored)
String.replace 598 ns avg

Caching the Pattern and the replacement text helps a little, but not as much as having a custom routine to do the replace.
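If you want to keep the JDK regex machinery but avoid recompiling on every call, the caching idea from timeStringReplaceRefactored can be wrapped in a small reusable object. This is only a sketch; the class name LiteralReplacer and its apply method are made up for illustration and do not exist in the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not a JDK type.
final class LiteralReplacer {
    private final Pattern pattern;          // compiled once, reused for every call
    private final String quotedReplacement; // quoted once, so '$' and '\' stay literal

    LiteralReplacer(String target, String replacement) {
        this.pattern = Pattern.compile(target, Pattern.LITERAL);
        this.quotedReplacement = Matcher.quoteReplacement(replacement);
    }

    // A new Matcher is created per call; only the Pattern is shared.
    String apply(CharSequence input) {
        return pattern.matcher(input).replaceAll(quotedReplacement);
    }

    public static void main(String[] args) {
        LiteralReplacer r = new LiteralReplacer("11", "xxx");
        System.out.println(r.apply("11112233211")); // xxxxxx22332xxx
    }
}
```

This only removes the per-call compile and quote steps; the matching and replacement still go through Matcher, so the timings above suggest it will stay slower than the hand-written loop for short strings.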

Peter Lawrey answered Nov 03 '22 01:11


There is one interesting aspect when comparing the two solutions, at least on my machine. The built-in version scales much better when it comes to larger strings. Given a slightly modified version of your test:

for (int i = 0; i < 10; i++) {
    s1 = s1 + s1;
    long t0 = call1(s1); // your implementation
    long t1 = call2(s1); // 1.7_07 Oracle
    long delta = t0 - t1;

    System.out.println(
      String.format("Iteration %s, string length %s, call1 %s, call2 %s, delta %s", i, s1.length(), t0, t1, delta));

    try {
        Thread.sleep(200);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

By just doubling the string length with each call, the break-even point is already reached at iteration 3 or 4 (times in ms):

Iteration 0, string length 22, call1 450, call2 1715, delta -1265
Iteration 1, string length 44, call1 1048, call2 2152, delta -1104
Iteration 2, string length 88, call1 2695, call2 4024, delta -1329
Iteration 3, string length 176, call1 7737, call2 7574, delta 163
Iteration 4, string length 352, call1 24662, call2 15560, delta 9102
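One plausible reason for this, stated here as an assumption rather than something measured: each in-place sb.replace has to move the characters that follow the match, so the total work grows faster than the string length. The sketch below is a cost model, not a benchmark; the class CopyCost and the method inPlaceTailChars are hypothetical names, and the model simply counts how many characters sit behind each match at the moment it is replaced.

```java
// Hypothetical cost model, not part of the original test.
class CopyCost {
    // Sums the number of characters located after each match, i.e. what an
    // in-place StringBuilder.replace is assumed to shift on every replacement.
    // Assumes target is non-empty.
    static long inPlaceTailChars(String s, String target, String replacement) {
        StringBuilder sb = new StringBuilder(s);
        long moved = 0;
        for (int i = 0; (i = sb.indexOf(target, i)) != -1; i += replacement.length()) {
            moved += sb.length() - (i + target.length());
            sb.replace(i, i + target.length(), replacement);
        }
        return moved;
    }

    public static void main(String[] args) {
        String s = "11112233211";
        for (int n = 0; n < 5; n++) {
            s = s + s; // double the input, as in the loop above
            System.out.println(s.length() + " chars: " + inPlaceTailChars(s, "11", "22"));
        }
    }
}
```

For the original 11-character input the model gives 16, and for the doubled 22-character input it gives 65, so doubling the input roughly quadruples the modelled cost. A version that appends to a fresh StringBuilder copies each character only once.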

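For reference, the two implementations of call1 and call2 (note that call1 replaces with "22" while call2 replaces with "xxx", so the two do not do exactly the same work):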

static long call1(String s) {
    long t0 = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        String s2 = replace(s, "11", "22");
    }
    return System.currentTimeMillis() - t0;
}

static long call2(String s) {
    long t0 = System.currentTimeMillis();
    for (int i = 0; i < 1000000; i++) {
        String s2 = s.replace("11", "xxx");
    }
    return System.currentTimeMillis() - t0;
}

home answered Nov 03 '22 00:11