
Why is String's replaceAll() method slow on the first call and faster on subsequent calls?

I've learned that String's replaceAll() method takes a regex as its input parameter, which can have a considerable performance impact. But I once read a blog post with a small program which asserts that (according to my understanding):

The replaceAll() method is slow the first time it is called but faster on subsequent calls.

This is the test result:

regex replace time taken: 14.09 milliseconds
manual replace time taken: 2.371 seconds
-----
regex replace time taken: 9.498 milliseconds
manual replace time taken: 2.406 seconds
-----
regex replace time taken: 2.184 milliseconds
manual replace time taken: 2.360 seconds
-----

What is the optimization mechanism behind this result?

logbasex asked Jan 01 '23 06:01



2 Answers

Usually it doesn't cause a meaningful performance impact unless used in weird and unusual ways. In a normal use case (let's say a web request) it will disappear under things like network latency that take far more time. Only if you were to use replaceAll in a very hot loop would it become necessary to consider using the Pattern and Matcher classes directly, which could help with performance.
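For the hot-loop case, here is a minimal sketch of using Pattern and Matcher directly. The class name, the method name and the whitespace regex are made up for illustration; only Pattern.compile and Matcher.replaceAll come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PrecompiledReplace {
    // Compiled once and reused. Pattern instances are immutable and safe to share.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: collapses runs of whitespace in every input string.
    static List<String> normalizeAll(List<String> inputs) {
        List<String> out = new ArrayList<>(inputs.size());
        for (String s : inputs) {
            // Same result as s.replaceAll("\\s+", " "), without compiling the regex on each call.
            out.add(WHITESPACE.matcher(s).replaceAll(" "));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalizeAll(List.of("a  b", "c\t\td"))); // prints [a b, c d]
    }
}
```

Whether this is measurably faster than plain replaceAll depends on the regex and the input, so it is worth measuring before changing code that is not actually hot.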

The linked tutorial site seems questionable (and there are a lot of them, which is why you should be careful what you read). For one, it compares replaceAll with a manual replace method that is poorly written (that's why you get the seconds vs. milliseconds difference). Then it draws conclusions based on that.

So there's no optimization mechanism behind the result in the link. The reason behind the result is the badly written manual replacement method which concatenates a lot of Strings, making it slow compared to replaceAll.
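The blog's manual method is not reproduced here, so the following is only a guess at the kind of code that behaves this way. replaceSlow and replaceFast are hypothetical names: the first builds the result with repeated String concatenation, the second does the same job with a StringBuilder.

```java
class ManualReplace {
    // Hypothetical slow version: every += copies the whole accumulated string,
    // so the total work grows quadratically with the input length.
    static String replaceSlow(String input, char from, String to) {
        String result = "";
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            result += (c == from) ? to : String.valueOf(c);
        }
        return result;
    }

    // Same logic with a StringBuilder, which appends in amortized constant time.
    static String replaceFast(String input, char from, String to) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == from) {
                sb.append(to);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceSlow("a,b,c", ',', "--"));
        System.out.println(replaceFast("a,b,c", ',', "--"));
    }
}
```

Both methods return the same string. Only the way the result is accumulated differs, and that difference alone can account for a seconds-versus-milliseconds gap on large inputs.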

Kayaman answered Jan 14 '23 13:01



The following is taken from the official [1] OpenJDK 11 source code [2].

Start with the String.replaceAll method itself:

public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}

No caching here. Next, the Pattern.compile method:

public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}

No caching there either. And not in the private Pattern constructor either.

The Pattern constructor uses an internal compile() method to do the work of compiling the regex to the internal form. It takes steps to avoid a Pattern being compiled twice. But as you can see from the above, each replaceAll call generates a single-use Pattern object.
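One quick way to observe the lack of caching is to compile the same regex twice and compare the references. This is a small sketch with a made-up class and method name; it relies on the OpenJDK implementation quoted above, not on anything the Pattern documentation promises.

```java
import java.util.regex.Pattern;

class PatternIdentity {
    // Returns true only if two compile() calls hand back the very same object,
    // which would indicate some form of caching.
    static boolean compileReturnsSameInstance(String regex) {
        return Pattern.compile(regex) == Pattern.compile(regex);
    }

    public static void main(String[] args) {
        System.out.println(compileReturnsSameInstance("a+b"));
    }
}
```

With the source shown above, each call runs new Pattern(regex, 0), so the two references differ.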

So why are you seeing a speedup in those performance figures?

  • They could be using an old version (before Java 6) of Pattern that might have [3] cached compiled patterns.

  • The most likely explanation is that this is just a JVM warmup effect. A well-written benchmark should account for that, but the benchmark used in that blog does not do proper warmup.

In short, the speedup that you think is caused by some "optimization" is apparently just the result of JVM warmup effects such as JIT compilation of the Pattern, Matcher and related classes.
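To see warmup for yourself, a rough sketch like the one below times the same replaceAll workload over several rounds in one JVM. The class, method and field names are invented for illustration, and the numbers it prints will vary by machine and JVM. For real measurements a harness such as JMH is the better tool.

```java
class WarmupDemo {
    // Written after each run so the JIT cannot discard the work as dead code.
    static volatile int sink;

    // Times `calls` invocations of replaceAll on the same input, in nanoseconds.
    static long timeReplaceAllNanos(String input, int calls) {
        int total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            total += input.replaceAll("a+", "b").length();
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed;
    }

    public static void main(String[] args) {
        String input = "aaa bbb aaa ccc ".repeat(1_000); // String.repeat needs Java 11+
        for (int round = 1; round <= 5; round++) {
            long nanos = timeReplaceAllNanos(input, 100);
            System.out.printf("round %d: %.3f ms%n", round, nanos / 1e6);
        }
    }
}
```

The regex is compiled again on every call in every round, so any drop in time between rounds cannot come from a cached Pattern. Early rounds typically include class loading and interpreted execution, and later rounds run JIT-compiled code.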


1 - The OpenJDK source code for Java 6 onwards can be downloaded from https://openjdk.java.net/

2 - The OpenJDK 6 source code is doing the same thing: no caching.

3 - I have not checked, but it is moot. Performance benchmarks based on EOL versions of Java are not instructive for current versions of Java. Nobody should still be using Java 5. If they are, performance of replaceAll is the least of their worries.

Stephen C answered Jan 14 '23 13:01
