I've learned that String's replaceAll()
method takes a regex as an input parameter and it can cause considerable performance impact. But once time, I read this blog with a small program assert that (according my understanding):
Process the
replaceAll()
method slow for the first time but faster for the next time.
This is test result:
regex replace time taken: 14.09 milliseconds
manual replace time taken: 2.371 seconds
-----
regex replace time taken: 9.498 milliseconds
manual replace time taken: 2.406 seconds
-----
regex replace time taken: 2.184 milliseconds
manual replace time taken: 2.360 seconds
-----
What is the optimization mechanism behind this result?
Usually it doesn't cause a meaningful performance impact unless used in weird and unusual ways. In a normal use case (let's say web request) it will disappear under things like network latency and other things that take way more time. Only if you were to use replaceAll
in a very hot loop, would it become necessary to consider using the Pattern
and Matcher
classes directly, which could help with performance.
The linked tutorial site seems questionable (and there are a lot of them, which is why you should be careful what you read). For one, it compares replaceAll
with a manual replace method that is poorly written (that's why you get the seconds vs. milliseconds difference). Then it draws conclusions based on that.
So there's no optimization mechanism behind the result in the link. The reason behind the result is the badly written manual replacement method which concatenates a lot of Strings, making it slow compared to replaceAll
.
The following is taken from the official1 OpenJDK 11 source code2.
Starting with the String.replaceAll
method itself.
public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
No caching here. Next the Pattern.compile
public static Pattern compile(String regex) {
return new Pattern(regex, 0);
}
No caching there either. And not in the private Pattern
constructor either.
The Pattern
constructor uses an internal compile()
method to do the working of compiling the regex to the internal form. It takes steps to avoid a Pattern
being compiled twice. But as you can see from the above, each replaceAll
call is generating a single use Pattern
object.
So why are you seeing a speedup in those performance figures?
They could be using an old version (before Java 6) of Pattern
that might have3 cached compiled patterns.
The most likely explanation is that this just a JVM warmup effect. A well written benchmark should account for that, but the benchmark that is used in that blog is not doing proper warmup.
In short, the speedup that you think is caused by some "optimization" is apparently just the result of JVM warmup effects such as JIT compilation of the Pattern
, Matcher
and related classes.
1 - The OpenJDK source code for Java 6 onwards is can be downloaded from https://openjdk.java.net/
2 - The OpenJDK 6 source code is doing the same thing: no caching.
3 - I have not checked, but it is moot. Performance benchmarks based on EOL versions of Java are not instructive for current versions of Java. Nobody should still be using Java 5. If they are, performance of replaceAll
is the least of their worries.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With