
String.replaceAll is considerably slower than doing the job yourself

I have an old piece of code that performs find and replace of tokens within a string.

It receives a map of from and to pairs, iterates over them and for each of those pairs, iterates over the target string, looks for the from using indexOf(), and replaces it with the value of to. It does all the work on a StringBuffer and eventually returns a String.
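The original code is not shown, so the following is only a rough reconstruction of the approach described above, assuming a `Map<String, String>` of from/to pairs. The class and method names (`ManualReplace`, `replaceTokens`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ManualReplace {
    // Hypothetical reconstruction: for each (from, to) pair, scan the buffer
    // with indexOf() and splice the replacement in place.
    static String replaceTokens(String input, Map<String, String> tokens) {
        StringBuffer buf = new StringBuffer(input);
        for (Map.Entry<String, String> e : tokens.entrySet()) {
            String from = e.getKey();
            String to = e.getValue();
            if (from.isEmpty()) {
                continue; // an empty token would match at every position
            }
            int idx = buf.indexOf(from);
            while (idx >= 0) {
                buf.replace(idx, idx + from.length(), to);
                // Resume after the inserted text so it is not scanned again.
                idx = buf.indexOf(from, idx + to.length());
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new LinkedHashMap<>();
        tokens.put(",", "");
        tokens.put(".", "");
        tokens.put(" ", "");
        System.out.println(replaceTokens("a, b. c", tokens)); // prints "abc"
    }
}
```

No regular expression is compiled or executed here; each pass is a plain substring search, which is presumably why the hand-written version sets such a low baseline.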

I replaced that code with this line: replaceAll("[,. ]*", "");
And I ran some comparative performance tests.
When comparing for 1,000,000 iterations, I got this:

Old Code: 1287ms
New Code: 4605ms

About 3.5 times slower!

I then tried replacing it with 3 calls to replace:
replace(",", "");
replace(".", "");
replace(" ", "");

This gave the following results:

Old Code: 1295ms
New Code: 3524ms

About 2.7 times slower!

Any idea why replace and replaceAll are so inefficient? Can I do something to make it faster?


Edit: Thanks for all the answers - the main problem was indeed that [,. ]* did not do what I wanted it to do. Changing it to [,. ]+ brought the performance almost level with the non-regex solution. Using a pre-compiled regex helped, but only marginally. (It is a solution that fits my problem well.)

Test code:
Replace string with Regex: [,. ]*
Replace string with Regex: [,. ]+
Replace string with Regex: [,. ]+ and Pre-Compiled Pattern

asked Jun 07 '11 08:06 by RonK



1 Answer

While using regular expressions carries some performance cost, it should not be this bad.

Note that using String.replaceAll() will compile the regular expression each time you call it.

You can avoid that by explicitly using a Pattern object:

Pattern p = Pattern.compile("[,. ]+");

// repeat only the following part:
String output = p.matcher(input).replaceAll("");
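As a self-contained sketch of the same idea, the pattern can be hoisted into a constant so it is compiled once and reused on every call. The class name `SeparatorStripper` and the method `strip` are made up for illustration.

```java
import java.util.regex.Pattern;

public class SeparatorStripper {
    // Compiled once when the class is loaded, then shared by all calls.
    private static final Pattern SEPARATORS = Pattern.compile("[,. ]+");

    // Removes every run of commas, periods and spaces from the input.
    static String strip(String input) {
        return SEPARATORS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("a, b. c")); // prints "abc"
    }
}
```

`Pattern` instances are immutable and safe to share between threads; only the `Matcher` created per call carries state.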

Note also that using + instead of * avoids replacing empty strings and therefore might also speed up the process.
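To see the extra work that * causes, one can count how often each pattern matches. The sketch below uses a hypothetical helper, `countMatches`, which is not part of the question's test code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Counts every match the pattern finds, including zero-length ones.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a, b. c";
        // With *, an empty match is found before each letter and at the end,
        // in addition to the two real separator runs.
        System.out.println(countMatches("[,. ]*", input)); // prints 6
        // With +, only the two separator runs match.
        System.out.println(countMatches("[,. ]+", input)); // prints 2
    }
}
```

Each of those empty matches is "replaced" with an empty string, so the * version does useless work at almost every character of the input.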

answered Oct 05 '22 20:10 by Joachim Sauer