
Java's String.replace() vs. String.replaceFirst() vs. homebrew

I have a class that does a lot of text processing. For each string, which is anywhere from 100 to 2000 characters long, I perform 30 different string replacements.

Example:

String modified;
for (int i = 0; i < num_strings; i++) {
    modified = runReplacements(strs[i]);
    // do stuff
}

public String runReplacements(String str) {
    str = str.replace("foo", "bar");
    str = str.replace("baz", "beef");
    // ... remaining replacements
    return str;
}

The targets ('foo', 'baz', and all the others) are each expected to appear only once, and they are string literals, so no actual regex is needed.

As you can imagine, I am concerned about performance :)

Given this,

  • replaceFirst() seems a bad choice because it won't use Pattern.LITERAL and will do extra processing that isn't required.

  • replace() seems a bad choice because it will traverse the entire string looking for multiple instances to be replaced.
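To make the two concerns above concrete, here is a small sketch showing how the built-in methods treat a target that contains a regex metacharacter, plus a hand-rolled ("homebrew") literal replaceFirst built on String.indexOf, which stops searching at the first hit and involves no regex at all. The class and method names (LiteralReplaceFirst, replaceFirstLiteral) are hypothetical, not from the question.

```java
import java.util.regex.Pattern;

public class LiteralReplaceFirst {

    /**
     * Hypothetical helper: replaces only the first occurrence of a literal
     * target. String.indexOf stops scanning as soon as it finds a match.
     */
    static String replaceFirstLiteral(String str, String target, String replacement) {
        int idx = str.indexOf(target);
        if (idx < 0) {
            return str; // target absent: return the input unchanged
        }
        return new StringBuilder(str.length() - target.length() + replacement.length())
                .append(str, 0, idx)                                  // text before the target
                .append(replacement)                                  // the substitution
                .append(str, idx + target.length(), str.length())     // text after the target
                .toString();
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        // replace() treats its arguments literally but replaces every occurrence
        System.out.println(s.replace(".", "-"));                      // a-b-c
        // replaceFirst() treats its first argument as a regex: "." matches any char
        System.out.println(s.replaceFirst(".", "-"));                 // -.b.c
        // Quoting the pattern restores literal semantics
        System.out.println(s.replaceFirst(Pattern.quote("."), "-"));  // a-b.c
        // Hand-rolled version: literal, first occurrence only
        System.out.println(replaceFirstLiteral(s, ".", "-"));         // a-b.c
    }
}
```

The hand-rolled version still copies the rest of the string into the result, but it never compiles a pattern.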

Additionally, since my targets and replacement texts are the same every time, it seems to make sense to write my own code; otherwise String.replaceFirst() or String.replace() will be doing a Pattern.compile() every single time in the background. With that in mind, here is my idea:

  • Call Pattern.compile() (with Pattern.LITERAL) only once for each desired literal replacement, giving p1 through p30, so nothing is recompiled on every call.

  • Then, for each pX, do the following: p1.matcher(str).replaceFirst(Matcher.quoteReplacement("desiredReplacement"));

This way I abandon ship after the first match (instead of traversing the entire string), I match literals rather than regexes, and I avoid recompiling on every iteration.
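The plan above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the class name PrecompiledReplacer and its add method are hypothetical. Each target is compiled once with Pattern.LITERAL, each replacement is pre-quoted with Matcher.quoteReplacement, and the rules are applied in order, just like the chained replace() calls in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecompiledReplacer {

    // Compiled once and reused; LinkedHashMap keeps the rules in declaration order
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    PrecompiledReplacer add(String target, String replacement) {
        // Pattern.LITERAL: metacharacters in the target have no special meaning
        // Matcher.quoteReplacement: '$' and '\' in the replacement are taken literally
        rules.put(Pattern.compile(target, Pattern.LITERAL), Matcher.quoteReplacement(replacement));
        return this;
    }

    String runReplacements(String str) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            // replaceFirst stops searching after the first match
            str = rule.getKey().matcher(str).replaceFirst(rule.getValue());
        }
        return str;
    }

    public static void main(String[] args) {
        PrecompiledReplacer r = new PrecompiledReplacer()
                .add("foo", "bar")
                .add("baz", "beef");
        System.out.println(r.runReplacements("foo and baz")); // bar and beef
    }
}
```

Because the rules run one after another, a later target can match text produced by an earlier replacement, exactly as with the original chain of replace() calls.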

So, which is the best for performance?

Asked by jonathon ree on Oct 01 '10 at 21:10




2 Answers

So, which is the best for performance?

Measure it! ;-)

ETA: Since a two-word answer sounds irretrievably snarky, I'll elaborate slightly. I mean "measure it and tell us": there may be some general rule of thumb about the performance of the various approaches you cite (all good ones), but I'm not aware of it. And as a couple of the comments on this answer have mentioned, even if there were, the differences between the approaches are likely to be swamped by the application environment. So measure it in vivo, and focus on this only if it's a real issue. (And let us know how it goes...)

Answered by andersoj on Oct 13 '22 at 00:10


First, run and profile your entire application with a simple match/replace. This may show you that:

  • your application already runs fast enough, or
  • your application is spending most of its time doing something else, so optimizing the match/replace code is not worthwhile.

Assuming that you've determined that match/replace is a bottleneck, write yourself a little benchmarking application that allows you to test the performance and correctness of your candidate algorithms on representative input data. It's also a good idea to include "edge case" input data that is likely to cause problems; e.g. for the substitutions in your example, input data containing the sequence "bazoo" could be an edge case. On the performance side, make sure that you avoid the traps of Java micro-benchmarking; e.g. JVM warmup effects.
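A rough sketch of such a harness might look like the following. The class name, the two candidates, and the sample inputs are all hypothetical placeholders; substitute your own candidates and representative data. It first checks that all candidates produce the same output (correctness), then runs a warm-up pass before timing so JIT compilation is less likely to skew the numbers. For serious measurements, a dedicated harness such as JMH (the OpenJDK Java Microbenchmark Harness) handles warm-up and dead-code elimination far more rigorously than this hand-rolled loop.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class ReplaceBenchmark {

    // Hypothetical candidates: each maps one input string to its replaced form
    static final Map<String, UnaryOperator<String>> CANDIDATES = new LinkedHashMap<>();
    static {
        CANDIDATES.put("String.replace", s -> s.replace("foo", "bar").replace("baz", "beef"));
        CANDIDATES.put("String.replaceFirst", s -> s.replaceFirst("foo", "bar").replaceFirst("baz", "beef"));
    }

    // Consuming results keeps the JIT from discarding the work being timed
    static volatile long blackhole;

    /** Returns true if every candidate gives the same output as the first one, on every input. */
    static boolean allAgree(List<String> inputs) {
        for (String in : inputs) {
            String expected = null;
            for (UnaryOperator<String> f : CANDIDATES.values()) {
                String out = f.apply(in);
                if (expected == null) {
                    expected = out;
                } else if (!expected.equals(out)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Applies f to every input `rounds` times and returns the elapsed nanoseconds. */
    static long time(UnaryOperator<String> f, List<String> inputs, int rounds) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String in : inputs) {
                sink += f.apply(in).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return elapsed;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, including the "bazoo" edge case mentioned above
        StringBuilder filler = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            filler.append("lorem ipsum ");
        }
        List<String> inputs = List.of("foo " + filler + " baz", filler + "bazoo", "no targets here");

        if (!allAgree(inputs)) {
            throw new IllegalStateException("candidates disagree on some input");
        }
        for (Map.Entry<String, UnaryOperator<String>> c : CANDIDATES.entrySet()) {
            time(c.getValue(), inputs, 5_000);              // warm-up, result ignored
            long ns = time(c.getValue(), inputs, 20_000);   // measured run
            System.out.printf("%-20s %8.1f ms%n", c.getKey(), ns / 1e6);
        }
    }
}
```

Note that the correctness check matters: on an input such as "foo foo", these two candidates disagree, because replace() substitutes every occurrence while replaceFirst() substitutes only the first.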

Next implement some simple alternatives and try them out. Is one of them good enough? Done!

In addition to your ideas, you could try concatenating the search terms into a single regex (e.g. "(foo|baz)"), use Matcher.find(int) to find each occurrence, use a HashMap to look up the replacement strings, and use a StringBuilder to build the output String from input string substrings and replacements. (OK, this is not entirely trivial, and it depends on Pattern/Matcher handling alternations efficiently ... which I'm not sure is the case. But that's why you should compare the candidates carefully.)
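The single-regex idea above might be sketched like this. The class name AlternationReplacer is hypothetical, and whether this beats the simpler approaches is exactly what a benchmark has to decide. Each target is wrapped in Pattern.quote so the alternation matches literals, Matcher.find(int) resumes the search after each match, and a HashMap maps the matched text back to its replacement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AlternationReplacer {

    private final Pattern pattern;
    private final Map<String, String> replacements;

    AlternationReplacer(Map<String, String> replacements) {
        // An empty target (or no targets) would match the empty string and loop forever
        if (replacements.isEmpty() || replacements.containsKey("")) {
            throw new IllegalArgumentException("need at least one non-empty target");
        }
        this.replacements = new HashMap<>(replacements);
        // Builds e.g. \Qfoo\E|\Qbaz\E so each target matches literally
        this.pattern = Pattern.compile(replacements.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|")));
    }

    /** One left-to-right pass; replaced text is never rescanned. */
    String replaceAll(String input) {
        Matcher m = pattern.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        // find(int) resets the matcher and searches from the given index
        while (m.find(pos)) {
            out.append(input, pos, m.start())           // text before the match
               .append(replacements.get(m.group()));    // replacement looked up by matched text
            pos = m.end();
        }
        out.append(input, pos, input.length());         // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rules = new HashMap<>();
        rules.put("foo", "bar");
        rules.put("baz", "beef");
        AlternationReplacer r = new AlternationReplacer(rules);
        System.out.println(r.replaceAll("foo, baz and bazoo")); // bar, beef and beefoo
    }
}
```

One caveat with this design: regex alternation takes the first alternative that matches, not the longest, and HashMap iteration order is unspecified, so if one target is a prefix of another (say "foo" and "foobar"), which one wins is not guaranteed.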

In the (IMO unlikely) event that a simple alternative doesn't cut it, this Wikipedia page has some leads which may help you implement your own efficient match/replacer.
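As one illustration of a hand-rolled matcher (not taken from any particular page; the class TrieReplacer is hypothetical), the targets can be stored in a character trie and the input scanned once from left to right, taking the longest target that starts at each position. This costs at most O(n × longest target) character comparisons. Algorithms such as Aho–Corasick avoid even that rescanning, but this sketch does not implement Aho–Corasick.

```java
import java.util.HashMap;
import java.util.Map;

public class TrieReplacer {

    /** One trie node; `replacement` is non-null when a target ends here. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        String replacement;
    }

    private final Node root = new Node();

    TrieReplacer add(String target, String replacement) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("empty target");
        }
        Node n = root;
        for (char c : target.toCharArray()) {
            n = n.next.computeIfAbsent(c, k -> new Node());
        }
        n.replacement = replacement;
        return this;
    }

    /** Single left-to-right pass; at each position, replaces the longest target starting there. */
    String replaceAll(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            Node n = root;
            int matchEnd = -1;
            String matchRepl = null;
            // Walk the trie as far as the input allows, remembering the last complete target
            for (int j = i; j < input.length(); j++) {
                n = n.next.get(input.charAt(j));
                if (n == null) {
                    break;
                }
                if (n.replacement != null) {
                    matchEnd = j + 1;
                    matchRepl = n.replacement;
                }
            }
            if (matchEnd >= 0) {
                out.append(matchRepl);      // emit the replacement and skip the target
                i = matchEnd;
            } else {
                out.append(input.charAt(i)); // no target starts here: copy one char
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TrieReplacer r = new TrieReplacer().add("foo", "bar").add("baz", "beef");
        System.out.println(r.replaceAll("foo, baz and bazoo")); // bar, beef and beefoo
    }
}
```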

Answered by Stephen C on Oct 12 '22 at 22:10