Is it worth it to check if a string contains a substring before replacing it?

Question

I saw this code today

if (translatedText.contains("â")) translatedText = translatedText.replace("â", "a");
if (translatedText.contains("ê")) translatedText = translatedText.replace("ê", "e");
...

There are 22 lines like that and I was wondering what is the point of using "ifs" before a replace. The way I understand it works, we are reading the string twice per line, while calling the replace method directly would have the same effect when there is nothing to replace and it would faster when there is something to replace.

But that is only how I guess it works. Can someone confirm or correct me?

And a second question. We are doing that replace for every vowel and for every symbol "á", "à", "â" and "ä". I bet there is a better way to do that in Java. Any suggestions?

Thanks.

T.J. Crowder · Accepted Answer

The documentation doesn't tell us what replace will do if there's no matching substring, but looking at the current implementation in Oracle's version (Java 8):

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

...it does look like you avoid some work and in particular memory allocations (the matcher) if you check first.

Which isn't to say that there isn't likely a better way to approach those 22 replacements, probably by using a single regular expression with a character class ([âê] and so on), compiling that regex once, and then using a single matcher in a loop, very roughly like this (inspired by this answer):

// You can do this part once somewhere if you want to
Pattern regex = Pattern.compile("[âê]");
// Then:
StringBuffer resultString = new StringBuffer();
Matcher regexMatcher = regex.matcher(translatedText);
while (regexMatcher.find()) {
    String match = regexMatch.group();
    String replacement;
    switch (match) {
        // ...various cases setting `replacement`
    }
    regexMatcher.appendReplacement(resultString, replacement);
}
regexMatcher.appendTail(resultString);
translatedText = resultString.toString();

or if you want to prematurely micro-optimize it (a failing of mine):

// You can do this part once somewhere if you want to
Pattern regex = Pattern.compile("[âê]");
// Then:
StringBuffer resultString = null;
Matcher regexMatcher = regex.matcher(translatedText);
while (regexMatcher.find()) {
    if (resultString == null) {
        resultString = new StringBuffer(translatedText.length() + 100);
    }
    String match = regexMatch.group();
    String replacement;
    switch (match) {
        // ...various cases setting `replacement`
    }
    regexMatcher.appendReplacement(resultString, replacement);
}
if (resultString != null) {
    regexMatcher.appendTail(resultString);
    translatedText = resultString.toString();
}

GhostCat · Answer

Regarding "performance": that might really depend on the JVM version; in other words: depending on the implementation of replace() changes, having that if in place can save you some regex-matcher cost; or not.

Regarding the second question: basically you have a lot of duplicated code there. One way to work on that:

You could define some static final field, like:

Map<String, String> replacements = new HashMap<>();

To then fill with:

replacements.put("â", "a");
...

To then replace the current code with a loop that iterates the entries of that map, using each key/value as argument to a replace() call.

Or, as shown by the other answer, you do something like

replacements.put("[áàâä]", "a");

to then use replaceAll() later on.

Ole V.V. · Answer

If what you desire is to get rid of the apparently redundant if statements without incurring a performance penalty, the quick solution is to switch to using replace(char, char):

translatedText = translatedText.replace('â', 'a');
translatedText = translatedText.replace('ê', 'e');

This avoids regex altogether, whether explicit or hidden, and in my Java 8 also avoids making a new String if there is no replacement.

Whether there is a still better way depends on several factors including taste. A couple of the other answers have promising ideas.

Is it worth it to check if a string contains a substring before replacing it?

Tags:

java

string

replace

Gonzalo Calvo

3 Answers

T.J. Crowder

GhostCat

Ole V.V.

Recent Activity

Donate For Us

Is it worth it to check if a string contains a substring before replacing it?

Tags:

java

string

replace

Gonzalo Calvo

3 Answers

T.J. Crowder

GhostCat

Ole V.V.

Related questions

Recent Activity

Donate For Us