Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it worth it to check if a string contains a substring before replacing it?

I saw this code today

if (translatedText.contains("â")) translatedText = translatedText.replace("â", "a");
if (translatedText.contains("ê")) translatedText = translatedText.replace("ê", "e");
...

There are 22 lines like that and I was wondering what is the point of using "ifs" before a replace. The way I understand it works, we are reading the string twice per line, while calling the replace method directly would have the same effect when there is nothing to replace and it would faster when there is something to replace.

But that is only how I guess it works. Can someone confirm or correct me?

And a second question. We are doing that replace for every vowel and for every symbol "á", "à", "â" and "ä". I bet there is a better way to do that in Java. Any suggestions?

Thanks.

like image 633
Gonzalo Calvo Avatar asked Dec 19 '16 11:12

Gonzalo Calvo


3 Answers

The documentation doesn't tell us what replace will do if there's no matching substring, but looking at the current implementation in Oracle's version (Java 8):

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

...it does look like you avoid some work and in particular memory allocations (the matcher) if you check first.

Which isn't to say that there isn't likely a better way to approach those 22 replacements, probably by using a single regular expression with a character class ([âê] and so on), compiling that regex once, and then using a single matcher in a loop, very roughly like this (inspired by this answer):

// You can do this part once somewhere if you want to
Pattern regex = Pattern.compile("[âê]");
// Then:
StringBuffer resultString = new StringBuffer();
Matcher regexMatcher = regex.matcher(translatedText);
while (regexMatcher.find()) {
    String match = regexMatch.group();
    String replacement;
    switch (match) {
        // ...various cases setting `replacement`
    }
    regexMatcher.appendReplacement(resultString, replacement);
}
regexMatcher.appendTail(resultString);
translatedText = resultString.toString();

or if you want to prematurely micro-optimize it (a failing of mine):

// You can do this part once somewhere if you want to
Pattern regex = Pattern.compile("[âê]");
// Then:
StringBuffer resultString = null;
Matcher regexMatcher = regex.matcher(translatedText);
while (regexMatcher.find()) {
    if (resultString == null) {
        resultString = new StringBuffer(translatedText.length() + 100);
    }
    String match = regexMatch.group();
    String replacement;
    switch (match) {
        // ...various cases setting `replacement`
    }
    regexMatcher.appendReplacement(resultString, replacement);
}
if (resultString != null) {
    regexMatcher.appendTail(resultString);
    translatedText = resultString.toString();
}
like image 98
T.J. Crowder Avatar answered Sep 27 '22 02:09

T.J. Crowder


Regarding "performance": that might really depend on the JVM version; in other words: depending on the implementation of replace() changes, having that if in place can save you some regex-matcher cost; or not.

Regarding the second question: basically you have a lot of duplicated code there. One way to work on that:

You could define some static final field, like:

Map<String, String> replacements = new HashMap<>();

To then fill with:

replacements.put("â", "a");
...

To then replace the current code with a loop that iterates the entries of that map, using each key/value as argument to a replace() call.

Or, as shown by the other answer, you do something like

replacements.put("[áàâä]", "a");

to then use replaceAll() later on.

like image 23
GhostCat Avatar answered Sep 24 '22 02:09

GhostCat


If what you desire is to get rid of the apparently redundant if statements without incurring a performance penalty, the quick solution is to switch to using replace(char, char):

translatedText = translatedText.replace('â', 'a');
translatedText = translatedText.replace('ê', 'e');

This avoids regex altogether, whether explicit or hidden, and in my Java 8 also avoids making a new String if there is no replacement.

Whether there is a still better way depends on several factors including taste. A couple of the other answers have promising ideas.

like image 21
Ole V.V. Avatar answered Sep 27 '22 02:09

Ole V.V.