
Replace multiple substrings at once

Say I have a file that contains some text. It includes substrings such as "substr1", "substr2", "substr3", and so on. I need to replace each of those substrings with other text, such as "repl1", "repl2", "repl3". In Python, I would create a dictionary like this:

{
 "substr1": "repl1",
 "substr2": "repl2",
 "substr3": "repl3"
}

and build a pattern by joining the keys with '|', then do the replacement with the re.sub function. Is there a similarly simple way to do this in Java?

Andrii Yurchuk asked Oct 05 '11 12:10



3 Answers

First, a demonstration of the problem:

String s = "I have three cats and two dogs.";
s = s.replace("cats", "dogs")
    .replace("dogs", "budgies");
System.out.println(s);

This is intended to replace cats => dogs and dogs => budgies, but each replacement in the chain operates on the result of the previous one, so the unfortunate output is:

I have three budgies and two budgies.

Here's my implementation of a simultaneous replacement method. It's easy to write using String.regionMatches:

public static String simultaneousReplace(String subject, String... pairs) {
    if (pairs.length % 2 != 0) throw new IllegalArgumentException(
        "Strings to find and replace are not paired.");
    StringBuilder sb = new StringBuilder();
    int numPairs = pairs.length / 2;
    outer:
    for (int i = 0; i < subject.length(); i++) {
        for (int j = 0; j < numPairs; j++) {
            String find = pairs[j * 2];
            // Skip empty search strings: matching one would never advance i.
            if (!find.isEmpty() && subject.regionMatches(i, find, 0, find.length())) {
                sb.append(pairs[j * 2 + 1]);
                i += find.length() - 1;
                continue outer;
            }
        }
        sb.append(subject.charAt(i));
    }
    return sb.toString();
}

Testing:

String s = "I have three cats and two dogs.";
s = simultaneousReplace(s,
    "cats", "dogs",
    "dogs", "budgies");
System.out.println(s);

Output:

I have three dogs and two budgies.

Additionally, when doing simultaneous replacement it is sometimes useful to prefer the longest match at each position. (PHP's strtr function does this, for example.) Here is my implementation of that:

public static String simultaneousReplaceLongest(String subject, String... pairs) {
    if (pairs.length % 2 != 0) throw new IllegalArgumentException(
        "Strings to find and replace are not paired.");
    StringBuilder sb = new StringBuilder();
    int numPairs = pairs.length / 2;
    for (int i = 0; i < subject.length(); i++) {
        int longestMatchIndex = -1;
        int longestMatchLength = -1;
        for (int j = 0; j < numPairs; j++) {
            String find = pairs[j * 2];
            // Skip empty search strings: matching one would never advance i.
            if (!find.isEmpty() && subject.regionMatches(i, find, 0, find.length())) {
                if (find.length() > longestMatchLength) {
                    longestMatchIndex = j;
                    longestMatchLength = find.length();
                }
            }
        }
        if (longestMatchIndex >= 0) {
            sb.append(pairs[longestMatchIndex * 2 + 1]);
            i += longestMatchLength - 1;
        } else {
            sb.append(subject.charAt(i));
        }
    }
    return sb.toString();
}

Why would you need this? Example follows:

String truth = "Java is to JavaScript";
truth += " as " + simultaneousReplaceLongest(truth,
    "Java", "Ham",
    "JavaScript", "Hamster");
System.out.println(truth);

Output:

Java is to JavaScript as Ham is to Hamster

If we had used simultaneousReplace instead of simultaneousReplaceLongest, the output would have had "HamScript" instead of "Hamster" :)

Note that the above methods are case-sensitive. If you need case-insensitive versions, they are easy to adapt, because String.regionMatches has an overload that takes an ignoreCase parameter.
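As a rough sketch of that modification (not part of the methods above), the longest-match version could call the five-argument regionMatches overload with ignoreCase set to true. The class name CaseInsensitiveReplace and the method name simultaneousReplaceIgnoreCase are made up for illustration, and this sketch also assumes that empty search strings should simply be ignored:

```java
public class CaseInsensitiveReplace {

    // Hypothetical case-insensitive variant of the longest-match replacement.
    public static String simultaneousReplaceIgnoreCase(String subject, String... pairs) {
        if (pairs.length % 2 != 0) throw new IllegalArgumentException(
            "Strings to find and replace are not paired.");
        StringBuilder sb = new StringBuilder();
        int numPairs = pairs.length / 2;
        for (int i = 0; i < subject.length(); i++) {
            int longestMatchIndex = -1;
            // Starting at 0 means empty search strings can never be selected.
            int longestMatchLength = 0;
            for (int j = 0; j < numPairs; j++) {
                String find = pairs[j * 2];
                // The leading 'true' tells regionMatches to ignore case.
                if (find.length() > longestMatchLength
                        && subject.regionMatches(true, i, find, 0, find.length())) {
                    longestMatchIndex = j;
                    longestMatchLength = find.length();
                }
            }
            if (longestMatchIndex >= 0) {
                sb.append(pairs[longestMatchIndex * 2 + 1]);
                i += longestMatchLength - 1;
            } else {
                sb.append(subject.charAt(i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: Ham is to Hamster
        System.out.println(simultaneousReplaceIgnoreCase(
            "JAVA is to javascript",
            "Java", "Ham",
            "JavaScript", "Hamster"));
    }
}
```

The replacement text is inserted exactly as given; the sketch makes no attempt to carry the case of the matched text over to the replacement.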

Boann answered Sep 27 '22 20:09



This is how your Python suggestion translates to Java:

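import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A plain HashMap avoids the anonymous subclass created by double-brace initialization.
Map<String, String> replacements = new HashMap<String, String>();
replacements.put("substr1", "repl1");
replacements.put("substr2", "repl2");
replacements.put("substr3", "repl3");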

String input = "lorem substr1 ipsum substr2 dolor substr3 amet";

// create the pattern joining the keys with '|'
String regexp = "substr1|substr2|substr3";

StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(input);

while (m.find())
    m.appendReplacement(sb, replacements.get(m.group()));
m.appendTail(sb);


System.out.println(sb.toString());   // lorem repl1 ipsum repl2 dolor repl3 amet

This approach does a simultaneous (i.e. "at once") replacement. For example, if you happened to have

"a" -> "b"
"b" -> "c"

then this approach would turn "a b" into "b c", whereas the answers suggesting you chain several calls to replace or replaceAll would give "c c".


(If you generalize this approach to create the regexp programmatically, make sure you apply Pattern.quote to each individual search word and Matcher.quoteReplacement to each replacement word.)
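One way that generalization might look is sketched below, assuming Java 8 or later. The class name RegexMultiReplace and the method name replaceEach are hypothetical. The sketch quotes every key and every replacement as described above. It also sorts the keys longest-first, which is an addition not taken from this answer: Java tries regex alternatives from left to right, so without the sort "Java" would win over "JavaScript". Empty keys are dropped, which is another assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMultiReplace {

    // Hypothetical helper: replaces every key of 'replacements' found in 'input'
    // with its mapped value, in a single pass over the input.
    public static String replaceEach(String input, Map<String, String> replacements) {
        List<String> keys = new ArrayList<>();
        for (String key : replacements.keySet()) {
            if (!key.isEmpty()) keys.add(key);   // assumption: ignore empty keys
        }
        if (keys.isEmpty()) return input;

        // Longest keys first, so a longer alternative is tried before its prefix.
        keys.sort(Comparator.comparingInt(String::length).reversed());

        // Join the quoted keys with '|' so regex metacharacters are taken literally.
        StringBuilder regexp = new StringBuilder();
        for (String key : keys) {
            if (regexp.length() > 0) regexp.append('|');
            regexp.append(Pattern.quote(key));
        }

        Matcher m = Pattern.compile(regexp.toString()).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacements.get(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("a", "b");
        replacements.put("b", "c");
        System.out.println(replaceEach("a b", replacements));   // b c
    }
}
```

Because the pattern is case-sensitive and every alternative is a quoted literal, m.group() always equals one of the map keys exactly, so the lookup in the loop cannot return null.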

aioobe answered Sep 27 '22 20:09



// Strings are immutable, so assign the result back.
yourString = yourString.replace("substr1", "repl1")
                       .replace("substr2", "repl2")
                       .replace("substr3", "repl3");
Eng.Fouad answered Sep 27 '22 20:09
