Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a regular expression way to replace a set of characters with another set (like shell tr command)?

The shell tr command support replace one set of characters with another set. For example, echo hello | tr [a-z] [A-Z] will tranlate hello to HELLO.

In java, however, I must replace each character individually like the following

"10 Dogs Are Racing"
    .replaceAll ("0", "0")
    .replaceAll ("1", "1")
    .replaceAll ("2", "2")
    // ...
    .replaceAll ("9", "9")
    .replaceAll ("A", "A")
    // ...
;

The apache-commons-lang library provides a convenient replaceChars method to do such replacement.

// half-width to full-width
System.out.println
(
    org.apache.commons.lang.StringUtils.replaceChars
    (
        "10 Dogs Are Racing",
        "0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
        "0123456789ABCDEFEGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    )
);
// Result:
// 10 Dogs Are Racing

But as you can see, sometime the searchChars/replaceChars are too long (also too boring, please find a duplicated character in it if you want), and can be expressed by a simple regular expression [0-9A-Za-z]/[0-9A-Za-z]. Is there a regular expression way to achieve that ?

like image 379
LiuYan 刘研 Avatar asked Jun 13 '12 07:06

LiuYan 刘研


1 Answers

While there is no direct way to do this, constructing your own utility function to use in combination with replaceChars is relatively simple. The version below accepts simple character classes, without [ or ]; it does not do class negation ([^a-z]).

For your use case, you could do:

StringUtils.replaceChars(str, charRange("0-9A-Za-z"), charRange("0-9A-Za-z"))

Code:

public static String charRange(String str) {
    StringBuilder ret = new StringBuilder();
    char ch;
    for(int index = 0; index < str.length(); index++) {
        ch = str.charAt(index);
        if(ch == '\\') {
            if(index + 1 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed escape sequence.", str, index
                );
            }
            // special case for escape character, consume next char:
            index++;
            ch = str.charAt(index);
        }
        if(index + 1 >= str.length() || str.charAt(index + 1) != '-') {
            // this was a single char, or the last char in the string
            ret.append(ch);
        } else {
            if(index + 2 >= str.length()) {
                throw new PatternSyntaxException(
                    "Malformed character range.", str, index + 1
                );
            }
            // this char was the beginning of a range
            for(char r = ch; r <= str.charAt(index + 2); r++) {
                ret.append(r);
            }
            index = index + 2;
        }
    }
    return ret.toString();
}

Produces:

0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
0-9A-Za-z : 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
like image 98
beerbajay Avatar answered Sep 21 '22 17:09

beerbajay