How To HTML Escape Curly Quotes in a Java String

Question

I've got a string that has curly quotes in it. I'd like to replace those with HTML entities to make sure they don't confuse other downstream systems. For my first attempt, I just added matching for the characters I wanted to replace, entering them directly in my code:

public static String escapeXml(String s) {
    StringBuilder sb = new StringBuilder();
    char characters[] = s.toCharArray();
    for ( int i = 0; i < characters.length; i++ ) {
        char c = characters[i];
        switch (c) {
            // other escape characters deleted for clarity
            case '“':
                sb.append("&#8220;");
                break;
            case '”':
                sb.append("&#8221;");
                break;
            case '‘':
                sb.append("&#8216;");
                break;
            case '’':
                sb.append("&#8217;");
                break;
            default:
                sb.append(c);
                break;
        }
    }
    return sb.toString();
}

This compiled and worked fine on my Mac, but when our CI server (which runs on Linux) tried to build it, it choked:

Out.java:[347,16] duplicate case label

Apparently some part of the build chain on the Linux box can't recognize and distinguish among these fancy characters.

My next attempt was to use Unicode escaping. Unfortunately, this won't even compile on my Mac:

...
            case '\u8220':
                sb.append("&#8220;");
                break;
            case '/u8221':
                sb.append("&#8221;");
                break;
...

My compiler throws this complaint:

Out.java:[346,21] unclosed character literal

I'm baffled as to how one might do this bit of substitution and have it work reliably across platforms. Does anybody have any pointers? Thanks in advance.

erickson · Accepted Answer

You can use the literal character (i.e., '‘'), but your build process needs to specify the correct source encoding during compilation. The javac command option is -encoding. (The attribute on Ant's javac task is the same.) This should match whatever encoding is used by your IDE when saving the files.

If your IDE is using UTF-8, for example, but the build machine is using its platform default encoding of US-ASCII, the special characters will be decoded as ?. Since multiple cases now have the same label, you get the original error message.
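
If fixing the build encoding isn't practical, another route is to keep the source file pure ASCII by using Unicode escapes, so the compiler sees the same characters whatever encoding it assumes. The catch is that a Java Unicode escape takes exactly four hexadecimal digits, while the numbers in the HTML entities are decimal: 8220 decimal is 201C hex, 8221 is 201D, 8216 is 2018, and 8217 is 2019. The escape also needs a backslash, not a forward slash. Below is a minimal sketch of the question's method rewritten that way. The class name CurlyQuoteEscaper and the main method are only illustrative assumptions, and the other escape cases the question omitted are still omitted.

```java
public class CurlyQuoteEscaper {

    // Variant of the question's escapeXml using Unicode escapes, so the
    // source contains only ASCII. Only the four curly-quote cases are shown.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u201C': // left double quotation mark (decimal 8220)
                    sb.append("&#8220;");
                    break;
                case '\u201D': // right double quotation mark (decimal 8221)
                    sb.append("&#8221;");
                    break;
                case '\u2018': // left single quotation mark (decimal 8216)
                    sb.append("&#8216;");
                    break;
                case '\u2019': // right single quotation mark (decimal 8217)
                    sb.append("&#8217;");
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }
        return sb.toString();
    }

    // Illustrative usage; the sample sentence is made up for this sketch.
    public static void main(String[] args) {
        String input = "\u201CHello,\u201D she said. \u2018Hi.\u2019";
        // prints: &#8220;Hello,&#8221; she said. &#8216;Hi.&#8217;
        System.out.println(escapeXml(input));
    }
}
```

The trade-off is readability: the escapes are harder to recognize at a glance than the literal characters, which is why the comments name each one.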

Tags:

java

unicode

html-entities

Sean McMains
